SwirlyMyself

2006-06-11T20:14:15+00:00

Deleting spam trackbacks

The trackback spam on my blog started to get really annoying, even with Serendipity's “invert selection” and “delete” buttons. So I came up with a simple Perl script that looks at all unapproved trackback pages and deletes those that don't link to my blog. It's not perfect, as there might be legitimate cases where someone sends a trackback to my post without actually linking to it, but that's OK for me. Also, the spammers will probably fix this “bug” of theirs, too (and some already have, as someone told us during a lightning talk at the GPN5), but at least for a little while this will save me some work. (See the complete post for the code.)

#!/usr/bin/perl

# Trackback-Spam Deleter © 2006 Joachim Breitner
# 
# Automatically deletes unapproved trackbacks that do not include
# a link to our blog.

# DB-Configuration, should be clear
my $dbName    = 'serendipity';
my $dbPrefix  = 'serendipity_';
my $dbHost    = 'localhost';
my $dbUser    = 'serendipity';
my $dbPass    = 'shushshsh';

# Pattern that has to appear in the linked pages (e.g. blog url);
# the dot is escaped so that it only matches a literal "."
my $require   = qr{joachim-breitner\.de/blog};

# From here on, it's code

use strict;
use warnings;
use DBI;
use LWP::Simple;

print "Trackback-Deleter by Joachim Breitner\n";

print "Connecting to the database...\n";
my $dbh = DBI->connect(
        "DBI:mysql:database=$dbName;host=$dbHost",
        $dbUser,
        $dbPass,
        {RaiseError=>1, PrintError=>0}
);

# Fetch each distinct URL that has at least one pending trackback
my $urls = $dbh->selectall_arrayref(
        "SELECT url
                FROM ${dbPrefix}comments
                WHERE type = 'trackback' AND status = 'pending'
                GROUP BY url",
) or die $dbh->errstr;

foreach my $url ( @$urls ) {
        $url = $url->[0];
        print "Checking URL: $url\n";
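        # get() returns undef if the page cannot be fetched;
        # such pages are treated like pages without a link
        my $content = get($url);
        if (defined $content && $content =~ $require) {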
                print "Page links to us, please approve manually\n";
        } else {
                print "Page does not link to us, deleting trackback...\n";
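                # Only remove the pending trackbacks, not other comments
                # that happen to share the same URL
                $dbh->do("DELETE FROM ${dbPrefix}comments
                          WHERE url = ? AND type = 'trackback' AND status = 'pending'",
                          undef, $url);
        }
        print "\n";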
}
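
One possible refinement, sketched here under assumptions: the script above handles a page that could not be downloaded the same way as a page that does not link to the blog, so a temporarily unreachable but legitimate blog would lose its trackback. The helper below is hypothetical (the name `classify_page` and its three return values are not part of the original script). It only separates the three cases, so the caller can decide to leave unreachable pages alone and check them again on a later run.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of the original script.
#   $content - page body as returned by LWP::Simple's get(), or undef on failure
#   $require - compiled pattern that must match for the trackback to be kept
sub classify_page {
    my ($content, $require) = @_;
    return 'unreachable' unless defined $content;   # download failed
    return 'links'       if $content =~ $require;   # candidate for manual approval
    return 'no-link';                               # looks like spam
}

# Demonstration with literal strings instead of real downloads.
my $require = qr{joachim-breitner\.de/blog};
my @pages = (
    undef,
    '<a href="http://www.joachim-breitner.de/blog/">a post</a>',
    'buy cheap pills',
);
print classify_page($_, $require), "\n" for @pages;
```

In the loop of the script, this could be called as `classify_page(get($url), $require)`, with the DELETE statement run only for the `'no-link'` result.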

Comments

The Spam Blocker plugin available from the built-in plugin manager has an option for checking the trackback links for a link to the blog. While that doesn't help you with the ones you already have, it should prevent a bunch of them in the future.
#1 Jamuraa (Homepage) on 2006-06-12T02:20:24+00:00
I'm not running 1.0 yet, so I can't use that functionality. I'm still waiting for the 1.0 release.

Also, don't some blogs trackback before their entry is actually available? In that case, a delayed check like this might be handy. Not sure if that is actually the case, though.
#2 Joachim Breitner (Homepage) on 2006-06-12T12:48:05+00:00
I upgraded from 0.8.x to a 1.0 beta, and with the proper configuration of that Spam Blocker plugin I now get no spam trackbacks (down from 100+ a day).

It took a pretty nasty merge to get there, though.
#3 Penny Leach (Homepage) on 2006-06-12T10:02:42+00:00
Thanks for the script. Another cool thing that works really well is using Apache's mod_security to prevent trackback spam. A really short description can be found here:
http://blog.pebcak.de/archives/198-How-to-effectively-stop-trackback-spam.html
#4 nion (Homepage) on 2006-06-17T20:56:34+00:00

Have something to say? You can post a comment by sending an e-mail to me at <mail@joachim-breitner.de>, and I will include it here.