How to: Stop Spam on your MediaWiki website

This past week I deleted a few hundred wiki pages and user accounts from the MediaWiki installation our company uses to track software features and technical issues. Here is how you can stop your public MediaWiki website from becoming the victim of relentless spam bots.

  1. Limit exposure

    My wiki operated smoothly for about a year and a half before any spammer found it. A publicly accessible and poorly secured website is always a sitting duck, and once search engines had indexed the site, finding it became a lot easier. A simple search query like “Powered by MediaWiki” lists thousands of targets for wiki spammers. I am not sure how search engine robots found our wiki, but I certainly know when: right before the onslaught of spam.

    Limit exposure to software security holes

    The very first change I made to my wiki was a complete update of the MediaWiki software running on the website. The longer a release’s source code is available in the wild, the more likely it is that someone has found a security hole or a way to exploit the scripts and manipulate the site from outside.

    The backup and upgrade procedures can be intimidating to the casual user, but the steps are short. I downloaded a copy of all the website’s files as a backup, uploaded the files that make up the latest release, re-uploaded my old LocalSettings.php so that the update would not change it, and then ran the installation script again. When you re-run the MediaWiki installer, it will recognize the existing database tables and update them accordingly.

    Prevent Search Engines from indexing your MediaWiki

    I am now using the Robots Exclusion Protocol (REP) to ask robots not to crawl any part of the site. The robots.txt file in the root of the website directory looks like this:

    User-agent: *
    Disallow: /

    Our wiki is for use within the office only, so we could not care less whether anyone else finds or reads the website’s contents via a search engine. If removing your wiki from search engines is not a viable course of action, you can still stop spammers by following the rest of these instructions.
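
    Keep in mind that robots.txt is only a request, and only well-behaved crawlers honor it. As a complement, MediaWiki can also emit a robots meta tag on every page. Here is a minimal sketch for LocalSettings.php. It assumes your MediaWiki release is recent enough to support $wgDefaultRobotPolicy, so check the manual for your version before relying on it:

    ```php
    # Ask crawlers not to index any page or follow its links.
    # Assumption: this release supports $wgDefaultRobotPolicy.
    $wgDefaultRobotPolicy = 'noindex,nofollow';
    ```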

  2. Trip the bots

    The most aggressive spammers that attack MediaWiki websites are automated scripts. These scripts assume that the MediaWiki installation is unmodified and that its page creation and editing routines are unprotected. A simple CAPTCHA will trip the spam bots. Spammers don’t have time to figure out why they can’t pollute a certain MediaWiki website; they move on to easier targets. I installed the ConfirmEdit extension and configured it to require a simple arithmetic CAPTCHA before saving any edit.
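
    A configuration along these lines might look like the sketch below. It assumes a recent MediaWiki release, where extensions are loaded with wfLoadExtension() (older releases used a require_once line instead), and that the extension files are already in extensions/ConfirmEdit. Setting names can differ between releases, so treat this as a starting point and check the ConfirmEdit documentation for your version:

    ```php
    # Load ConfirmEdit. Its default CAPTCHA type (SimpleCaptcha) asks a
    # plain arithmetic question.
    wfLoadExtension( 'ConfirmEdit' );

    # Require the CAPTCHA on every edit and on page creation.
    $wgCaptchaTriggers['edit'] = true;
    $wgCaptchaTriggers['create'] = true;
    ```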

    Restrict user account creation and anonymous editing

    Here are two lines of code I added to LocalSettings.php to prevent new user registrations and anonymous (IP address only) edits:

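    # Prevent new user registrations except by sysops
    # (for releases that manage rights through $wgGroupPermissions)
    $wgGroupPermissions['*']['createaccount'] = false;
    # Older releases used $wgWhitelistAccount for the same purpose:
    # $wgWhitelistAccount = array ( "user" => 0, "sysop" => 1, "developer" => 1 );
    # Restrict anonymous editing
    $wgGroupPermissions['*']['edit'] = false;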

  3. Learn how to police new content

    Within 30 days of the initial attack, my wiki had hundreds of new pages and user accounts. Garbage was being added so quickly that the Recent Changes page was no longer enough for me to see what was landing on my website. A more valuable tool is the AllPages special page (Special:AllPages), which outputs a list of every page on your wiki.

    I also installed an extension called Nuke that facilitates quick mass deletion of any user’s contributions.
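
    Enabling Nuke takes one line in LocalSettings.php. This sketch assumes a recent MediaWiki release that loads extensions with wfLoadExtension() and that the extension files are already in extensions/Nuke. Older releases used a require_once line instead:

    ```php
    # Load the Nuke extension. Sysops then get a Special:Nuke page for
    # mass-deleting pages recently created by a given user or IP address.
    wfLoadExtension( 'Nuke' );
    ```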

    Larger or highly active wikis will naturally be harder to keep spam-free. I am glad that I ran into these spam bots just 18 months after launching the wiki, while it was still small. Using the AllPages script was only slightly painful because the total number of good pages on my wiki at the time was in the low hundreds. If the spam bots find another way to plague my website, I will surely write a second chapter to this guide.