In IIS, banning IP addresses from accessing a website is fairly easy. I rarely do this, however, because I prefer to use a combination of an IP address and a user agent string to identify bad bots that are likely scraping my content or attempting to harvest email addresses.
I try to avoid blocking on an IP address alone whenever I can. IP addresses can be forged and reassigned, so I prefer to rely on an IP address and user agent string combination to identify the culprit I want to exile. This approach is not foolproof, but I find it to be much more reliable.
Scalability is also an issue. An ISAPI filter that processes requests for every website on the server from a single configuration file sure makes life easy. The Microsoft IIS configuration console, by contrast, is a mouse-click nightmare on a server with a couple hundred websites.
I use Ionic’s ISAPI Rewrite Filter to change the URL structure of websites to make them more search engine friendly. The filter uses the PCRE library, and support for regular expressions is always a huge plus. All of the rewriting rules are maintained in one .ini file, so tweaks and updates are a breeze.
Here is an Ionic rewrite rule that blocks access to every site on your server based on an IP address and user agent string match. In this particular case, I am blocking an email address harvester with IP 24.132.226.94 and user agent Java/1.6.0-oem.
RewriteCond %{REMOTE_ADDR} ^24\.132\.226\.94$
RewriteCond %{HTTP_USER_AGENT} Java/1\.6\.0-oem
RewriteRule ^/(.*)$ /$1 [F]
The two conditions test the REMOTE_ADDR and HTTP_USER_AGENT server variables, which hold the visitor’s IP address and user agent string, against regular expressions. Both conditions must match for the rule to apply. The final line is the rewrite rule, which matches any URL path on any website. The [F] flag tells Ionic’s filter to return an HTTP 403 Forbidden status code.
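The following Python sketch mimics how the two conditions and the rule combine. It is an illustration, not IIRF’s implementation: the function name status_for and the dictionary of server variables are hypothetical, Python’s re module stands in for PCRE (they agree on patterns this simple), and a request that does not trip the rule is reported as 200 for simplicity. It assumes, as with Apache’s mod_rewrite, that consecutive RewriteCond lines are combined with AND and that each pattern is searched for anywhere in the variable’s value. The IP pattern is anchored with ^ and $ so that an address such as 124.132.226.94 is not caught by accident.

```python
import re

# Hypothetical stand-in for the rule set above: each entry pairs a
# server variable with the regular expression its RewriteCond tests.
# The IP pattern is anchored so 124.132.226.94 does not also match.
CONDITIONS = [
    ("REMOTE_ADDR", r"^24\.132\.226\.94$"),
    ("HTTP_USER_AGENT", r"Java/1\.6\.0-oem"),
]


def status_for(server_vars: dict) -> int:
    """Return 403 if every condition matches (implicit AND), else 200."""
    for var, pattern in CONDITIONS:
        # re.search finds the pattern anywhere in the value,
        # like an unanchored RewriteCond pattern.
        if not re.search(pattern, server_vars.get(var, "")):
            return 200  # one failed condition: the rule does not fire
    return 403  # all conditions matched: [F] forbids the request


if __name__ == "__main__":
    harvester = {"REMOTE_ADDR": "24.132.226.94", "HTTP_USER_AGENT": "Java/1.6.0-oem"}
    browser = {"REMOTE_ADDR": "24.132.226.94", "HTTP_USER_AGENT": "Mozilla/5.0"}
    print(status_for(harvester))
    print(status_for(browser))
```

Running it prints 403 for the harvester and 200 for a browser coming from the same address, which is the point of pairing the two conditions.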
Regular expressions also make it possible to block a range of IP addresses or to match part of a user agent string. If I wanted to match any version of this Java-based robot, I could expand the second condition to something like this:
RewriteCond %{HTTP_USER_AGENT} Java/\d+\.\S*
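A quick way to check a broadened user agent pattern before deploying it is to run it against a few sample strings. This Python sketch uses the pattern Java/\d+\.\S*, where \d+ allows multi-digit version numbers and \. requires a literal dot rather than any character; apart from Java/1.6.0-oem, the sample user agents are made-up examples.

```python
import re

# Broadened user agent condition: "Java/", one or more digits,
# a literal dot, then any run of non-whitespace characters.
JAVA_UA = re.compile(r"Java/\d+\.\S*")

samples = [
    "Java/1.6.0-oem",  # the harvester from the original rule
    "Java/1.6.0_13",   # made-up example of another Java version
    "Java/11.0.2",     # multi-digit major version, also matched
    "Mozilla/5.0 (Windows; U; Windows NT 5.1)",  # ordinary browser
]

for ua in samples:
    # search() looks anywhere in the string, like an unanchored RewriteCond
    verdict = "blocked" if JAVA_UA.search(ua) else "allowed"
    print(f"{ua}: {verdict}")
```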
Similarly, wildcard patterns in the IP address condition can block a range of IPs instead of a single address.
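As a sketch of such a range match, the hypothetical pattern below (not one of my actual rules) covers every address from 24.132.226.0 to 24.132.226.255; the same pattern string could follow %{REMOTE_ADDR} in a RewriteCond. The ^ anchor keeps 124.132.226.x from matching. \d{1,3} would also accept impossible octets such as 999, which is harmless here because REMOTE_ADDR always holds a valid address. The Python loop simply shows which sample addresses the pattern catches.

```python
import re

# Hypothetical range rule: any address in 24.132.226.0-255.
# ^ stops 124.132.226.x from matching; \d{1,3}$ takes the last octet.
RANGE = re.compile(r"^24\.132\.226\.\d{1,3}$")

for ip in ["24.132.226.94", "24.132.226.7", "24.132.227.94", "124.132.226.94"]:
    verdict = "blocked" if RANGE.search(ip) else "allowed"
    print(f"{ip}: {verdict}")
```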
The Microsoft vs. *NIX server debate will never die. I use both every day, and I find that the biggest advantage the open source server environment has over Microsoft is the interface: plain-text configuration instead of point-and-click consoles. Using Ionic’s ISAPI filter lets me control the URL structure and blacklist for all of my websites easily and efficiently.
I see this method of blacklisting bots by IP address and user agent as a great way to simulate the .htaccess approach to the same problem on a Microsoft server.
UPDATE:
As of May 2009, I am using the following rules to block all Java bots. I know that earlier in this post I favored an IP address and user agent combination, but my IP address list grew to more than 100 entries before I abandoned that method. There are no useful Java bots. Useful bots have useful names.
RewriteCond %{HTTP_USER_AGENT} Java.* [OR]
RewriteCond %{HTTP_USER_AGENT} ^Java.*
RewriteRule ^/(.*)$ /$1 [F]
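The two conditions in the update overlap. Assuming condition patterns are unanchored searches, as in the earlier examples, Java.* already matches any user agent that contains Java anywhere, including at the start, so ^Java.* never adds a match and the [OR] pair behaves like a single condition on the substring Java. This Python sketch checks that equivalence on a few made-up user agents; the helper names blocked and contains_java are hypothetical.

```python
import re

# The two patterns from the update, combined with [OR] semantics.
CONDITIONS = [r"Java.*", r"^Java.*"]


def blocked(user_agent: str) -> bool:
    """True if any condition matches (OR), using unanchored searches."""
    return any(re.search(p, user_agent) for p in CONDITIONS)


def contains_java(user_agent: str) -> bool:
    """Equivalent single test: does the string contain 'Java' (case-sensitive)?"""
    return "Java" in user_agent


for ua in ["Java/1.6.0_13", "Mozilla/5.0 (compatible; Googlebot/2.1)", "MyJavaCrawler/0.1"]:
    # The OR of both patterns never differs from the plain substring test.
    assert blocked(ua) == contains_java(ua)
    print(f"{ua}: {'blocked' if blocked(ua) else 'allowed'}")
```

Note that the unanchored first pattern also catches user agents such as MyJavaCrawler/0.1, where Java appears mid-string.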