How to Block Java user-agents

A variety of user-agents that begin with “Java” are likely visiting your website. Requests carrying this type of user-agent come from programs written in Java whose developers did not change the default user-agent string. Here is a list of the Java user-agents I have encountered:


Java/1.4.1_04
Java/1.5.0_02
Java/1.5.0_06
Java/1.5.0_14
Java/1.6.0_02
Java/1.6.0_03
Java/1.6.0_04
Java/1.6.0_07
Java/1.6.0_11
Java/1.6.0_12
Java/1.6.0-oem

I will maintain this list simply for kicks. There is no need to collect an exhaustive list of these user-agent strings in order to block them. As I have mentioned before, I prefer to ban non-human visitors based on a combination of an IP address and a user-agent string.

URL rewrite rules

Here is a URL rewriting condition and rule that match any user-agent that begins with “Java” and deliver a 403 Forbidden response for any HTTP request to your server:


RewriteCond %{HTTP_USER_AGENT} ^Java
RewriteRule .* - [F]

The condition matches any user-agent string that begins with “Java” no matter what comes later. The rewrite rule answers every requested location with a 403 Forbidden response code. The URL is not changed and no document is delivered.
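
The condition above looks only at the user-agent. The preference stated earlier, banning on a combination of an IP address and a user-agent string, can be sketched by adding a second condition. This is a minimal sketch that assumes Apache mod_rewrite. The 192.0.2. prefix is a placeholder taken from the address range reserved for documentation, not an address from my logs, so substitute the addresses you actually want to ban:

RewriteEngine On
# Hypothetical address prefix (192.0.2.0/24 is reserved for documentation)
RewriteCond %{REMOTE_ADDR} ^192\.0\.2\.
# Consecutive conditions are combined with AND unless flagged [OR]
RewriteCond %{HTTP_USER_AGENT} ^Java
# "-" means no substitution; [F] sends 403 Forbidden
RewriteRule .* - [F]

Because consecutive RewriteCond lines are combined with a logical AND by default, only requests that match both the address prefix and the user-agent receive the 403. Everything else passes through untouched.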

IIS7 URL Rewrite web.config


<rule name="no-java-bots" stopProcessing="true">
    <match url="(.*)" />
    <conditions>
        <add input="{HTTP_USER_AGENT}" pattern="^Java/.*" />
    </conditions>
    <action type="AbortRequest" />
</rule>

Why block Java bots?

Bots with a well-defined purpose will typically identify themselves with a unique name. These Java user-agents are either not interested in identifying their purpose or not ready to publish their name and take ownership of the crawling activities. Both cases are a waste of bandwidth. Test your new application on someone else’s website. Play with your shady crawler on someone else’s website. Come back when you are willing to identify yourself.