Not all users visiting your site are human. Many of the requests made for your site and its content come from bots and other forms of automation. This rise in automated, often malicious, traffic puts a costly and unmanageable strain on your security staff and resources.
But before determining how to block bots from a website, you must first ask yourself a few key questions about your website and your business needs. Use the information on this page to learn not only how to block bots from a website but, more importantly, how to block bots from your website.
Humans vs. Bots: How Can I Tell?
On its surface, a visit from a human and a bot may appear nearly identical. Bots can appear as normal users, with an IP address, browser and header data, and other seemingly identifiable information. But dig a bit deeper by collecting and reviewing in-depth analytics and other request data and you’ll be able to find the holes in the bots’ disguises.
This research stage is time-consuming and complex, and must be dealt with before deciding how to block bots from a website.
Bad Bots vs. Good Bots: What’s the Difference?
Now that you’ve separated human traffic from bot traffic, you can dig a bit deeper to see which bots are good and which are bad. Good bots include search engine crawlers (Googlebot, Bingbot, Yahoo Slurp, Baiduspider, and more) and social media crawlers (Facebook, LinkedIn, Twitter, and Google+).
Generally, you want to allow these good bots access to your site, since they help humans find and access your site. Bad bots include any bots that are engineered for malicious use.
These bots attempt scraping, brute force attacks, competitive data mining (which can cause brownouts), account hijacking, and more.
Knowing the difference between the bots visiting your site lets you take action on bad bots and allow access to good bots.
What Are the Bad Bots Targeting?
Bots are tailored to target very specific elements of a website, but their impact goes beyond stolen content, spammed forms, or compromised account logins. The Open Web Application Security Project (OWASP) published the Automated Threats Handbook for Web Applications, which profiles the Top 20 automated threats and categorizes each threat as one of four types:
- Account Credentials – Includes account aggregation, account creation, credential cracking, and credential stuffing.
- Payment Cardholder Data – Includes carding, card cracking, and cashing out.
- Vulnerability Identification – Includes footprinting, vulnerability scanning, and fingerprinting.
- Other – The catch-all category. Includes ad fraud, CAPTCHA bypass, denial of service, expediting, scalping, scraping, skewing, sniping, spamming, and token cracking.
So answering the question of how to block bots from a website depends on which threats the site is experiencing.
How Do I Block Bad Bots from My Site?
The most basic means of blocking bad bots from your site involves blacklisting individual IP addresses or entire IP ranges. This approach is not only time-consuming and labor-intensive, but it is also a very small band-aid on a very large issue. Automated bots can cycle through hundreds or thousands of IP addresses at a time, meaning they’ll return from another IP moments after getting blocked.
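To make the IP approach concrete, here is a minimal Python sketch of a blocklist check using the standard `ipaddress` module. The function name `is_blocked` and the entries in `BLOCKED_NETWORKS` are assumptions made for illustration (the addresses come from ranges reserved for documentation, not real offenders), and treating a malformed address as blocked is just one possible policy.

```python
import ipaddress

# Hypothetical blocklist for illustration only. These addresses come from
# ranges reserved for documentation (RFC 5737), not from real traffic.
BLOCKED_NETWORKS = [
    ipaddress.ip_network("203.0.113.7/32"),   # a single IP address
    ipaddress.ip_network("198.51.100.0/24"),  # an entire IP range
]


def is_blocked(client_ip: str) -> bool:
    """Return True if client_ip falls inside any blocklisted network."""
    try:
        address = ipaddress.ip_address(client_ip)
    except ValueError:
        # Assumption: a malformed address is treated as blocked in this sketch.
        return True
    return any(address in network for network in BLOCKED_NETWORKS)
```

A request handler could call `is_blocked(remote_ip)` before doing any other work. The weakness described above still applies: a bot that rotates to an address outside the list passes this check immediately.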
You could also look at individual requests to check their attributes, such as correct user agent formatting. Even so, spoofing or emulating browsers is common practice, and a spoofed request can easily get around cursory checks.
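A cursory attribute check of this kind might look like the Python sketch below. Everything in it is an assumption for illustration: the function name `looks_automated`, the short `AUTOMATION_MARKERS` list, the expectation that headers arrive as a dictionary keyed by `User-Agent`, and the rule that a browser user agent starts with `Mozilla/5.0 (`, which holds for most current mainstream browsers but not for every legitimate client.

```python
import re

# Hypothetical substrings that some automation tools send by default.
# Illustrative only; a real list would be longer and maintained over time.
AUTOMATION_MARKERS = ("curl", "wget", "python-requests", "scrapy", "headlesschrome")

# Most current mainstream browsers send a user agent starting with "Mozilla/5.0 (".
BROWSER_PREFIX = re.compile(r"^Mozilla/5\.0 \(")


def looks_automated(headers: dict) -> bool:
    """Return True if the User-Agent header fails a few cursory checks."""
    user_agent = headers.get("User-Agent", "").strip()
    if not user_agent:
        return True  # missing or empty user agent
    lowered = user_agent.lower()
    if any(marker in lowered for marker in AUTOMATION_MARKERS):
        return True  # tool announces itself
    if not BROWSER_PREFIX.match(user_agent):
        return True  # not formatted like a browser user agent
    return False
```

Any bot that copies a real browser's user agent string passes all three checks, which is exactly why this kind of inspection cannot stand alone.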
Another option is to establish challenges when you receive a curious or potentially threatening request. For example, below are a few graduated levels of threat responses:
- Monitor – Keep an eye on a bad bot’s activity while it moves through your site. Learn its habits and use its behavior to strengthen your protective measures against it when the time is right. Or, apply this learned knowledge to other bad bots visiting your site.
- CAPTCHA – This is the first actual layer of defense, as it presents a simple CAPTCHA test to a seemingly threatening visitor. CAPTCHA tests quickly and easily weed out simple automated bots that cannot read and supply a correct answer to the test, while allowing human users access upon completing the test.
- Block – Block pages offer an extra level of defense on top of a basic CAPTCHA test. You can block a visitor’s access to your site and have them submit a brief request form to your support or security team. Once the team reviews and approves the request, it restores the visitor’s access. Otherwise, if the form is not fully completed or the request is deemed malicious, the team rejects it for good.
- Drop – The harshest threat response is dropping access entirely. This option does not provide any sort of recourse, be it a CAPTCHA test or an unblock request form. The visitor must move on to target another site.
Ideally, each of the options above should be as automated as possible. Doing so ensures bad bots are stopped as quickly as possible, while good, human users will only be slightly or momentarily impeded while visiting your site.
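One way to automate the graduated responses is to map a per-request threat score to an action. The Python sketch below assumes a score from 0 to 100 produced by some upstream analysis that is not shown here. The `Response` enum, the `choose_response` function, and every threshold value are hypothetical and would need tuning against your own traffic.

```python
from enum import Enum


class Response(Enum):
    ALLOW = "allow"
    MONITOR = "monitor"
    CAPTCHA = "captcha"
    BLOCK = "block"
    DROP = "drop"


# Hypothetical thresholds on a 0-100 threat score, harshest response first.
THRESHOLDS = [
    (90, Response.DROP),
    (70, Response.BLOCK),
    (40, Response.CAPTCHA),
    (20, Response.MONITOR),
]


def choose_response(threat_score: int) -> Response:
    """Return the harshest response whose threshold the score meets."""
    for minimum, response in THRESHOLDS:
        if threat_score >= minimum:
            return response
    return Response.ALLOW  # low scores pass through untouched
```

Keeping the thresholds in a single ordered table makes it easy to adjust how aggressive each level is without touching the decision logic.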
So while you could build, manage, and maintain your own bot defense campaign from scratch when trying to figure out how to block bots from a website, there are highly effective, pre-built solutions out there. Hire an external company or firm to design and implement a protective suite relatively quickly and make sure the bot defense industry’s best and brightest are on the job.