Search blog for

, ,

The Challenges of DIY Botnet Detection – and How to Overcome Them

Botnets have been around for over two decades, and with the rise of the Internet of Things (IoT) they have spread further to devices no one imagined they would – printers, webcams, and even toasters and fridges.

Some botnets enlist infected devices to mine cryptocurrency or steal passwords from other devices. But others are, in fact, legions of bot-soldiers waiting for a command to attack a target server. Here at Imperva, we detect botnets and prevent them from harming our customers. Botnet detection isn’t an easy task. In this post I’ll attempt to describe the pitfalls in botnet detection.

Detecting a Botnet

So what’s a botnet? Simply put, it’s a cluster of bots – compromised computers and devices – that perform commands given by the botnet owner. Usually, the botnet owner will dedicate one compromised device as the Command and Control (CnC) server for communication with his bots. Thus, the best way to discover a botnet is by finding its CnC, but that’s usually not a simple task. Let me explain why.

How can we Detect a Botnet

The smoking gun that points to a botnet is its CNC. Obviously, here at Imperva we don’t protect CnCs or bots – we protect against attacks originating from them. We are successful enough that it’s very unlikely any bot or CnC will be able to operate behind our service. Practically speaking, our best option to detect botnets is to examine their attacks on sites we protect.

When looking at exploit attempts, there are a few possible indicators of a botnet. For example, if the same IPs attack the same sites at the same time using the same payloads and attack pattern, there’s a good chance they’re part of the same botnet. This is especially true if many IPs and sites are involved. One common example is a DDoS attempt by a botnet on a web service.

A botnet attempting to DDoS a few sites: as the owner of the sites, during the attack you’ll see a large group of IPs sending many requests to the login page and the shopping cart page.

Reasons for False Positives

Even though I might have made detecting botnets sound quite simple, it really isn’t. Some payloads are so widely used that it’s difficult to distinguish between a truly-concerted botnet attack and a random one-off attack. Attackers can change their IPs by using a VPN or a proxy, making it look like many attackers are involved. Some proxy services even allow a single user to utilize many different IPs.

Hacking Tools can be Deceiving

Hacking tools and vulnerability scanners are similar to botnets as well. These tools generate the same payloads and attack patterns, and many hackers use them, regardless of the color of their hat. While it is an unlikely scenario, if different players conduct a Penetration Test on the same site at the same time, it’ll look like a coordinated botnet attack.

How can we Differentiate?

There are many ways to identify clients, but in this case simply looking at the raw request will do the trick. Luckily for us, because vulnerability scanners are so popular, it is easy to find out if they’re to blame. Sometimes, the user agent header will reveal the name of the tool. In other cases, Googling the payload will lead you straight to the tool.

Bot(net) Or Not?

Grab ‘em by the Payload

To discover botnets, we decided to use two different approaches. The first approach uses a naive back-and-forth algorithm to find botnets.

Any website owner can analyze data from their weblogs and use this technique.

You might want to improve this algorithm, and you can do so in several ways. You can separate the request to parameters and then search for a popular parameter value. Try using Levenshtein Distance, or any other distance algorithms, to find similar payloads. For this research, we decided to simply separate requests into query strings and post bodies.

Any website owner can analyze data from their weblogs and use this technique.

The following charts plot the daily activity of IP addresses involved in an attack on our websites during a given timeframe. In red, you can see the percentage (left axis) of IPs that participated in an attack on any given day, which is calculated by taking the number of attacking IPs that day and dividing by the highest number of attacking IPs on ANY day during our time frame.

Similarly, the blue line represents the percentage (left axis) of attacked sites on any day, calculated by dividing the number of protected sites attacked that day by the highest number of protected sites attacked during this timeframe. The yellow bars represent the median (right axis) number of days all of the attacking IPs on that day have attacked overall during the studied timeframe. For instance, if 30 IPs attacked on one day and the median number shown is 10, that means 15 IPs have attacked more than 10 days, and 15 IPs have attacked fewer than 10 days.

Attack #1

A Backdoor Uploader. Nearly 1,000 IPs attempted to upload a backdoor to over 1,000 sites. The payload coming from the different IPs was exactly the same, but that’s not the best part. It appears that the payload is a variation of the infamous CKnife webshell. Combined with the low IP turnover rate (i.e. the same IPs are attacking half of the time, as shown by the high median yellow bars), chances are that this is a botnet.

Attack #2

Nearly 4,000 IPs used a payload meant to test for a SQL Injection vulnerability. A search for that payload revealed that the SQLI Dumper tool is behind the attack. Looking at other attacks performed by these IPs revealed attempts at Remote Code Execution (RCE), backdoor upload and other attacks that aren’t in the SQLI Dumper playbook. Also, while the number of attacking IPs grows – the median number of days attacked by the attacking IPs shrinks. Testing for correlation between them revealed a strong negative correlation (-0.84). Combining this data with the medium IP turnover rates (shown by the yellow bars) indicates that this attack is comprised of a few core bots and many temporary IPs. We tested this hypothesis and found that ~50 IPs were involved during the entire attack. This might mean that several different groups are using the same payload, and this is not a single botnet.

Attack #3

A tool that looks like a botnet, but it’s not. Let me explain why. Although nearly 2,000 IPs were involved, it’s easy to see that the median number of days they attacked is pretty low. This means that in most cases hackers used these IPs to attack for a few days, and then stopped using them completely. This pattern isn’t typical of botnets because botnet owners will usually reuse the IPs in their disposal. Googling the payload revealed that a popular hacking tool named AutoexploiterBot is behind this attack. Likely, multiple users used it to attack us which explains why it wasn’t the same attacking IPs.

The payload sent during the attack:

The base64 in the exploit than decodes to a mid-stage code, which decodes to a webshell with a visible link to the tool:

Bringing out the Big Guns

The second algorithm we used for botnet detection has a more sophisticated approach. We utilized our specialized Client Classification abilities to cluster clients that carried out many coordinated attacks.

Out of the hundreds of results we got, we focused on the most interesting ones:

Attack #4

Backdoor Uploader revisited. This is the same backdoor uploader we found using the first approach. This time we caught more of its core IPs as indicated by the low turnover rate (i.e. the high yellow median bars). It’s interesting we found this botnet using both approaches, even though they are inherently different.

Attack #5

Probably the most distinguishable of them all. This botnet has a handful of malicious Remote Code Execution (RCE) payloads. Each RCE embeds the same unique site address somewhere within the victim’s server. Furthermore, its IPs almost never change, as indicated by the very high yellow bars. To recap – we have the same few payloads, advertising the same site, coming from the same IPs. Thus strongly indicating this is a botnet.

Attack #6

A botnet blogpost isn’t complete without a Spambot. This one is aiming at the comment section of a web site, trying to add comments advertising a Chinese gambling site. What’s fascinating is that it allows us to glimpse multiple cycles of spam campaigns. In each cycle, a varying number of IPs attack for a short while and then stop. A possible explanation would be that this Spambot is for hire, and each cycle is a paid spam campaign.

***

Botnets can be a tricky thing to detect and mitigate, but even analyzing the simplest weblog entries can supply valuable insight, especially against continuous campaigns. All of the botnets we found can cause real damage to your site and customers. Some will take over your site and others will expose private information.

Once you find an IP that belongs to a botnet, you can block it and use it to discover more IPs that are part of the botnet. Some of the payloads we found in this research were a few years old, or new variants of old exploits. So digging into your log history might give you insight to protect your site the next time a botnet comes around.