The Anatomy of a Scalping Bot: NSB Goes Undercover & How it Avoids Detection

In the first blog post, we introduced you to the Nike Shoe Bot (NSB), one of the most dangerous scalping bots around. We outlined its purpose, its behavior, and described how we recovered its source code.

In this blog post, we will take a closer look at the bot’s source code and determine what information we can extract from it. We’ll discover that simple heuristics are not enough to tackle such a bot.

Investigation procedure

Even deobfuscated, a bot like NSB is composed of more than 100,000 lines of code. Analyzing such a program can’t be done linearly; it is easy to get lost. There are three principles I followed when analyzing the bot’s source code:

  • Create a list of questions 
  • Allocate enough time to answer each question
  • Iterate the process when the investigation results in more questions

I’ll present my findings in a Q&A format in this blog post, adhering to the same process I used in my original analysis.

What is the main behavioral flow of the bot?

As you will see in this analysis, the bot doesn’t behave exactly the same way for every target.

If the target requires it, the bot will use advanced capabilities to move through the website, like precise human mouse motion simulation. Otherwise, the bot will perform simple actions to accomplish its goals. The following flow corresponds to those simple actions and only shows what we deemed most important for this story.

Behavioral flow of the bot

  • Pick a proxy from the list. When defining a task, the bot user supplies a list of proxies, each in the form `ip:port:user:password`. The bot randomly selects one of these proxies for the task (see the sketch after this list).
  • Visit the site homepage.
  • Set application cookies. At this stage, the bot sets basic client-side cookies, like language-preference.
  • Log in. The bot signs in to the account provided by the bot user.
  • Visit main cart page.
  • Clear the cart if it is not empty. This cleans up previous unfinished tasks and ensures the scalping run only purchases the item from the current task. The bot waits a random 1.5–3 seconds between each item it removes from the cart.
  • Add the item to the cart. At this stage, the bot verifies that the product is in stock.
  • Confirm checkout. The bot confirms its intention to purchase the item.
  • Get sensors. Sensors are signals collected by anti-bot security vendors to describe a session and help determine whether it is a bot session. The purpose of this step is to obtain sensor values unlikely to raise the suspicion of the anti-bot security solution protecting the site. The bot obtains these by querying the following URL, which belongs to the bot operator: https://n4s[.]xyz/sensor

Bot sending request to n4s[.]xyz to retrieve anti-bot security solution sensors and then bypass protection

Example of head of sensor_data retrieved from the API:

Example of head of sensor_data

The sensor_data is then forwarded to the target website; this is how the bot attempts to bypass the anti-bot protection (a sketch of this exchange follows the list).

  • Fulfill payment. The bot sends the HTTP request for the payment. In case of a credit card error or proxy issue, the bot retries the operation; after three failures, the proxy is rotated.
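Several of the steps above reduce to small helper routines. The following sketch, in plain Node.js, shows how the `ip:port:user:password` proxy list could be parsed and picked from, how the random 1.5–3 second delay between cart removals could be implemented, and how the three-failure proxy rotation could work. The helper names (`parseProxy`, `pickProxy`, `clearCart`, `fulfillPayment`) are hypothetical and not taken from the NSB source.

```javascript
// Hypothetical reconstruction of the task-flow helpers described above.

// Parse one "ip:port:user:password" line into a proxy descriptor.
function parseProxy(line) {
  const [ip, port, user, password] = line.trim().split(':');
  return { ip, port: Number(port), user, password };
}

// Randomly pick a proxy for the task, as the bot does at task start.
function pickProxy(proxyLines) {
  const proxies = proxyLines.map(parseProxy);
  return proxies[Math.floor(Math.random() * proxies.length)];
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Remove leftover items one by one, waiting 1.5–3 s between removals.
async function clearCart(cartItems, removeItem) {
  for (const item of cartItems) {
    await removeItem(item);
    await sleep(1500 + Math.random() * 1500);
  }
}

// Retry payment up to three times, then rotate to a fresh proxy (simplified:
// a real implementation would also cap the number of rotations).
async function fulfillPayment(submitPayment, proxyLines, proxy) {
  for (let attempt = 1; attempt <= 3; attempt++) {
    try {
      return await submitPayment(proxy);
    } catch (err) {
      // Credit card error or proxy issue: retry with the same proxy.
    }
  }
  return fulfillPayment(submitPayment, proxyLines, pickProxy(proxyLines));
}
```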
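The sensor exchange mentioned above can be sketched the same way: the bot fetches a pre-baked sensor_data payload from its own service and replays it against the target. Only the n4s[.]xyz/sensor URL comes from the bot; the target endpoint path and the JSON shape below are assumptions made for illustration.

```javascript
// Minimal sketch of the sensor exchange. The sensor URL is defanged as in
// the post; the payload shape and target endpoint are assumptions.
async function fetchSensorData() {
  const res = await fetch('https://n4s[.]xyz/sensor'); // bot-owned API
  const { sensor_data } = await res.json();            // assumed JSON shape
  return sensor_data;
}

async function forwardSensorData(targetOrigin, sensorData) {
  // Replay the sensor payload to the anti-bot endpoint on the target site.
  await fetch(`${targetOrigin}/some-anti-bot-endpoint`, { // hypothetical path
    method: 'POST',
    headers: { 'Content-Type': 'text/plain;charset=UTF-8' },
    body: JSON.stringify({ sensor_data: sensorData }),
  });
}
```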

Which evasion techniques does the bot use during its tasks?

The bot has two ways to send network requests to targeted websites, and the evasion techniques differ between them:

  • Electron network requests
  • Puppeteer 

Let’s analyze both approaches.

Electron network requests

The bot can use the electron.net module to access pages via the inner class `NRequest`. In addition to sending the network traffic, this class does the following to avoid detection:

Reorder HTTP header keys according to a predefined list.

The order of HTTP header fields can be used to fingerprint a client, enabling detection of traffic coming from bots. The NSB developers understand this well. For each target, they assign a priority number to each header field, starting at 1 and increasing as the priority decreases. This number determines the position of the field in the request headers.

In the picture below, the accept header is on top. This field has the highest priority.

HTTP Headers

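The reordering itself can be reproduced with a simple sort keyed on a per-target priority map, as in the following sketch. The priority values shown are hypothetical; NSB defines its own list per header field and per target.

```javascript
// Hypothetical per-target priority map: 1 = highest priority (emitted first).
const headerPriority = {
  accept: 1,
  'accept-language': 2,
  'user-agent': 3,
  'content-type': 4,
  cookie: 5,
};

// Rebuild the header object so that keys appear in priority order.
function orderHeaders(headers) {
  const entries = Object.entries(headers).sort(([a], [b]) => {
    const pa = headerPriority[a.toLowerCase()] ?? Number.MAX_SAFE_INTEGER;
    const pb = headerPriority[b.toLowerCase()] ?? Number.MAX_SAFE_INTEGER;
    return pa - pb;
  });
  return Object.fromEntries(entries);
}
```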

Overwrite TLS fingerprint

TLS is a protocol that secures web communications between a client and a server. It relies on cryptographic algorithms to protect the exchanged data, and every connection starts with the TLS handshake.

The first step in the handshake process entails the client sending a Client Hello message to the server, which includes details of the client’s supported encryption methods (also known as cipher suites) and the current version of TLS being used. This message begins the process of establishing a secure session with the server.

A cipher suite is a set of algorithms used to encrypt the traffic. The client sends the list of cipher suites it supports for the server to choose which one it wants to use for the connection.

Because this combination is relatively unique, it has been used for fingerprinting and bot detection. The NSB developers are aware of this technique, so they randomize the bot’s cipher suite list to prevent it from being fingerprinted, detected, and blocked.

randomize the ciphersuite

The constant e is an array of 83 possible cipher suites. The list sent to the target site is randomly generated by applying the function `shuffleArrayWithRandomLength` to these 83 items. This random generation is weak and can itself help us detect the bot.
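We assume `shuffleArrayWithRandomLength` behaves roughly like a Fisher–Yates shuffle followed by truncation to a random length, as in the sketch below; this is our reading of the behavior, not the bot’s exact code. If it relies on `Math.random()`, the generator is not cryptographically secure, and the unusual lengths and orderings of the resulting lists become a detectable signal in themselves.

```javascript
// Assumed behavior of shuffleArrayWithRandomLength: Fisher–Yates shuffle,
// then keep only a random-length prefix of the 83 known cipher suites.
function shuffleArrayWithRandomLength(cipherSuites) {
  const shuffled = [...cipherSuites];
  for (let i = shuffled.length - 1; i > 0; i--) {
    const j = Math.floor(Math.random() * (i + 1));
    [shuffled[i], shuffled[j]] = [shuffled[j], shuffled[i]];
  }
  const length = 1 + Math.floor(Math.random() * shuffled.length);
  return shuffled.slice(0, length);
}
```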

Prevent network analysis

To make analysis of the bot harder, the developers added code that inspects the certificate chain of the connection. If the chain contains a self-signed certificate or a certificate known to come from network analysis software (e.g., Fiddler, Charles Proxy), the bot terminates.

Algorithm responsible for the anti-network analysis
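The check described above can be approximated with Node’s tls module: open a connection, inspect the peer certificate chain, and terminate if the chain is self-signed or the issuer matches a known interception proxy. The issuer strings and exit behavior below mirror the described logic, not the bot’s exact implementation.

```javascript
const tls = require('tls');

// Issuer names associated with common traffic-analysis proxies.
const INTERCEPTION_ISSUERS = ['fiddler', 'charles'];

// Connect to the host and inspect the presented certificate chain.
function detectInterception(host, port = 443) {
  const socket = tls.connect({ host, port, rejectUnauthorized: false }, () => {
    const cert = socket.getPeerCertificate(true); // true = include issuer chain
    const issuer = (cert.issuer && cert.issuer.CN ? cert.issuer.CN : '').toLowerCase();
    const selfSigned =
      cert.issuerCertificate && cert.issuerCertificate.fingerprint === cert.fingerprint;
    if (selfSigned || INTERCEPTION_ISSUERS.some((name) => issuer.includes(name))) {
      socket.destroy();
      process.exit(1); // the bot terminates when it suspects traffic inspection
    }
    socket.end();
  });
}
```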

Puppeteer network requests

Puppeteer is a widely used browser automation framework that allows programmatic control of a browser such as Chrome or Chromium. In our scenario, it is used in combination with the puppeteer-extra-plugin-stealth plugin.

The bot can use the Puppeteer library to access pages via the function `BrowserCheckout`. The following graph shows the behavioral flow of the bot when this method is used.

One behavioral flow of a “Browser Checkout” function

Initialization

At the beginning of a scalping task (for example, against Zalando), the initial step is to launch Puppeteer together with the puppeteer-extra-plugin-stealth plugin.

This plugin aims to hide Puppeteer’s traces so that the browser it controls is classified as human rather than as a bot. It will be detailed later.
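Setting this up with the public puppeteer-extra API takes only a few lines; the launch flags and proxy argument below are illustrative assumptions rather than NSB’s exact configuration.

```javascript
const puppeteer = require('puppeteer-extra');
const StealthPlugin = require('puppeteer-extra-plugin-stealth');

// Register the stealth evasions before launching the browser.
puppeteer.use(StealthPlugin());

async function initBrowser(proxy) {
  // Launch options are illustrative; the bot's exact flags may differ.
  const browser = await puppeteer.launch({
    headless: false,
    args: proxy ? [`--proxy-server=${proxy.ip}:${proxy.port}`] : [],
  });
  return browser.newPage();
}
```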

Configuration

The bot then configures the properties of the browser page.

During this stage, the bot tries to inject Gmail cookies into the browser from a `CapWindow`.

A CapWindow is a separate browser instance where the user signs in to their Gmail account and performs human-like actions, such as watching YouTube videos. The Gmail cookies are then transferred to the task browser, making the bot less likely to receive complicated CAPTCHA challenges.
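Conceptually, the transfer only requires reading the cookies from the CapWindow’s session and writing them into the task page, as in this sketch using Puppeteer’s cookie API; the Google domains listed are assumptions.

```javascript
// Copy Google-session cookies from the "CapWindow" page into the task page.
async function transferGmailCookies(capWindowPage, taskPage) {
  // Read cookies scoped to Google properties (domains chosen for illustration).
  const cookies = await capWindowPage.cookies(
    'https://accounts.google.com',
    'https://www.youtube.com'
  );
  if (cookies.length > 0) {
    await taskPage.setCookie(...cookies);
  }
}
```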

Navigation

To properly navigate the site, the bot uses a BrowserHelper class. This class contains a large list of JavaScript expressions and routes specific to each target.

Browser Helper file for Zalando target

For example, the JavaScript variable `goToCheckoutButtonScript` contains a JavaScript command to extract a button from the Zalando website:

ZLBrowserHelper
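A BrowserHelper entry boils down to a per-target map of JavaScript snippets that the bot evaluates in the page. The sketch below shows the general shape; the selector is a placeholder, not the real Zalando one.

```javascript
// Hypothetical shape of a per-target helper; the selector is a placeholder.
const ZLBrowserHelper = {
  goToCheckoutButtonScript:
    "document.querySelector('[data-testid=\"checkout-button\"]')",
};

// Evaluate the target-specific expression inside the page and click the result.
async function goToCheckout(page) {
  const handle = await page.evaluateHandle(ZLBrowserHelper.goToCheckoutButtonScript);
  const button = handle.asElement();
  if (button) {
    await button.click();
  }
}
```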

The bot implements traffic interception during navigation to optimize the speed of its scalping tasks. When a site loads many heavy resources during browsing, the bot simply skips those that are not needed to fulfill the purchase flow, as sketched below.

Algorithm responsible for filtering out potentially heavy resources to make the bot faster
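With Puppeteer, this kind of filtering is typically done through request interception. The sketch below blocks a guessed set of “heavy” resource types (images, media, fonts, stylesheets); the bot’s exact filter list may differ.

```javascript
// Skip resource types that are not needed to complete the purchase flow.
const BLOCKED_RESOURCE_TYPES = new Set(['image', 'media', 'font', 'stylesheet']);

async function enableLightweightBrowsing(page) {
  await page.setRequestInterception(true);
  page.on('request', (request) => {
    if (BLOCKED_RESOURCE_TYPES.has(request.resourceType())) {
      request.abort();    // drop heavy, non-essential resources
    } else {
      request.continue(); // let everything else through
    }
  });
}
```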

Puppeteer-specific evasion techniques

As mentioned earlier, the bot uses puppeteer-extra-plugin-stealth to hide its activity. This plugin is a series of JavaScript files used to hide automation traces. For example:

  • `navigator.webdriver` is set to false
  • Indications of a headless browser are masked
  • Indications of Puppeteer activity are cleared from error stack traces
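To give a flavor of what such an evasion looks like, the sketch below shows the general technique of masking `navigator.webdriver` with a script injected before any page code runs; it illustrates the idea, not the plugin’s exact implementation.

```javascript
// General technique used by stealth plugins: inject a script that runs
// before page scripts and masks the automation flag.
async function maskWebdriver(page) {
  await page.evaluateOnNewDocument(() => {
    Object.defineProperty(navigator, 'webdriver', {
      get: () => false,
    });
  });
}
```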

More evasions

Check against Google reCAPTCHA

The bot requests the URL https://recaptcha-demo.appspot[.]com to evaluate how likely the bot’s Google account is to be considered human by reCAPTCHA. The bot then gives its user the ability to sign in to YouTube and watch videos to improve this score.

reCAPTCHA demo

Simulated interactions

The bot simulates human-like mouse motion during navigation. Depending on the target site, it uses different algorithms. Here are a few examples:

Simple simulated human mouse motion

The following algorithm implements a simple simulated mouse motion.

The principle is as follows: it triggers three mousemove events at random positions around the target, with a delay of 1 to 3 ms between each motion event.

mouse emulations

Then, to simulate a click, the following algorithm triggers a mouseDown event followed by a mouseUp event 5 ms later.

mouseUp event
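Reproduced with Puppeteer’s mouse API, the sequence looks roughly like the sketch below; the jitter radius and the way the target coordinates are obtained are illustrative assumptions.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Move the mouse three times to random points around the target,
// pausing 1–3 ms between events, then click with a 5 ms press.
async function simulateClick(page, target) {
  const { x, y } = target; // e.g. from elementHandle.boundingBox()
  for (let i = 0; i < 3; i++) {
    await page.mouse.move(x + (Math.random() - 0.5) * 20, y + (Math.random() - 0.5) * 20);
    await sleep(1 + Math.random() * 2);
  }
  await page.mouse.move(x, y);
  await page.mouse.down();
  await sleep(5);
  await page.mouse.up();
}
```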

More elaborate simulated human mouse motion

In other situations, the bot uses a more advanced simulated human mouse motion mechanism: the npm package ghost-cursor.

This algorithm is based on Bézier curves.

Mouse curve

Example of curve generated via ghost-cursor module
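With the ghost-cursor package, the curved, human-like movement comes essentially for free, as in this minimal sketch; the selector is a placeholder.

```javascript
const { createCursor } = require('ghost-cursor');

// Drive the page with Bezier-curve mouse paths instead of raw mouse events.
async function humanClick(page, selector) {
  const cursor = createCursor(page);
  await cursor.move(selector);  // travel along a generated curve to the element
  await cursor.click(selector); // click with human-like timing
}
```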

Conclusion

Analyzing NSB was an enriching experience, as it showcases the wide range of anti-detection features used by advanced bots. The bot revealed multiple capabilities to avoid analysis and detection by anti-bot security solutions, such as random proxy selection, randomization of HTTP header fields, and TLS fingerprint overwriting.

Imperva Threat Research is constantly evaluating new bots and their operations to enhance the market-leading Imperva Advanced Bot Protection and to mitigate the most sophisticated automated threats, including all OWASP automated threats. It leverages superior technology to protect all potential access points, including websites, mobile applications, and APIs. It does so without affecting the experience of legitimate users.