What is Skewing? | Data Poisoning & Falsification | Imperva

Skewing

What is Data Skewing

In a skewing attack, attackers falsify (or skew) data so that an organization makes a wrong decision that benefits the attackers. There are two common variants of skewing attacks:

  • Machine learning data poisoning attacks – attackers modify the training data used by a machine learning algorithm, causing it to make a wrong decision.
  • Web analytics skewing – attackers modify analytics data from platforms like Google Analytics or Adobe Analytics, by performing a large number of automated queries using bots. The objective is to make it appear that web visitors perform certain actions more often than they really do.

Data Poisoning Attacks

Many organizations use machine learning algorithms to make impactful business decisions. Many security systems also use machine learning analysis to determine whether an event or an artifact is malicious. Data poisoning attacks create fake data points that the algorithm learns from, which gradually skews its decisions.

For example, there were several large-scale attempts to perform data poisoning on Google’s Gmail spam filter. Attackers sent millions of emails specifically designed to throw off the classifier and change its definition of a spam email. This allowed attackers to send malicious emails without being detected.

Data poisoning is a potentially very dangerous threat vector, because it can be used to throw off any security system based on AI. For example, many organizations use User and Entity Behavior Analytics (UEBA) systems to analyze security events and determine which of them are abnormal or suspicious. Data poisoning could fool these systems into treating a malicious action as innocuous.
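
The effect can be illustrated with a toy model. The sketch below is not how Gmail or any UEBA product works; it assumes a hypothetical one-feature "suspicion score" per message and a simple nearest-centroid classifier, and shows how injected points carrying the wrong label pull the decision boundary toward the attacker's content. All values are invented for illustration.

```python
from statistics import mean

def train(samples):
    """Return per-label centroids for (feature, label) training samples."""
    by_label = {}
    for value, label in samples:
        by_label.setdefault(label, []).append(value)
    return {label: mean(values) for label, values in by_label.items()}

def classify(centroids, value):
    """Assign the label whose centroid is closest to the value."""
    return min(centroids, key=lambda label: abs(centroids[label] - value))

# Hypothetical training data: low scores are legitimate mail, high scores are spam.
clean = [(0, "ham"), (1, "ham"), (2, "ham"), (8, "spam"), (9, "spam"), (10, "spam")]
# Attacker-injected points: spam-like messages that end up labeled as legitimate.
poison = [(9, "ham")] * 6

clean_model = train(clean)
poisoned_model = train(clean + poison)

suspicious_score = 7
print(classify(clean_model, suspicious_score))     # spam
print(classify(poisoned_model, suspicious_score))  # ham
```

Six mislabeled points are enough to move the "ham" centroid from 1 to roughly 6.3, so a message that was previously flagged now passes.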

Web Analytics Skewing

A web analytics skewing attack usually follows these steps:

  1. Attackers use bots to perform automated HTTP requests, in order to drive up the number of visitors on certain pages. Most commonly, these are pages with transactional significance, such as an eCommerce product page.
  2. The web analytics system registers the high numbers of clicks, and the website owners conclude there is a lot of interest in this item.
  3. In some cases, the skewing bot may also attempt to perform conversion actions, such as filling out forms or purchasing items. This requires a more complex bot platform, similar to that used by scalping bots.
  4. Falsified analytics data might cause the website owner to make a business decision, such as featuring the product more prominently on its website, or adding it to advertising campaigns.
  5. The business decision benefits the attackers—for example, because they are affiliates of the product featured on the targeted page.
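
The steps above can be sketched as a small simulation. The page paths, hit counts, and the "source" field are made up for illustration, and a real analytics platform records far more than this. The point is only that a ranking based on raw hit counts changes once automated requests are mixed in.

```python
from collections import Counter

def top_page(hits):
    """Return the most-viewed page path from a list of hit records."""
    counts = Counter(hit["page"] for hit in hits)
    return counts.most_common(1)[0][0]

# Hypothetical organic traffic: product A is genuinely more popular.
organic = (
    [{"page": "/product/a", "source": "human"}] * 120
    + [{"page": "/product/b", "source": "human"}] * 45
)
# Hypothetical bot traffic aimed at product B.
bot_hits = [{"page": "/product/b", "source": "bot"}] * 300

print(top_page(organic))             # /product/a
print(top_page(organic + bot_hits))  # /product/b
```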

What Are the Consequences of Skewing

Data is used to make important business decisions such as classification of security events, success or failure of website redesigns, promotions, and even product pricing. If this data is wrong, the choices made based on the data will similarly be wrong, and potentially damaging for business owners.

Examples of wrong business decisions that can be driven by skewing:

  • Wrongly identifying a malicious action as legitimate – for example mis-classifying a spam email or a repeated login attempt
  • Choosing the wrong design in an A/B test – this can lead to major financial losses, for example in large eCommerce organizations
  • Making an incorrect automated decision, such as assigning the wrong credit rating to an individual
  • Lowering pricing of Pay Per Click advertising for large advertisers – for example by wrongly determining the quality score of an ad
  • Over-compensating an affiliate or partner on the basis of clicks to a product page or conversion actions

Symptoms of a Skewing Attempt

Watch for the following anomalies in your website traffic or application usage, and investigate them to see if skewing may be going on:

  • Abnormal traffic spikes
  • Unusual growth in specific user categories
  • Unusually high number of pages per session or time per session
  • Unusually high bounce rate
  • Unusual user behavior within an application
  • Unusual usage of a product or website feature that has security or financial impact

Web analytics chart with traffic spikes due to skewing attacks
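
One simple way to surface abnormal traffic spikes like those in the chart is to compare each day against a trailing window. The sketch below is a minimal example under stated assumptions: the 7-day window, the threshold of 3 standard deviations, and the visit counts are all illustrative choices, not recommended values.

```python
from statistics import mean, stdev

def find_spikes(daily_visits, window=7, threshold=3.0):
    """Return indexes of days whose visits exceed the trailing-window mean
    by more than `threshold` standard deviations."""
    spikes = []
    for i in range(window, len(daily_visits)):
        history = daily_visits[i - window:i]
        mu, sigma = mean(history), stdev(history)
        if sigma > 0 and (daily_visits[i] - mu) / sigma > threshold:
            spikes.append(i)
    return spikes

# Hypothetical daily visit counts with one abnormal day at index 8.
visits = [1000, 1040, 980, 1010, 990, 1025, 1005, 1015, 4200, 1000]
print(find_spikes(visits))  # [8]
```

A flagged day is only a prompt for investigation; legitimate campaigns and press coverage produce spikes too.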

DIY Skewing Attack Prevention

Use the following best practices to help prevent skewing on a website:

  • Block outdated browsers or user agents – while advanced attackers may use modern browsers and user agents in their HTTP headers, many “script kiddies” use bots based on outdated browsers. You can block these outdated browser versions completely, or apply a strong CAPTCHA, without the risk of interrupting many actual users.
  • Block known bad hosts and proxies – obtain a list of known hosts and proxy networks used for malicious purposes. Disallowing access from these types of sources may discourage attackers from attempting skewing against your site, API, and mobile apps. Keep in mind that attackers can use more advanced anonymization techniques, such as residential proxies.
  • Shield access points vulnerable to bots – think of the different ways bots can connect to your systems over the Internet, beyond your website. Protect exposed APIs, mobile apps, and any other public-facing endpoint. When you encounter a bot and block it, share the data between all endpoints.
  • Evaluate traffic sources – periodically look at analytics or model training data, drill down into the data, and look for segments with unusual characteristics. If you find one, investigate further to identify data generated by bots.
  • Investigate spikes in usage – when usage of your website or application suddenly spikes, drill down to see what functionality was affected. If you can attribute the entire spike to a specific traffic source, group of users, or specific functionality, that is a strong sign of a skewing attack.
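
As a rough illustration of the first practice, the following sketch checks the major browser version in a User-Agent header against a minimum. The minimum versions in `MIN_MAJOR_VERSION` are hypothetical placeholders to tune for your own audience, and because the header is easy to forge, this only filters unsophisticated bots.

```python
import re

# Hypothetical minimum major versions; adjust these to your own traffic.
MIN_MAJOR_VERSION = {"Chrome": 100, "Firefox": 100}

def is_outdated(user_agent):
    """Return True if the User-Agent names a tracked browser below its minimum."""
    for browser, minimum in MIN_MAJOR_VERSION.items():
        match = re.search(rf"{browser}/(\d+)", user_agent)
        if match:
            return int(match.group(1)) < minimum
    return False  # Unknown clients are left for other checks.

old_ua = ("Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 "
          "(KHTML, like Gecko) Chrome/49.0.2623.112 Safari/537.36")
new_ua = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
          "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36")

print(is_outdated(old_ua))  # True
print(is_outdated(new_ua))  # False
```

A request that fails this check could be blocked outright or routed to a CAPTCHA, as described above.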

Take these measures to block skewing attacks once you have identified them:

  • Filter malicious sources in your web analytics
  • Filter malicious IPs in web analytics
  • Analyze firewall logs, identify bad bot traffic related to the unusual analytics data, and configure your firewall to block it
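
If you export raw hit data, filtering by IP can be sketched as below. This is a minimal example, not a feature of any particular analytics product: the record layout (`ip`, `page`) is assumed, and the addresses come from ranges reserved for documentation.

```python
import ipaddress

def filter_hits(hits, blocked_networks):
    """Drop hit records whose client IP falls inside any blocked network."""
    networks = [ipaddress.ip_network(net) for net in blocked_networks]
    clean = []
    for hit in hits:
        ip = ipaddress.ip_address(hit["ip"])
        if not any(ip in net for net in networks):
            clean.append(hit)
    return clean

# Documentation-only address ranges stand in for real data.
blocked = ["203.0.113.0/24"]
hits = [
    {"ip": "198.51.100.7", "page": "/product/a"},
    {"ip": "203.0.113.50", "page": "/product/b"},
    {"ip": "203.0.113.51", "page": "/product/b"},
]
clean_hits = filter_hits(hits, blocked)
print(len(clean_hits))  # 1
```

Only the record from 198.51.100.7 remains, so reports built from `clean_hits` no longer include the blocked range.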

Advanced Defense Techniques

The following techniques provide comprehensive protection against bad bots in general, and skewing bots in particular.

Device fingerprinting

When attackers use bots to perform skewing attacks, they often need to simulate many visitors from a single device. To do so, they change browsers, clear caches, or use IP obfuscation methods. You can still detect these attackers with device fingerprinting, which identifies the device and browser parameters that remain the same throughout an attack. This allows you to recognize when a single user is connecting multiple times and stop them from accessing your system.
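
The idea can be sketched in a few lines. The attribute names in `FINGERPRINT_FIELDS` are hypothetical, and production fingerprinting relies on many more signals; this example only shows how hashing the stable attributes lets you spot one device appearing behind several IP addresses.

```python
import hashlib
from collections import defaultdict

# Hypothetical attribute names; real fingerprinting uses many more signals.
FINGERPRINT_FIELDS = ("screen", "timezone", "fonts", "canvas_hash")

def fingerprint(device):
    """Hash the stable device attributes into a short identifier."""
    raw = "|".join(str(device.get(field, "")) for field in FINGERPRINT_FIELDS)
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()[:16]

def ips_per_device(requests):
    """Map each fingerprint to the set of IPs it was seen with."""
    seen = defaultdict(set)
    for request in requests:
        seen[fingerprint(request["device"])].add(request["ip"])
    return seen

# One hypothetical device rotating through three documentation-range IPs.
device = {"screen": "1920x1080", "timezone": "UTC+1",
          "fonts": "Arial,Verdana", "canvas_hash": "9f2c"}
requests = [{"ip": ip, "device": device}
            for ip in ("198.51.100.1", "198.51.100.2", "198.51.100.3")]

# Flag fingerprints seen behind more than two IPs (an arbitrary threshold).
suspects = [fp for fp, ips in ips_per_device(requests).items() if len(ips) > 2]
print(len(suspects))  # 1
```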

Reputation analysis

Frequently, bots originate from the same or similar sources or follow predictable behavior patterns. When a bot is identified, these characteristics can be collected and applied to future protections. For example, you can use databases containing details and patterns of known bots to automatically block bots from your site. These databases can help you more quickly identify bots and block access.

Browser validation

One tactic that bots may use is to mimic a specific browser and cycle through user profiles to avoid detection. To detect this, you can use browser validation to confirm that users and browsers are what they claim to be. For example, this method can verify that calls are made as expected or that the values reported by JavaScript are consistent with the browser the client claims to be.
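
A minimal sketch of such a consistency check follows. It assumes a hypothetical `probe` dictionary that a client-side script sends back to the server; the field names are invented for this example. The checks compare the User-Agent HTTP header with the value the browser reports through JavaScript, and look at the `navigator.webdriver` flag that automated browsers set.

```python
def validate_browser(header_user_agent, probe):
    """Return a list of inconsistencies between the request headers and the
    results reported by a client-side script (a hypothetical dict)."""
    problems = []
    if not probe.get("js_executed", False):
        problems.append("JavaScript did not run")
        return problems  # Nothing else to compare without script results.
    if probe.get("webdriver", False):
        problems.append("automation flag (navigator.webdriver) is set")
    if probe.get("js_user_agent") != header_user_agent:
        problems.append("User-Agent header differs from the JavaScript value")
    return problems

honest = {"js_executed": True, "webdriver": False, "js_user_agent": "UA-1"}
spoofed = {"js_executed": True, "webdriver": True, "js_user_agent": "UA-2"}

print(validate_browser("UA-1", honest))        # []
print(len(validate_browser("UA-1", spoofed)))  # 2
```

An empty list means no inconsistency was found by these particular checks, not that the client is certainly human.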

Machine learning behavior analysis

Real website visitors tend to behave in predictable ways. While bots may also behave in predictable patterns, these patterns often don’t match those of real users. To identify these differences, you can use behavior analysis to compare traffic and actions against the baseline behavior of your real users. When traffic doesn’t match, you can investigate to confirm whether a user is a person or a bot.
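
As a simplified stand-in for machine learning, the sketch below builds a statistical baseline from known-good sessions and scores new sessions by how far they deviate from it. The metric names and values are assumptions chosen for illustration; a production system would use far richer features and models.

```python
from statistics import mean, stdev

def build_baseline(sessions):
    """Return {metric: (mean, standard deviation)} from known-good sessions."""
    baseline = {}
    for metric in sessions[0]:
        values = [session[metric] for session in sessions]
        baseline[metric] = (mean(values), stdev(values))
    return baseline

def anomaly_score(session, baseline):
    """Return the largest absolute z-score across all baseline metrics."""
    scores = [
        abs(session[metric] - mu) / sigma
        for metric, (mu, sigma) in baseline.items()
        if sigma > 0
    ]
    return max(scores, default=0.0)

# Hypothetical sessions from real users.
human_sessions = [
    {"pages": 3, "seconds_per_page": 40},
    {"pages": 5, "seconds_per_page": 55},
    {"pages": 4, "seconds_per_page": 35},
    {"pages": 6, "seconds_per_page": 60},
    {"pages": 2, "seconds_per_page": 45},
    {"pages": 4, "seconds_per_page": 50},
]
baseline = build_baseline(human_sessions)

typical = {"pages": 5, "seconds_per_page": 42}
bot_like = {"pages": 60, "seconds_per_page": 1}
print(anomaly_score(typical, baseline) > 3)   # False
print(anomaly_score(bot_like, baseline) > 3)  # True
```

Sessions that score above the chosen cutoff (3 here, an arbitrary value) are candidates for investigation rather than automatic blocking.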

Progressive challenges

To quickly and efficiently detect bots, you can issue progressive challenges to suspicious clients. These challenges require potential bots to perform tasks that are difficult or impossible for automated clients. For example, you can check whether a client accepts cookies, executes JavaScript, or completes a CAPTCHA. Using these challenges, you can eliminate many bots early on and avoid disrupting your actual users.
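
The escalation logic can be sketched as follows. The challenge names, the 0 to 3 suspicion scale, and the way client capabilities are represented are all assumptions made for this example; the point is that cheap, invisible checks run first and a CAPTCHA is reserved for the most suspicious clients.

```python
# Ordered from least to most disruptive for a real user.
CHALLENGES = ["cookie", "javascript", "captcha"]

def run_challenges(client_capabilities, suspicion):
    """Issue challenges in order, stopping at the first failure.

    `suspicion` (0 to 3) controls how many challenges a client must pass.
    Returns (allowed, challenges_issued).
    """
    issued = []
    for challenge in CHALLENGES[:suspicion]:
        issued.append(challenge)
        if challenge not in client_capabilities:
            return False, issued
    return True, issued

browser = {"cookie", "javascript", "captcha"}
simple_bot = {"cookie"}

print(run_challenges(browser, 1))     # (True, ['cookie'])
print(run_challenges(simple_bot, 3))  # (False, ['cookie', 'javascript'])
```

A low-suspicion visitor only faces the cookie check, while the simple bot is rejected at the JavaScript step and never reaches the CAPTCHA.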

Imperva Bot Management

Imperva’s Advanced Bot Protection solution can protect against skewing and other data poisoning attacks by using all the advanced security measures covered above, letting you identify bad bots with minimal disruption to real user traffic.

In addition, Imperva covers the security measures that complement a defensive bot strategy. It offers multi-factor authentication and API security, which ensures that only desired traffic can access your API endpoints and blocks exploits of vulnerabilities.

Beyond bot protection, Imperva provides multi-layered protection to make sure websites and applications are available, easily accessible and safe, including:

  • DDoS Protection – maintain uptime in all situations. Stop any type of DDoS attack, of any size, from blocking access to your website and network infrastructure.
  • CDN – enhance website performance and reduce bandwidth costs with a CDN designed for developers. Cache static resources at the edge while accelerating APIs and dynamic websites.
  • WAF – cloud-based solution permits legitimate traffic and prevents bad traffic, safeguarding applications at the edge. Gateway WAF keeps applications and APIs inside your network safe.
  • Account takeover protection – uses an intent-based detection process to identify and defend against attempts to take over users’ accounts for malicious purposes.
  • RASP – keep your applications safe from within against known and zero‑day attacks. Fast and accurate protection with no signature or learning mode.