HTML Smuggling

What Is HTML Smuggling?

HTML smuggling is an attack technique that abuses legitimate HTML5 and JavaScript features to move data across network boundaries without inspection. The payload is hidden inside an ordinary-looking HTML page or attachment and reassembled into a file in the browser, so security checks that only see the page in transit can miss it. The technique is typically used to deliver malware or exfiltrate sensitive data, often undetected by traditional security mechanisms.

HTML smuggling is a client-side attack, which means the malicious activity occurs within the user’s browser. It relies on features such as the Blob and File objects, which let a script build file-like data in memory and then offer it to the user as a download. Because the file is assembled locally and saved through the browser’s normal download mechanism, HTML smuggling is a significant concern.

How HTML Smuggling Works

HTML smuggling typically starts when attackers send a phishing email or lure victims to a malicious web page. When the user opens an email attachment or downloads a file from the website, the HTML smuggling process can begin.

Data Encapsulation Within Legitimate Files

HTML smuggling relies heavily on data encapsulation – the process of disguising data within legitimate files. This is achieved using HTML5 and JavaScript features that allow web applications to create and manipulate files directly within the browser.

For example, a malicious actor might embed an encoded payload in the JavaScript of an innocuous-looking HTML page and use a Blob object to rebuild it as a file with a harmless-looking name, such as a PDF. The disguised file is then saved to the user’s system, bypassing traditional security checks.

This encapsulation technique is very versatile. With the right scripting, a threat actor can encapsulate virtually any type of data within any type of file. This makes HTML smuggling a highly adaptable and effective tool for delivering a variety of payloads, from malware to sensitive data.

Bypassing Network Security Devices

HTML smuggling can be an effective way to bypass network security devices like firewalls and intrusion detection systems (IDS). These devices typically inspect network traffic for known malicious patterns or signatures. However, they often overlook the traffic created by HTML smuggling because it appears to be legitimate.

In an HTML smuggling attack, the malicious data is not delivered directly over the network. Instead, it is built on the client side, within the user’s browser, using legitimate features provided by HTML5 and JavaScript. This means the malicious data never crosses the network boundary in its raw form, making it difficult for traditional network security devices to detect.
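
The effect on signature matching can be illustrated with a minimal Python sketch. The marker string, the `naive_scan` function, and the sample page are all hypothetical stand-ins, not a real detection rule or a real payload; the point is only that an encoded copy of a file no longer contains the bytes a signature looks for.

```python
import base64

# Hypothetical stand-in for a byte pattern a network signature would match.
SIGNATURE = b"HARMLESS-TEST-MARKER"


def naive_scan(traffic: bytes) -> bool:
    """Return True if the raw signature appears in the inspected bytes."""
    return SIGNATURE in traffic


# A file containing the marker, as it would look if downloaded directly.
raw_file = b"header..." + SIGNATURE + b"...footer"

# The same file as it travels inside an HTML page: a Base64 string literal
# embedded in a script, to be decoded later inside the browser.
encoded = base64.b64encode(raw_file)
html_page = b"<script>var data = '" + encoded + b"';</script>"

print(naive_scan(raw_file))                    # True: direct download is visible
print(naive_scan(html_page))                   # False: encoded form hides the marker
print(naive_scan(base64.b64decode(encoded)))   # True: visible again after decoding
```

A device that inspects only the bytes on the wire sees the second case, while the browser produces the third.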

Client-Side File Creation and Data Extraction

Using HTML5 and JavaScript, a web application can generate files directly within the user’s browser. These files can be anything from simple text documents to complex executable files. The generated files can then be saved to the user’s system or transferred across the network.

Once the files are created, the data encapsulated within them can be extracted and used for various purposes. For example, an attacker could use HTML smuggling to deliver a malicious script embedded within a seemingly harmless file. Once the file is on the user’s system and the user opens it, the script can be extracted and executed, which can give the attacker remote code execution (RCE).

Types of HTML Smuggling Attacks

Malware Delivery and Execution

One of the most common uses of HTML smuggling is to deliver and execute malware. By encapsulating malicious scripts within legitimate files, threat actors can bypass traditional security measures and deliver their payloads directly to the user’s system. Once there, the malware can be extracted and executed, potentially compromising the system.

Exfiltration of Sensitive Data While Bypassing DLP

HTML smuggling can also be used to exfiltrate sensitive data. This is achieved by encapsulating the data within a legitimate file and then transferring the file across the network. The encapsulated data can then be extracted and utilized by the threat actor.

Traditional DLP solutions are designed to prevent the loss of sensitive data by monitoring network traffic for known data loss patterns. However, they are often ineffective against HTML smuggling. This is because the data being smuggled is encapsulated within a legitimate file and built on the client side, making it difficult for DLP solutions to detect.

Bypassing File Type Restrictions on Uploads or Downloads

In some cases, web applications impose restrictions on the types of files that can be uploaded or downloaded. HTML smuggling can be used to bypass these restrictions. By encapsulating the desired data within an allowed file type, a user or threat actor can successfully upload or download the data without triggering any alerts.

Key Techniques in HTML Smuggling

Using Blob Objects and JavaScript for File Generation

HTML smuggling’s foundation lies in the interplay between Blob objects and JavaScript. Blob objects allow developers to handle binary data directly. JavaScript, in turn, is used to create these Blob objects, which form the smuggled content.

JavaScript’s versatility as a scripting language comes to the fore in HTML smuggling. With JavaScript, you can generate Blob objects containing any content, from simple text files to complex executables. These Blob objects are then saved to the client’s machine as downloads. Because the file is assembled in the browser rather than fetched from a server as a separate download, no file transfer is visible on the network, which is the essence of HTML smuggling.

Data Conversion Methods

Data conversion is another essential aspect of HTML smuggling, specifically the conversion of binary data into a format that can be handled by JavaScript. The most common method is Base64 encoding, which transforms binary data into a string of ASCII characters. This encoded string can then be decoded back into binary data by the client’s browser.

Encoding is not limited to Base64, though. Other methods like binary, hexadecimal, or even custom encoding techniques can be used, depending on the specific requirements of the smuggling operation.
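
As a small illustration of these conversions, the following Python sketch round-trips a few arbitrary bytes through Base64 and hexadecimal. The sample bytes are made up for the example; in a browser, the Base64 decoding step would typically be done with `atob()`.

```python
import base64
import binascii

# Arbitrary binary data, including bytes that are not printable text.
original = bytes([0x00, 0xFF, 0x10, 0x80]) + b"example"

# Base64: every 3 input bytes become 4 ASCII characters (with "=" padding).
b64_text = base64.b64encode(original).decode("ascii")

# Hexadecimal: every input byte becomes 2 ASCII characters.
hex_text = binascii.hexlify(original).decode("ascii")

# Both encodings are reversible, so the receiver recovers the exact bytes.
assert base64.b64decode(b64_text) == original
assert binascii.unhexlify(hex_text) == original

print(len(original), len(b64_text), len(hex_text))  # 11 16 22
```

Base64 grows the data by about a third and hexadecimal doubles it, which is one reason Base64 is the more common choice for embedding large payloads in a page.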

Utilizing File and URL API for File Manipulation

The File API lets scripts represent file-like data as Blob and File objects and read files the user selects, while the URL API’s URL.createObjectURL() method creates object URLs that point to the smuggled content held in the browser’s memory.

The URL API, in particular, plays a critical role in HTML smuggling. It allows the creation of a URL that points directly to the Blob object in the browser, so no separate request to a server is needed to fetch the file. This URL can then be used to trigger the download of the smuggled content onto the client’s machine.
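
Because these API calls tend to appear together in a smuggling page, a defender can look for their co-occurrence in HTML attachments or responses. The Python sketch below is one rough way to do that. The indicator list, the threshold of three, and the function names are assumptions chosen for illustration, not a vetted rule set; legitimate applications also use these APIs, and obfuscated scripts can evade simple pattern matching.

```python
import re

# Illustrative indicator list (an assumption, not a vetted rule set).
INDICATORS = {
    "blob_constructor": re.compile(r"new\s+Blob\s*\("),
    "object_url": re.compile(r"URL\.createObjectURL\s*\("),
    "base64_decode": re.compile(r"\batob\s*\("),
    "download_attribute": re.compile(r"\.download\s*=|<a[^>]*\sdownload\b", re.I),
    "legacy_save_blob": re.compile(r"msSaveOrOpenBlob|msSaveBlob"),
}


def find_indicators(html: str) -> list[str]:
    """Return the sorted names of all indicators present in the HTML source."""
    return sorted(name for name, pattern in INDICATORS.items() if pattern.search(html))


def looks_like_smuggling(html: str, threshold: int = 3) -> bool:
    """Flag a page only when several indicators occur together."""
    return len(find_indicators(html)) >= threshold


# Benign sample that decodes the word "hello" and offers it as a text file.
sample = """
<script>
  var bytes = atob("aGVsbG8=");
  var blob = new Blob([bytes], {type: "application/octet-stream"});
  var link = document.createElement("a");
  link.href = URL.createObjectURL(blob);
  link.download = "report.txt";
  link.click();
</script>
"""

print(find_indicators(sample))
print(looks_like_smuggling(sample))          # True
print(looks_like_smuggling("<p>hello</p>"))  # False
```

Requiring several indicators at once, rather than any single one, is a design choice meant to reduce false positives on pages that use only one of these features.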

Client-Side Decryption

HTML smuggling can also be used to deliver encrypted content to the client’s browser. This is achieved through client-side decryption, where the smuggled content is encrypted before being delivered, and then decrypted by the client’s browser. This is another way to avoid detection of malicious payloads.
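
Encrypted content cannot be matched against signatures, but it does tend to look statistically random. One heuristic a defender might try, sketched here in Python, is to measure the Shannon entropy of long string literals in a page. The 200-character minimum, the 5.0 bits-per-character threshold, and the function names are assumptions for illustration, and the "encrypted" bytes are a deterministic stand-in built from SHA-256 digests.

```python
import base64
import hashlib
import math
import re
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Return the Shannon entropy of the text in bits per character."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


# Quoted string literals of at least 200 characters (threshold is an assumption).
LONG_LITERAL = re.compile(r"""(["'])([^"']{200,})\1""")


def suspicious_literals(html: str, min_entropy: float = 5.0) -> list[str]:
    """Return long string literals whose entropy meets the threshold."""
    return [
        m.group(2)
        for m in LONG_LITERAL.finditer(html)
        if shannon_entropy(m.group(2)) >= min_entropy
    ]


# Deterministic stand-in for encrypted bytes: concatenated SHA-256 digests.
blob = b"".join(hashlib.sha256(bytes([i])).digest() for i in range(32))
encoded = base64.b64encode(blob).decode("ascii")
page = '<script>var payload = "' + encoded + '";</script>'

# Ordinary prose of similar length, for comparison.
prose = '<p title="' + "the quick brown fox jumps over the lazy dog " * 6 + '"></p>'

print(len(suspicious_literals(page)))   # 1
print(len(suspicious_literals(prose)))  # 0
```

Ordinary prose uses a small alphabet unevenly, so its entropy stays well below that of Base64-encoded random-looking data. Compressed images and fonts embedded as data also score high, so a result like this is a hint for further analysis rather than a verdict.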

HTML Smuggling Mitigation and Defense Strategies

Content Disarm and Reconstruction (CDR) for Web Traffic

CDR is a technique that works by stripping all active content from a file, leaving only the safe, static content behind. This effectively neutralizes any potential threats contained in the smuggled content, rendering it harmless to the client’s machine.
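
For HTML specifically, the idea can be sketched with Python’s standard `html.parser` module: rebuild the document while discarding script elements and inline event handlers. This is a simplified illustration under stated assumptions, not how any particular CDR product works; the class and function names are hypothetical, and the sketch ignores comments, doctype declarations, `javascript:` URLs, and other places where active content can hide.

```python
import html
from html.parser import HTMLParser


class ScriptStripper(HTMLParser):
    """Rebuild HTML while dropping <script> elements and inline on* handlers."""

    def __init__(self) -> None:
        super().__init__()  # convert_charrefs=True: text arrives unescaped
        self.parts: list[str] = []
        self._in_script = False

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._in_script = True
            return
        # Keep only attributes that are not event handlers (onclick, onload, ...).
        kept = [(k, v) for k, v in attrs if not k.startswith("on")]
        rendered = "".join(
            f" {k}" if v is None else f' {k}="{html.escape(v, quote=True)}"'
            for k, v in kept
        )
        self.parts.append(f"<{tag}{rendered}>")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False
            return
        self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        # Text inside <script> is active content, so it is discarded.
        if not self._in_script:
            self.parts.append(html.escape(data, quote=False))


def disarm(markup: str) -> str:
    """Return the markup with script elements and event handlers removed."""
    parser = ScriptStripper()
    parser.feed(markup)
    parser.close()
    return "".join(parser.parts)


source = '<p onclick="run()">Hi &amp; bye</p><script>var x = atob("aGk=");</script>'
print(disarm(source))  # <p>Hi &amp; bye</p>
```

With the script removed, there is nothing left in the page to decode and reassemble a payload, which is why this approach neutralizes smuggled content without needing to recognize it.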

CDR can be applied to all web traffic, making it a strong defense against HTML smuggling. However, it does have its limitations, such as the inability to handle encrypted content.

Browser Security Settings and Configurations

Ensuring that browser security settings and configurations are set correctly is another important defense against HTML smuggling. This includes disabling unnecessary features and plugins that could be exploited for HTML smuggling, as well as keeping browsers up to date with the latest security patches.

Security Awareness Training for End-Users

Many HTML smuggling attacks succeed because the end-user is unaware of the risks and unwittingly downloads the smuggled content. Users need to be especially vigilant to prevent HTML smuggling, because the downloaded content typically appears as harmless files.

By providing end-users with regular training on the dangers of HTML smuggling and how to spot it, you can significantly reduce the risk of successful attacks. A well-informed end-user is often the best defense against cyber threats.

Deploying Advanced Threat Protection Solutions

Possibly the most effective way to prevent HTML smuggling is to deploy advanced threat protection solutions that recognize HTML smuggling tactics. These solutions can monitor web traffic for signs of HTML smuggling and raise alerts when suspicious activity is detected. They can also block or quarantine suspected smuggled content, thereby preventing it from reaching the client’s machine.

HTML Smuggling Prevention with Imperva

Imperva’s Web Application Firewall can prevent HTML smuggling and many other application-layer attacks with world-class analysis of web traffic to your applications.

Beyond application protection, Imperva provides comprehensive protection for applications, APIs, and microservices:

Runtime Application Self-Protection (RASP) – Real-time attack detection and prevention from your application runtime environment goes wherever your applications go. Stop external attacks and injections and reduce your vulnerability backlog.

API Security – Automated API protection ensures your API endpoints are protected as they are published, shielding your applications from exploitation.

Advanced Bot Protection – Prevent business logic attacks from all access points – websites, mobile apps and APIs. Gain seamless visibility and control over bot traffic to stop online fraud through account takeover or competitive price scraping.

DDoS Protection – Block attack traffic at the edge to ensure business continuity with guaranteed uptime and no performance impact. Secure your on-premises or cloud-based assets, whether you’re hosted in AWS, Microsoft Azure, or Google Public Cloud.

Attack Analytics – Ensures complete visibility with machine learning and domain expertise across the application security stack to reveal patterns in the noise and detect application attacks, enabling you to isolate and prevent attack campaigns.

Client-Side Protection – Gain visibility and control over third-party JavaScript code to reduce the risk of supply chain fraud and prevent data breaches and client-side attacks.