Malicious encoding attacks are a technique for bypassing a server's security filters by using various types of character encoding (URL, Unicode, etc.).
Application developers are increasingly aware of security problems and try to avoid them. Since most security risks arise from user-manipulated input (e.g., Parameter Tampering, Directory Traversal), one solution is to verify or filter the received input. As a result, most modern applications have some form of input filtering.
Most security filters operate on the input received from the users, and attempt to detect malicious input. Some filters may operate on outgoing data (mainly used to avoid sending malicious code to users).
The most common technique attackers use to bypass these filters is encoding. The most common encoding format is the ASCII character encoding, which uses a 7-bit representation for each character. Additional encodings, however, are supported by different environments, and are often required when embedding free text in parsed protocols. The two major types of encoding used by attackers to bypass security filters are URL encoding and Unicode/UTF-8 encoding.
Data used in Web applications is not restricted to a single character set; it may be encoded in any character set or may be binary data. URL encoding is a technique for mapping 8-bit data to the subset of the US-ASCII character set allowed in a URL. Without proper validation, URL-encoded input can be used to disguise malicious code for use in a variety of attacks. An attacker can use URL encoding to pass parameters to the application while bypassing URL filtering in the Web server or in intrusion detection systems, and to bypass the application's own filtering mechanisms. A character is URL-encoded by taking the two-digit hexadecimal value of its 8-bit code and prefixing it with "%". For example, the US-ASCII character set represents a space with decimal code 32, or hexadecimal 20, so its URL-encoded representation is %20. When the attacker sends an encoded URL and the security filters inspect only the raw, undecoded form, the Web server passes the request to the application and the disguised malicious code is executed once it has been decoded. Attackers can use this method in a number of attacks, such as Parameter Tampering, Directory Traversal, source code disclosure, and Cross-Site Scripting.
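The bypass described above can be illustrated with a minimal Python sketch. The function names (naive_filter, decode_then_filter) and the request path are hypothetical and chosen only for illustration; they do not come from any particular server or filter product. The sketch assumes a filter that looks for the literal "../" sequence and shows that the check is evaded when it is applied to the raw value rather than to the decoded one.

```python
from urllib.parse import quote, unquote


def naive_filter(raw_path: str) -> bool:
    """Hypothetical filter: accept the path if the RAW text has no '../'."""
    return "../" not in raw_path


def decode_then_filter(raw_path: str) -> bool:
    """Decode repeatedly until the value stops changing, then apply the check.

    Looping handles double encoding such as '%252e' -> '%2e' -> '.'.
    This is one possible canonicalization strategy, not the only one.
    """
    decoded = raw_path
    while True:
        next_value = unquote(decoded)
        if next_value == decoded:
            break
        decoded = next_value
    return "../" not in decoded


# A space (decimal 32, hexadecimal 20) is URL-encoded as '%20'.
encoded_space = quote(" ")

# Hypothetical traversal attempt: '../' written as '%2e%2e%2f'.
attack = "/files/%2e%2e%2f%2e%2e%2fetc/passwd"

if __name__ == "__main__":
    print(encoded_space)
    print("naive filter accepts attack:", naive_filter(attack))
    print("decode-then-filter accepts attack:", decode_then_filter(attack))
```

The design point is that the filter should examine the same canonical form the application will eventually act on. Decoding until the value is stable is a conservative assumption made here; an application that legitimately carries a literal "%" in its data would need a stricter rule, such as rejecting input that is still encoded after a single decoding pass.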
Another encoding method that can be used for malicious encoding is Unicode/UTF-8. Unicode is a method of referencing and storing characters with multiple bytes by providing a unique reference number for every character, regardless of the language or platform. It is designed to allow a Universal Character Set (UCS) to encompass most of the world's writing systems. Unfortunately, the extended referencing system is not completely compatible with many old (albeit common) protocols and applications, and this has led to the development of several UCS transformation formats (UTF) with varying characteristics. One of the most commonly used formats, UTF-8, has the characteristic of preserving the full US-ASCII range. UTF-8 decoders that do not enforce the shortest form accept multiple byte sequences for the same UCS character, so the same character can have several representations. For example, the "." (dot) character can be represented as 2E, C0 AE, E0 80 AE, F0 80 80 AE, F8 80 80 80 AE, or FC 80 80 80 80 AE. Only the shortest form (2E) is valid in current UTF-8 (RFC 3629), but a decoder that accepts the longer forms lets an attacker slip a character past a filter that checks only for the shortest one.
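The multiple representations of the dot character can be checked with a short Python sketch. The lenient_decode function below is a hypothetical stand-in for a permissive decoder that does not verify the shortest form; it is written for illustration and is not taken from any real library. For comparison, strict_accepts uses Python's built-in UTF-8 codec, which rejects the longer forms.

```python
# (mask, marker, total length) for each multi-byte lead-byte pattern.
LEAD_BYTES = [
    (0xE0, 0xC0, 2),  # 110xxxxx
    (0xF0, 0xE0, 3),  # 1110xxxx
    (0xF8, 0xF0, 4),  # 11110xxx
    (0xFC, 0xF8, 5),  # 111110xx
    (0xFE, 0xFC, 6),  # 1111110x
]


def lenient_decode(seq: bytes) -> int:
    """Hypothetical permissive decoder: no shortest-form check is made."""
    first = seq[0]
    if first < 0x80:
        if len(seq) != 1:
            raise ValueError("unexpected trailing bytes")
        return first
    for mask, marker, length in LEAD_BYTES:
        if (first & mask) == marker:
            if len(seq) != length:
                raise ValueError("wrong sequence length")
            # Keep only the payload bits of the lead byte.
            code_point = first & (~mask & 0xFF)
            for byte in seq[1:]:
                if (byte & 0xC0) != 0x80:
                    raise ValueError("bad continuation byte")
                code_point = (code_point << 6) | (byte & 0x3F)
            return code_point
    raise ValueError("invalid lead byte")


def strict_accepts(seq: bytes) -> bool:
    """Return True if Python's built-in UTF-8 codec accepts the sequence."""
    try:
        seq.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# The six byte sequences listed in the text for the '.' character.
DOT_FORMS = [
    bytes.fromhex("2E"),
    bytes.fromhex("C0 AE"),
    bytes.fromhex("E0 80 AE"),
    bytes.fromhex("F0 80 80 AE"),
    bytes.fromhex("F8 80 80 80 AE"),
    bytes.fromhex("FC 80 80 80 80 AE"),
]

if __name__ == "__main__":
    for form in DOT_FORMS:
        print(form.hex(" "), hex(lenient_decode(form)), strict_accepts(form))
```

Under the permissive decoder every sequence yields code point 2E, while the strict codec accepts only the single-byte form. A filter is therefore safest when it either rejects non-shortest sequences outright or runs after the same strict decoding step that the application uses.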