Remote File Inclusion [RFI] is an attack that exploits web application functionality that includes external source code without validating its content or origin.
An RFI payload is a link that points to a malicious file that an application will include in its code (example: url=[h]ttp://rfi.nessus.org/rfi.txt). Thereafter, the malicious code will be executed on the server with the privileges of the running application.
Successful RFI exploitation may lead to Remote Code Execution [RCE] or full control of the server.
Unlike SQL injection or command injection, where a WAF detects the malicious payload sent to the server and mitigates it at the single-request level regardless of whether the application is vulnerable, RFI requires a more complex solution: the malicious request contains no malicious content itself, only a URL to an external resource. If the vulnerable endpoint is known, the WAF can block the attack based on the exploitation attempt. If it isn’t, the file’s content must be analyzed.
To determine whether the external resource contains malicious content, we need to retrieve its content and inspect it. This functionality is an enhancement to Imperva’s WAF real-time engine.
Blocking RFI based on content therefore requires a service that downloads the file and inspects its contents to decide whether it is malicious.
RFI detector overview
To accomplish this task in Imperva’s cloud and on-prem WAFs, we collect all suspected URLs in a database, then apply an elimination algorithm based on previous scan results and well-known good domains to reduce the number of URLs that must be downloaded and scanned. We collect 240M URLs per week.
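The elimination step could look roughly like the sketch below. It is only an illustration under assumptions: the allowlist contents, the shape of the "already scanned" set, and the function names are hypothetical and not taken from the actual detector.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist; the real list of well-known good domains is not published.
KNOWN_GOOD_DOMAINS = {"example.com", "cdn.example.net"}


def host_of(url):
    """Return the lower-cased host of a URL, or an empty string if it has none."""
    return (urlsplit(url).hostname or "").lower()


def is_known_good(host, good_domains):
    """True if host equals an allowlisted domain or is a subdomain of one."""
    return any(host == d or host.endswith("." + d) for d in good_domains)


def urls_to_scan(suspected, already_scanned, good_domains=KNOWN_GOOD_DOMAINS):
    """Return de-duplicated URLs that still need downloading, in first-seen order."""
    pending = []
    seen = set()
    for url in suspected:
        if url in seen or url in already_scanned:
            continue  # duplicate in this batch, or covered by a previous scan
        seen.add(url)
        if is_known_good(host_of(url), good_domains):
            continue  # hosted on a well-known good domain
        pending.append(url)
    return pending
```

Matching on the parsed host rather than on a substring of the URL keeps a host such as `example.com.evil.test` from passing as allowlisted.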
Our in-house implementation of the RFI detector is written in Python and uses an asynchronous HTTP library [aiohttp] to download many files concurrently and scan them at a high rate.
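A minimal sketch of concurrent downloading with asyncio and aiohttp follows. It is not the production code: the concurrency limit, the timeout, the size cap, and all function names are assumptions chosen for illustration.

```python
import asyncio


async def fetch_text(session, url, max_bytes=1_000_000):
    """Download at most max_bytes from url using an aiohttp session."""
    async with session.get(url) as resp:
        body = await resp.content.read(max_bytes)
        return body.decode("utf-8", errors="replace")


async def download_all(urls, fetch, concurrency=100):
    """Run fetch(url) for every URL, with at most `concurrency` in flight.

    Returns a dict mapping each URL to its body, or None if the fetch failed.
    """
    sem = asyncio.Semaphore(concurrency)

    async def guarded(url):
        async with sem:
            try:
                return url, await fetch(url)
            except Exception:
                return url, None  # unreachable host, timeout, bad response, ...

    return dict(await asyncio.gather(*(guarded(u) for u in urls)))


async def main(urls):
    # Imported here so download_all can be exercised without the dependency.
    import aiohttp

    timeout = aiohttp.ClientTimeout(total=10)  # assumed value
    async with aiohttp.ClientSession(timeout=timeout) as session:
        return await download_all(urls, lambda u: fetch_text(session, u))
```

Passing the fetch function in as a parameter keeps the scheduling logic separate from the HTTP client, so it can be tested with a stub. A real run would call `asyncio.run(main(urls))`.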
The scanner can detect malicious content in many programming languages such as PHP, .NET, Python, and Java.
If malicious content is detected, we add the URL to a list of malicious URLs. Thereafter a global rule will block any request containing one of those malicious URLs.
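In spirit, the blocking rule is a lookup of URL-valued request parameters against the malicious list. The sketch below assumes exact string matching on decoded query parameters, which is a simplification; the function names and the list of schemes are hypothetical.

```python
from urllib.parse import parse_qsl, urlsplit

REMOTE_SCHEMES = ("http://", "https://", "ftp://")  # assumed set of schemes


def included_urls(request_url):
    """Yield (parameter, value) pairs whose decoded value looks like a remote URL."""
    query = urlsplit(request_url).query
    for name, value in parse_qsl(query, keep_blank_values=True):
        if value.lower().startswith(REMOTE_SCHEMES):
            yield name, value


def should_block(request_url, malicious_urls):
    """True if any URL-valued parameter is on the malicious list."""
    return any(value in malicious_urls for _, value in included_urls(request_url))
```

For example, with `http://evil.test/shell.txt` on the list, a request to `/index.php?page=http%3A%2F%2Fevil.test%2Fshell.txt` is blocked, because `parse_qsl` decodes the parameter before the comparison.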
Interestingly, almost all the malicious content we’ve discovered is written in PHP. PHP is considered the most popular and widely used programming language for web development, and it is the most vulnerable to RFI because remote inclusion is built into the language.
We developed an in-house malicious file scanner that uses different heuristics to distinguish between legitimate and malicious content.
The main challenge was to minimize false positives. In the first version, we used only regular expressions, but the high false-positive rate forced us to rethink this approach.
The idea was to develop a boolean evaluation engine that lets us create more complex signatures built from combinations of keywords, code patterns, regular expressions, and mechanisms for detecting evasion techniques. Using the engine, we drastically reduced the false-positive rate without compromising detection accuracy.
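To make the idea concrete, here is a toy boolean evaluation engine. The signature format (nested tuples), the operator names, and the sample signature are all hypothetical; the actual engine and its signatures are not described in detail here.

```python
import re


def evaluate(expr, text):
    """Evaluate a nested boolean signature against file content.

    expr is a tuple: ("kw", word), ("re", pattern), ("not", e),
    ("and", e1, e2, ...) or ("or", e1, e2, ...).
    """
    op, *args = expr
    if op == "kw":
        return args[0].lower() in text.lower()
    if op == "re":
        return re.search(args[0], text, re.IGNORECASE) is not None
    if op == "not":
        return not evaluate(args[0], text)
    if op == "and":
        return all(evaluate(a, text) for a in args)
    if op == "or":
        return any(evaluate(a, text) for a in args)
    raise ValueError(f"unknown operator: {op!r}")


# Hypothetical signature: PHP code that either evaluates decoded data
# or passes request input straight to a command-execution function.
PHP_SUSPICIOUS = (
    "and",
    ("kw", "<?php"),
    (
        "or",
        ("re", r"\beval\s*\(\s*base64_decode\s*\("),
        ("re", r"\b(system|shell_exec|passthru)\s*\(\s*\$_(GET|POST|REQUEST)\b"),
    ),
)
```

A lone regular expression for `eval(` would also flag harmless files that merely mention the function. Requiring a combination of conditions is what lets a signature stay specific.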
Once we detect a URL that leads to malicious content, we know the vulnerable parameter used as the injection point (the parameter name used to include the file), the attacker’s IP, and the domain hosting the malicious file.
We collect and aggregate this information to strengthen RFI protection:
- Injection point – we monitor the vulnerable parameters to detect further RFI attempts regardless of the injected content.
- Undetected payloads – if the engine doesn’t detect the file content, a security researcher analyzes the payload manually and updates the detection engine.
- Attacker’s IP – we monitor and mark IPs that have a bad reputation.
- Domains – we keep a list of domains hosting malicious files for further research and domain reputation.
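The three data points in the list above can be pulled from a single detection with standard URL parsing. The sketch below is an assumption about how such aggregation might be organized; the class and field names are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass, field
from urllib.parse import parse_qsl, urlsplit


@dataclass
class RfiIntel:
    """Running counts of injection points, attacker IPs and hosting domains."""

    injection_points: Counter = field(default_factory=Counter)
    attacker_ips: Counter = field(default_factory=Counter)
    hosting_domains: Counter = field(default_factory=Counter)

    def record(self, request_url, client_ip, malicious_url):
        """Update the counters for one request that carried a malicious URL."""
        parts = urlsplit(request_url)
        for name, value in parse_qsl(parts.query):
            if value == malicious_url:
                # The injection point is the (path, parameter name) pair.
                self.injection_points[(parts.path, name)] += 1
        self.attacker_ips[client_ip] += 1
        self.hosting_domains[urlsplit(malicious_url).hostname] += 1
```

After a few detections, `intel.injection_points.most_common(10)` would list the parameters worth monitoring, regardless of what gets injected into them next.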
Let’s take a glance at the RFI data produced by the scanner:
In the chart below, we can observe that an average of 1,282 IPs (ranging between 908 and 2,024) perform RFI attacks weekly, and around two percent of those IPs belong to known vulnerability scanning services.
Looking at the number of RFI requests detected by the mechanism, more than four-fifths were generated by vulnerability scanners (e.g. Nikto, WhiteHat, Nessus), while the remainder, under a fifth, were real attacks.
The files used by such scanners contained basic probing functionality to check whether the attack succeeded and to identify any RFI-vulnerable endpoints.
Analyzing the clients behind the rest of the activity, we found that most of the attacks were carried out by automated tools, while 35 percent came from unclassified bots. The most common languages used to execute the attacks are Perl and Go.
It was interesting to note that independent attackers used the vulnerability scanners’ RFI-probing URLs to check whether a web application is vulnerable to RFI before injecting a more complex malicious payload.
Examples of vulnerability scanners’ RFI-probing URLs:
Finally, the attacking files contained code with wide functionality, and most used various evasion techniques to hide from and bypass detection mechanisms.
In the example below, the malicious script tried to download and run 64-bit malware. If it failed, it would then try to download a 32-bit version of the same malware.
We’re constantly maintaining the engine to handle the most complex cases, both in detection and in reducing false positives. As a next step, we want to explore YARA rules as an additional mechanism.
YARA rules are widely adopted across the cybersecurity community, providing a convenient way to contribute and helping us stay up to date.