Our Analysis of 1,019 Phishing Kits

In recent years phishing activity has grown rapidly, with thousands of phishing sites popping up for a brief moment that lasts weeks, days or even hours before they become ineffective: they get blacklisted by security providers, taken down by internet providers and authorities, or (in most cases) both. To keep up with this pace, a significant portion of phishing activity relies on phishing “kits”, software packages that allow quick and easy deployment of a new phishing site.
We set out to learn about phishers’ methods and motivations, particularly about phishing kit packages that contain complete phishing web sites in an easy-to-deploy format.
Here are the insights we gained from our research, which we review in detail in this post:

  • Free phishing kits often contain hidden exfiltration mechanisms that send the entered information to third parties, who are probably the kits’ authors
  • Half of the packages belong to large families of kits, and a third belong to just three large families. This means that phishing kits come from a restricted number of sources.

Industrialization of Phishing

Phishing is one of the most effective ways to gain a foothold within the enterprise, and many network and data breaches start with it. According to Verizon’s 2017 Data Breach Investigations Report, 81% of hacking-related breaches leveraged stolen and/or weak passwords.
Like many other cyberthreats, the phishing domain has evolved in recent years from a world where a few know-all attackers build, execute and manage entire phishing campaigns, into a role-based ecosystem where different people with different skill sets fulfill different roles. This industrialization allows modern cybercriminals to stop worrying about the technical stuff like building fake sites and collecting stolen credentials, and instead focus on their portion of the process and work at large scale, both of which result in increased revenue.
In this research we explore one of the significant enablers of this evolution, which is the channel between phishing technology providers and the campaigners – known as do-it-yourself (DIY) phishing kits – which dramatically reduce the cost and time required to set up a phishing campaign. DIY phishing kits include the files necessary to create a copy of target web sites, steal valuable information, and simplify the configuration of the phishing web site. Furthermore, DIY kits are constantly evolving and various features are being introduced for different management purposes, like extending the usability of their pages and servers.
Along with improvement of industrial-grade platforms and infrastructure, the business model of phishing is also evolving. Underground markets are full of phishing kits at all levels and cost, some even distributed at no charge, usually revealing one of the oldest rules in the book – you get what you pay for. Here we found the only free cheese is in the mousetrap. Many of the free phishing kits have hidden back doors in them which allow their provider to track activity of the phishing campaigner, and get copies of the stolen information that was obtained, thus reducing their level of effort and risk and increasing their return on investment (ROI) by harvesting the work of inexperienced criminals who deploy their kits.
This makes the phishing world a live ecosystem with various players, obeying basic rules of economy. It allows technology providers to maximize the revenue on their development, phishing campaigners to focus on campaigning, and “dishonest” hack-the-hacker DIY kit providers to cheat naive campaigners (who are of course dishonest to their victims). In our research we focused on families of kits that seem to be related to each other, and in many cases even derived from the same source. Unsurprisingly, we found that DIY phishing kits obey the Pareto principle, with the majority of the kits attributed to a small number of sources.

Phishing Attack Flow

The following flow demonstrates a standard phishing attack using a phishing kit.

  • First, the attacker buys a compromised server (or uses a hosting service) and uploads the phishing kit to the server
  • Then, the attacker uses a spam service to send a burst of phishing emails to potential victims
  • The victims fall into the phishing trap, visit the phishing pages and enter their credentials
  • The phishing kit processes the credentials and sends them to an external email account
  • Finally, the attacker accesses this email account and harvests the new credentials

Figure 1: Phishing attack flow

The Research from 20,000 Feet

Our research had four phases (Figure 2):

  • Finding sources for phishing sites and their kits
  • Obtaining phishing kits
  • Retrieval and normalization of features from the phishing kits
  • Statistical analysis and clustering of the extracted features

Figure 2: Research phases

Phishing Sites Sources

We used two different sources to locate and obtain phishing kits:

  • TechHelpList.com, which publishes a collection of URLs for phishing kits gathered throughout 2016
  • Open Phish feed, which offers URLs of zero day phishing sites

Obtaining Phishing Kits

From the first source, TechHelpList.com, we downloaded long-life phishing kits. From the second source, Open Phish, we obtained phishing kits by crawling live phishing sites. We developed a tool which gets a list of phishing URLs and retrieves the phishing kit from the backend of the phishing server.
This was possible because phishers’ common practice is to deploy a phishing site by uploading a kit to a web server. After deploying a kit, attackers often forget to remove it, and when a server is vulnerable to directory traversal it is possible to locate and download the kit.
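
The retrieval idea can be illustrated with a short Python sketch that derives candidate archive locations from a phishing URL. The research does not describe how its tool locates kits, so this sketch rests on an assumption: that a leftover kit is a `.zip` archive named after one of the directories in the phishing URL's path. The function name `candidate_kit_urls` is hypothetical, and the sketch only builds URLs; it performs no network requests.

```python
from urllib.parse import urlsplit, urlunsplit


def candidate_kit_urls(phishing_url):
    """Return guesses for where a leftover kit archive might sit.

    Assumption (not from the research): the kit is a .zip file named
    after a directory that appears in the phishing URL's path.
    """
    parts = urlsplit(phishing_url)
    segments = [s for s in parts.path.split("/") if s]
    # Drop a trailing file name such as index.php, keeping only directories.
    if segments and "." in segments[-1]:
        segments.pop()
    candidates = []
    # Walk from the deepest directory up to the top-level one.
    while segments:
        path = "/" + "/".join(segments) + ".zip"
        candidates.append(urlunsplit((parts.scheme, parts.netloc, path, "", "")))
        segments.pop()
    return candidates


urls = candidate_kit_urls("http://example.com/wp-content/gdoc/index.php?id=1")
```

Each candidate would then be requested and kept only if the response is a valid archive.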

Extracting Features

Before comparison analysis we performed several preprocessing steps. We extracted features that characterize phishing kits, cleaned up redundant white space and normalized feature values.
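
A minimal Python sketch of what such preprocessing might look like follows. The exact normalization rules are not given in the research, so the ones used here (Unicode normalization, collapsing white space, stripping surrounding quotes, lowercasing) are assumptions for illustration, and `normalize_feature` is a hypothetical name.

```python
import re
import unicodedata


def normalize_feature(value):
    """Normalize one extracted feature value so equal values compare equal.

    The rules below are illustrative assumptions, not the research tool's rules.
    """
    value = unicodedata.normalize("NFKC", value)
    # Collapse runs of spaces, tabs and newlines into a single space.
    value = re.sub(r"\s+", " ", value).strip()
    # Remove quotes left over from the PHP string literal.
    value = value.strip("\"'")
    return value.lower()
```

With these rules, two subjects that differ only in spacing or letter case are treated as the same feature value during comparison.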

Analyzing Phishing Kits

We created a statistical analysis of extracted features to understand the importance and incidence of each one. Then we performed hierarchical clustering on the extracted features (more details in the “Research Method” section below).


From both sources we collected 1,019 phishing kits in total. We obtained 428 phishing kits from TechHelpList.com pastes, which is 9.6% of the URLs we checked from that source. From the Open Phish feed we obtained 591 kits, which is 7% of the URLs checked. Because we could only retrieve kits that were left behind on the server, our collection is biased toward inexperienced attackers who leave their kits on the compromised servers.
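
As a quick sanity check, the number of URLs checked per source can be back-computed from these figures. The results are only approximations implied by the rounded percentages; they are not numbers reported in the research.

```python
def implied_total(kits_found, hit_rate_percent):
    """Approximate number of URLs checked, given the kits found and hit rate."""
    return round(kits_found / (hit_rate_percent / 100))


techhelplist_urls = implied_total(428, 9.6)  # roughly 4,458 URLs
openphish_urls = implied_total(591, 7)       # roughly 8,443 URLs
total_kits = 428 + 591                       # 1,019 kits
```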

Phishing Kit Anatomy

Phishing kits contain two types of files: 1) resource files which are needed to display a copy of the targeted web site and 2) processing scripts which are used to save the stolen information and send it to attackers.
Based on analysis of our collection of phishing kits, we noticed that the majority of phishing kits contain all the resources required to copy the targeted web site, including images, HTML pages and CSS files. This reduces the number of requests the kit issues to the target site, and hence the chances of being detected if the original site analyzes incoming requests. However, we observed several kits with phishing pages containing links to the original targeted sites.
Figure 3: Google Docs phishing kit
Figure 3 is an example of a common Google Docs phishing kit, a type that makes up about 15 percent of our collection. The kit’s resource files contain Google images, CSS and password validation files, while the PHP files hold the processing code that stores the stolen information and sends it to the attacker.
Figure 4: Processing code of a Google Docs phishing kit
Figure 4 shows the processing code of the Google Docs kit. The first part of the code checks which email provider the victim selected. It then retrieves the victim’s details, such as browser, IP address and geolocation. If the victim’s email provider is Gmail, the victim is redirected to the next page, ‘verification.php’, which lures them into entering their recovery email or phone number. Google requires this information when authenticating from an unrecognized device.
The second part of the code builds the phishing results email message. In our example, the processing code is signed with the signature ‘CANADA’. The phishing results message contains the email provider, email and password, IP address and geolocation of the victim. The message is sent to the attacker’s email address; we assume the attacker is the buyer of the kit.
The last part of the code redirects the victim to a legitimate Google Drive landing page to avoid raising suspicion.
The marked features in Figure 4 are those we used in our statistical analysis and clustering.

Phishing Kit Capabilities

Exfiltration Mechanisms of Phishing Kits

One of the main functions of a phishing kit is to automatically send stolen information to the attackers. The vast majority of kits (98%) used email to exfiltrate stolen data to attackers. Only two percent of kits stored collected information in a file on the compromised server.
From our automated analysis of the 1,019 phishing kits, we extracted 843 unique email addresses. They were registered at 53 different domains: Gmail.com is the most frequently used (79%), followed by Yandex.com (5%), Yahoo.com (4%), Hotmail and Outlook (3%).
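
The address and domain counts above could be gathered along the following lines. This is a Python sketch under stated assumptions: the regular expression is a deliberately simple pattern rather than a full address parser, addresses are compared case-insensitively, and the function names and sample strings are hypothetical.

```python
import re
from collections import Counter

# Simplified email pattern; an assumption, not the research tool's regex.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def unique_addresses(sources):
    """Collect the distinct email addresses found in a list of source texts."""
    found = set()
    for text in sources:
        found.update(match.lower() for match in EMAIL_RE.findall(text))
    return found


def domain_shares(addresses):
    """Return each domain's share of the addresses, most common first."""
    counts = Counter(address.rsplit("@", 1)[1] for address in addresses)
    total = sum(counts.values())
    return {domain: n / total for domain, n in counts.most_common()}


sample_sources = [
    '$to = "a@gmail.com";',
    '$send = "A@gmail.com, b@yandex.com";',
    'mail("c@gmail.com", $subject, $message);',
]
shares = domain_shares(unique_addresses(sample_sources))
```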

But What Happens When You Buy from a Thief?

About 25 percent of the kits contained implicit recipients, which receive the phishing results emails in addition to the kit buyers, the intended recipients. We assume that the hidden addresses belong to the kits’ authors, who are in effect stealing from the inexperienced phishers who deploy these kits. This is likely the main reason that phishing kits are distributed for free in underground circles.
We saw multiple techniques used to hide the author’s email address. The most popular were address obfuscation (Figure 5) and repeated mail statements (Figure 6), which exploit the fact that PHP variable names are case-sensitive. Thus, two mail statements that appear identical actually have two different recipients.
Figure 5: Address obfuscation technique to hide phishing kit author’s email address
Figure 6: Repeated mail statements technique to hide phishing kit author’s email address
We also observed that many kits contain the comment “Don’t need to change anything here” at the top of one of the scripts. This comment aims to deter the kit’s operators from examining the script that contains the hidden field with the kit author’s email address.
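
The repeated mail statements trick lends itself to a simple check: look for PHP variables whose names differ only in letter case. The Python sketch below is one possible heuristic, not the detection method used in the research. The sample PHP snippet, its variable names and its addresses are made up for illustration.

```python
import re
from collections import defaultdict

# Matches PHP variable names such as $send or $Send.
VAR_RE = re.compile(r"\$([A-Za-z_][A-Za-z0-9_]*)")


def case_variant_variables(php_source):
    """Return groups of variable names that differ only by letter case."""
    groups = defaultdict(set)
    for name in VAR_RE.findall(php_source):
        groups[name.lower()].add(name)
    return [sorted(names) for names in groups.values() if len(names) > 1]


sample_php = """
$send = "buyer@example.com";
$Send = "author@example.org";
mail($send, $subject, $message);
mail($Send, $subject, $message);
"""
suspicious = case_variant_variables(sample_php)  # [['Send', 'send']]
```

A non-empty result does not prove a hidden recipient, but it flags scripts that deserve a manual look.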

Extending the Lifespan

With so many prying eyes of security vendors, researchers and index services, phishing campaign operators are trying to find ways to extend the life expectancy of their pages and servers.

Block Unwanted Access

One of the common methods we found in the kits (in 17% of them) was a mechanism for blocking unwanted visitors, thus creating the façade that the site is already down, and therefore extending its life expectancy and increasing the owner’s ROI.
The following are common techniques to hide phishing kits:

  • .htaccess files: contain a list of blocked IP addresses belonging to search engine and security company bots
  • robots.txt files: used to prevent bots from accessing specific remote directories
  • PHP scripts: dynamically check whether the remote IP address is allowed to access the phishing pages

Figure 7: Block unwanted access techniques
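
From an analyst's point of view, the three techniques can be flagged with simple pattern checks over a kit's files. The Python sketch below is a heuristic under stated assumptions: the text files are taken to be `robots.txt`, blocking rules in `.htaccess` are taken to use `deny from` lines, and PHP checks are taken to reference `REMOTE_ADDR` alongside `exit`, `die` or `header`. None of these patterns comes from the research, and `blocking_techniques` is a hypothetical name.

```python
import re


def blocking_techniques(files):
    """Return which access-blocking techniques a kit appears to use.

    `files` maps each file path in the kit to its text content.
    """
    found = set()
    for path, content in files.items():
        name = path.rsplit("/", 1)[-1].lower()
        if name == ".htaccess":
            # Apache 2.2-style rule, e.g. "deny from 66.249.64.0/19".
            if re.search(r"(?im)^\s*deny\s+from\s+\S+", content):
                found.add("htaccess")
        elif name == "robots.txt":
            if re.search(r"(?im)^\s*disallow\s*:", content):
                found.add("robots.txt")
        elif name.endswith(".php"):
            # The script reads the visitor's IP and can stop or redirect.
            if "REMOTE_ADDR" in content and re.search(
                r"\b(?:exit|die|header)\b", content
            ):
                found.add("php")
    return found


sample_kit = {
    ".htaccess": "order allow,deny\ndeny from 66.249.64.0/19\nallow from all",
    "robots.txt": "User-agent: *\nDisallow: /",
    "index.php": "<?php if (in_array($_SERVER['REMOTE_ADDR'], $banned)) { die('404'); } ?>",
    "login.php": "<?php echo 1; ?>",
}
techniques = blocking_techniques(sample_kit)
```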

Blacklist Evasion

Thirteen percent of phishing kits contain blacklist evasion techniques, which redirect each new victim to a newly generated random location. In effect, the kit randomizes the URL per visitor using the following steps:

  • Creates a random phishing kit subdirectory
  • Copies the content of the entire kit inside it
  • Redirects the visitor to the newly generated random location

The following PHP code presents an example of such behavior:
Figure 8: Blacklist evasion techniques
This approach allows phishers to keep the real link to the phishing kit from being blacklisted, and thus extend the lifespan of phishing pages and servers.

Research Method

We developed an automated tool that extracts features from phishing kits. The following features were extracted from the metadata of the phishing kits:

  • Name of the kit
  • A list of file names contained in a kit. To clear noise from data we excluded images and CSS files.
  • Size of the phishing kit

The features below were extracted using regular expressions from the processing code which builds the phishing results email:

  • Author’s signature, extracted from the processing code
  • Recipient, which we assume is the buyer of the kit
  • Sender, extracted from the From field in the processing code
  • Subject of the results email

Afterwards, we performed a statistical analysis on the extracted features. The features we chose were:

  • The list of files contained in the phishing kit
  • Author’s signature from the processing code
  • Subject and sender of phishing results email
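
The regular-expression extraction described above might look like the following Python sketch. The patterns and the sample PHP are assumptions for illustration: the variable names (`$send`, `$subject`, `$headers`) and the “Created BY” signature format are hypothetical, and only the ‘CANADA’ signature itself comes from the Google Docs example.

```python
import re

# Hypothetical patterns; real kits vary, and the research tool's regexes are not published.
PATTERNS = {
    "recipient": re.compile(r"""\$(?:to|send|recipient)\s*=\s*["']([^"']+)["']""", re.I),
    "subject": re.compile(r"""\$subject\s*=\s*["']([^"']+)["']""", re.I),
    "sender": re.compile(r"""From:\s*([^"'\\\r\n]+)""", re.I),
    "signature": re.compile(r"""created\s+by\s+([^"'\\\r\n-]+)""", re.I),
}


def extract_features(php_source):
    """Return the first match for each feature, or None when it is absent."""
    features = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(php_source)
        features[name] = match.group(1).strip() if match else None
    return features


SAMPLE = r'''
$message .= "---------Created BY CANADA---------\n";
$send = "buyer@example.com";
$subject = "Result from Gdoc";
$headers = "From: Results <results@example.net>";
mail($send, $subject, $message, $headers);
'''
features = extract_features(SAMPLE)
```

Features that a kit does not contain come back as `None`, which keeps kits without a signature distinguishable from kits that share one.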

Statistical Analysis

Authors’ Signature

We started our statistical analysis with the authors’ signatures feature. We extracted 271 authors’ signatures from our collection of kits. About a third of the kits didn’t contain any signature, while half of the kits had a non-unique signature. This could imply that at least half of the kits on the underground market are created by a restricted number of authors.
Figure 9: Phishing kit authors’ signature distribution
The most popular author was “NoBODY” which appeared in about 7% of the kits in our collection. The second most popular author was “me’” and the third “FUD TOOL DOT COM”, which appeared in 2% of kits.
The figures below show three examples of processing code which contains popular authors’ signatures:
Figure 10: Examples of popular authors’ signatures
The table below summarizes the top 10 authors’ signatures found in the phishing kits:
Figure 11: Top 10 author signatures in phishing kits
We searched for one of the most popular signatures, “FUD TOOL DOT COM” and found a few interesting sites. This author publishes different tools, hosting services and phishing pages for free and for profit (see Figures 12a and 12b).
Figure 12a: “Chase new page 2015” advertising by FUD TOOL DOT COM found on Facebook
Figure 12b: Ad landing page, “Fresh Spam Tools” page by FUD TOOL DOT COM

A Shopping Cart of Kits’ Buyers

The next feature we analyzed was email recipient (kit buyers). From all analyzed kits we extracted 716 buyers.
Figure 13: Distribution of phishing kit buyers according to the quantity of kits purchased
We noticed that about a quarter (24%) of attackers used several phishing kits (which represented 56% of the examined kits), most likely to maximize their potential profit.
Figure 14: Distribution of kits according to the quantity of kits purchased
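
The two distributions answer different questions: what share of buyers used more than one kit, and what share of kits those buyers account for. A small Python sketch makes the distinction concrete. It assumes one recipient address per kit and uses made-up data; `buyer_summary` is a hypothetical name.

```python
from collections import Counter


def buyer_summary(kit_recipients):
    """Summarize repeat buyers.

    `kit_recipients` holds one recipient address per kit (an assumption).
    Returns (share of buyers with several kits, share of kits they hold).
    """
    kits_per_buyer = Counter(kit_recipients)
    repeat = {buyer: n for buyer, n in kits_per_buyer.items() if n > 1}
    share_of_buyers = len(repeat) / len(kits_per_buyer)
    share_of_kits = sum(repeat.values()) / len(kit_recipients)
    return share_of_buyers, share_of_kits


# Toy data: buyer "a" deployed three kits, the others one each.
buyers_share, kits_share = buyer_summary(["a", "a", "a", "b", "c", "d"])
```

In this toy data a quarter of the buyers hold half of the kits, the same kind of skew as the 24% and 56% reported above.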

Subject Feature

Most of the kits have non-unique subjects, which strengthens the claim that phishing kits come from a restricted number of sources.
Figure 15: Phishing kits – email subject distribution

Clustering Method

We performed clustering on extracted features in three steps. First, we chose features that characterized the phishing kits and were applicable for comparison between kits.

  • The list of files contained in the phishing kit
  • Author’s signature from the processing file
  • Subject of phishing results email
  • Sender of phishing results email

Afterwards, we performed clustering on the first feature, the list of files contained in the phishing kits, where the distance between any two kits was defined as:
[Equation image: distance between two kits]
Finally, we performed clustering on the results of the previous step and the other features mentioned above. In our clustering, every cluster of kits had at least one of the features in common. To streamline the analysis, we used a naive distance function that set the distance to 0 if two kits had at least one common feature, and 1 otherwise.
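
The two distance functions can be sketched in Python as follows. The file-list distance formula did not survive in this text, so the Jaccard distance below is a stand-in assumption, not the formula used in the research. The 0/1 distance follows the description above. Grouping kits at distance 0 with a union-find structure is one simple way to realize the final clustering step and is also an assumption. The feature key names are hypothetical.

```python
FEATURES = ("file_cluster", "signature", "subject", "sender")


def jaccard_distance(files_a, files_b):
    """Stand-in file-list distance: 1 minus the shared fraction of file names."""
    union = files_a | files_b
    if not union:
        return 0.0
    return 1 - len(files_a & files_b) / len(union)


def naive_distance(kit_a, kit_b):
    """0 if the kits share at least one known feature value, 1 otherwise."""
    shared = any(
        kit_a.get(key) is not None and kit_a.get(key) == kit_b.get(key)
        for key in FEATURES
    )
    return 0 if shared else 1


def cluster(kits):
    """Group kit indices so that kits at distance 0 end up together."""
    parent = list(range(len(kits)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(kits)):
        for j in range(i + 1, len(kits)):
            if naive_distance(kits[i], kits[j]) == 0:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(len(kits)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


sample_kits = [
    {"signature": "NoBODY", "subject": "s1"},
    {"signature": "NoBODY", "subject": "s2"},
    {"signature": None, "subject": "s2"},
    {"signature": "x", "subject": "s3"},
]
clusters = cluster(sample_kits)  # [[0, 1, 2], [3]]
```

Note that kits 0 and 2 share no feature directly; they land in the same cluster because each shares a feature with kit 1. This chaining is a consequence of the 0/1 distance and helps explain how large clusters can form.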

Clustering Results

We identified 230 clusters, with more than half of the kits grouped in 19 large clusters of 10 or more kits. Furthermore, 72% of kits belonged to clusters of at least medium size (five or more kits).
The following graph demonstrates the clustering of deployment kits. The blue dots are phishing kits, while the purple triangles are cluster identifiers. The largest cluster contains 153 kits, the second largest cluster contains about 78 kits, and the third 66 kits.
Figure 16: Clustering of phishing kits
Based on the clustering results, we can conclude that about half of the kits were created by a small group of experienced phishers. Almost a third of the kits belong to three large clusters, which indicates that phishing kits come from a restricted number of sources.
We also noticed that more than a third of the kits that belonged to the large clusters contained implicit recipients which are probably the kits’ authors. This might explain the phishers’ motivation to distribute free phishing kits in the underground community.

Role-based Ecosystem

Phishing activity has changed rapidly in recent years. It has evolved from a small-scale practice into an industrialized, automated operation involving multiple actors with well-defined roles. The underground circles have shifted from a reputation-based society into a profit-driven economy that resembles legitimate economic ecosystems, but also one in which experts resort to treachery against newcomers. This shows that criminals not only target gullible users, but also take advantage of inexperienced (or competing) criminals.
Figure 17: Phishing role-based ecosystem
The veteran players try to minimize their effort and operational costs and maximize their return on investment by harvesting the work of newcomers. This takes place in three steps:

  • Experts advertise and distribute phishing kits to newcomers
  • Newcomers deploy them and steal valuable information from victims
  • Experts steal from newcomers using hidden exfiltration mechanisms (hidden recipient emails)

Summary and Conclusion

Do-it-yourself phishing kits are effective tools available to phishers and one of the significant factors in reducing the price and time to set up a phishing scam.
In this research, we analyzed a large collection of phishing kits obtained from a variety of sources and discussed the kits’ technical characteristics.
In summary, here are the insights we gained on the phishing kits’ main capabilities, their origin and their effect on the phishing market:

  • Phishing campaign operators are trying to find ways to extend the life expectancy of their pages and servers and increase their ROI. 17% of kits contained a mechanism for blocking unwanted visitors, thus creating the façade that the site is already down. Furthermore, 13% of phishing kits contain blacklist evasion techniques, which redirect each new victim to a newly-generated random location.
  • The “business model” of phishing has emerged. Initially, criminals created phishing kits and offered them for sale. As various features were introduced to make phishing sites more efficient and to extend the life expectancy of their pages, phishing kits became actively promoted and distributed at no charge on underground sites. However, the “free cheese” is a trap. Free phishing kits often hide implicit recipients (probably the original kits’ authors), who receive the phished information in addition to the kit buyer. Free phishing kits therefore follow a rational economic motivation: their authors can decrease their effort and risk, and increase their ROI, by harvesting the work of inexperienced criminals who deploy their kits. At least a quarter of phishing kits contained hidden recipients, so the stolen information was also transmitted to third parties (likely the original kits’ authors).
  • About half of the kits were created by a small group of experienced phishers, while almost a third of the kits belonged to three large clusters. This shows that phishing kits come from a restricted number of sources.