WP Open-Source Intelligence (OSINT) | Techniques & Tools | Imperva

Open-Source Intelligence (OSINT)

App SecurityAttack ToolsEssentialsThreats

Open-Source Intelligence (OSINT) Meaning

Open Source Intelligence (OSINT) is a method of gathering information from public or other open sources, which can be used by security experts, national intelligence agencies, or cybercriminals. When used by cyber defenders, the goal is to discover publicly available information related to their organization that could be used by attackers, and take steps to prevent those future attacks.

OSINT leverages advanced technology to discover and analyze massive amounts of data, obtained by scanning public networks, from publicly available sources like social media networks, and from the deep web—content that is not crawled by search engines, but is still publicly accessible.

OSINT tools may be open source or proprietary: the distinction should be made between open source code and open source content. Even if the tool itself is not open source, as an OSINT tool, it provides access to openly available content, known as open source intelligence.

History of OSINT

The term OSINT was originally used by the military and intelligence community, to denote intelligence activities that gather strategically important, publicly available information on national security issues.

In the cold war era, espionage focused on obtaining information via human sources (HUMINT) or electronic signals (SIGINT), and in the 1980s OSINT gained prominence as an additional method of gathering intelligence.

With the advent of the Internet, social media, and digital services, open source intelligence grants access to numerous resources to gather intelligence about every aspect of an organization’s IT infrastructure and employees. Security organizations are realizing that they must collect this publicly available information, to stay one step ahead of attackers.

A CISO’s primary goal is to find information that could pose a risk to the organization. This allows CISOs to reduce risk before an attacker exploits a threat. OSINT should be used in combination with regular penetration testing, in which information discovered via OSINT is used to simulate a breach of organizational systems.

How Attackers and Defenders Use OSINT

There are three common uses of OSINT: by cybercriminals, by cyber defenders, and by those seeking to monitor and shape public opinion.

How Security Teams Use OSINT

For penetration testers and security teams, OSINT aims to reveal public information about internal assets and other information accessible outside the organization. Metadata accidentally published by your organization may contain sensitive information.

For example, useful information that can be revealed through OSINT includes open ports; unpatched software with known vulnerabilities; publicly available IT information such as device names, IP addresses and configurations; and other leaked information belonging to the organization.

Websites outside of your organization, especially social media, contain huge amounts of relevant information, especially information about employees. Vendors and partners may also be sharing specific details about an organization’s IT environment. When a company acquires other companies, their publicly available information becomes relevant as well.

How Threat Actors Use OSINT

A common use of OSINT by attackers is to retrieve personal and professional information about employees on social media. This can be used to craft spear-phishing campaigns, targeted at individuals who have privileged access to company resources.

LinkedIn is a great resource for this type of open source intelligence, because it reveals job titles and organizational structure. Other social networking sites are also highly valuable for attackers, because they disclose information such as dates of birth, names of family members and pets, all of which can be used in phishing and to guess passwords.

Another common tactic is to use cloud resources to scan public networks for unpatched assets, open ports, and misconfigured cloud datastores. If an attacker knows what they are looking for, they can also retrieve credentials and other leaked information from sites like GitHub. Developers who are not security conscious can embed passwords and encryption keys in their code, and attackers can identify these secrets through specialized searches.

Other Uses of OSINT

In addition to cybersecurity, OSINT is also frequently used by organizations or governments seeking to monitor and influence public opinion. OSINT can be used for marketing, political campaigns, and disaster management.

OSINT Gathering Techniques

Here are three methods commonly used to gain open intelligence data.

Passive Collection

This is the most commonly used way to gather OSINT intelligence. It involves scraping publicly available websites, retrieving data from open APIs such as the Twitter API, or pulling data from deep web information sources. The data is then parsed and organized for consumption.


This type of collection requires more expertise. It directs traffic to a target server to obtain information about the server. Scanner traffic must be similar to normal Internet traffic to avoid detection.

Active Collection

This type of information collection interacts directly with a system to gather information about it. Active collection systems use advanced technologies to access open ports, and scan servers or web applications for vulnerabilities.

This type of data collection can be detected by the target and reveals the reconnaissance process. It leaves a trail in the target’s firewall, Intrusion Detection System (IDS), or Intrusion Prevention System (IPS). Social engineering attacks on targets are also considered a form of active intelligence gathering.

Artificial Intelligence: The Future of OSINT?

OSINT technology is advancing, and many are proposing the use of artificial intelligence and machine learning (AI/ML) to assist OSINT research.

According to public reports, government agencies and intelligence agencies are already using artificial intelligence to gather and analyze data from social media. Military organizations are using AI/ML to identify and combat terrorism, organized cybercrime, false propaganda, and other national security concerns on social media channels.

As AI/ML techniques become available to the private sector, they can help with:

  • Improving the data collection phase—filtering out noise and prioritizing data
  • Improving the data analysis phase—correlating relevant information and identifying useful structures
  • Improving actionable insights—AI/ML analysis can be used to review far more raw data than human analysts can, deriving more actionable insights from the available data.


Here are some of the most popular OSINT tools.


Maltego is part of the Kali Linux operating system, commonly used by network penetration testers and hackers. It is open source, but requires registration with Paterva, the solution vendor. Users can run a “machine”, a type of scripting mechanism, against a target, configuring it according to the information they want to collect.

Main features include:

  • Built-in data transformations.
  • Ability to write custom transformations.
  • Built-in footprints that can collect information from sources and create a visualization of data about a target.


Spiderfoot is a free OSINT tool available on Github. It integrates with multiple data sources, and can be used to gather information about an organization including network addresses, contact details, and credentials.

Main features include:

  • Gathers and analyzes network data including IP addresses, classless inter-domain routing (CIDR) ranges, domains and subdomains.
  • Gathers email addresses, phone numbers, and other contact details.
  • Collects usernames for accounts operated by an organization.
  • Collects Bitcoin addresses.


Spyse is an “Internet assets search engine”, designed for security professionals. It collects data from publicly available sources, analyzes them, and identifies security risks.

Main features include:

  • Collects data from websites, website owners, and the infrastructure they are running on
  • Collects data from publicly exposed IoT devices
  • Identifies connections between entities
  • Reports on publicly exposed data that represents a security risk

Intelligence X

Intelligence X is an archival service that preserves historical versions of web pages that were removed for legal reasons or due to content censorship. It preserves any type of content, no matter how dark or controversial. This includes not only data censored from the public Internet but also data from the dark web, wikileaks, government sites of nations known to engage in cyber attacks, and many other data leaks.

Main features include:

  • Search on email addresses or other contact details.
  • Advanced search on domains and URLs.
  • Search for IPs and CIDR ranges, with support for IPv4 and IPv6.
  • Search for MAC addresses and IPFS Hashes.
  • Search for financial data such as account numbers and credit card numbers
  • Search for personally identifiable information
  • Darknet: Tor and I2P
  • Wikileaks & Cryptome
  • Government sites of North Korea and Russia
  • Public and Private Data Leaks
  • Whois Data
  • Dumpster: Everything else
  • Public Web


BuiltWith maintains a large database of websites, which includes information on the technology stacks used by each site. You can combine BuiltWith with security scanners to identify specific vulnerabilities affecting a website.

Main features include:

  • Reporting on the content management system (CMS) in use by a website, its version, and plugins currently in use.
  • Reporting on other infrastructure components used by a website, such as a CDN.
  • Providing a list of JavaScript and CSS libraries used by the website.
  • Providing information about the web server running the website.
  • Providing details of analytics and tracking tools deployed by a website.


Shodan is a security monitoring solution that makes it possible to search the deep web and IoT networks. It makes it possible to discover any type of device connected to a network, including servers, smart electronics devices, and webcams.

Main features include:

  • Easy to use search engine interface.
  • Provides information on devices operating on protocols like HTTP, SSH, FTP, SNMP, Telnet, RTSP, and IMAP.
  • Results can be filtered and ordered by protocol, network ports, region, and operating system.
  • Access to a huge range of connected devices, including home appliances and public utilities such as traffic lights and water control systems.


HaveIbeenPwned is a service that can be used directly by consumers who were impacted by data breaches. It was developed by security researcher Troy Hunt.

Main features include:

  • Identifying if an individual email address was compromised in any historical breach.
  • Checking accounts on popular services like LastFM, Kickstarter, WordPress.com, and LinkedIn for exposure to past data breaches.

Google Dorking

Google dorking is not exactly a tool – it is a technique commonly used by security professionals and hackers to identify exposed private data or security vulnerabilities via the Google search engine.

Google has the world’s largest database of Internet content, and it provides a range of advanced search operators. Using these search operators it is possible to identify content that can be useful to attackers.

Here are operators commonly used to perform Google Dorking:

  • Filetype – enables finding exposed files with a file type that can be exploited
  • Ext – similarly, finds exposed files with specific extensions that can be useful in attack (for example .log)
  • Intitle/inurl – looks for sensitive information in a document title or URL. For example, any URL containing the term “admin” could be useful to an attacker.
  • Quotes – the quote operator enables searching for a specific string. Attackers can search for a variety of strings that indicate common server issues or other vulnerabilities.

Open Source Investigation Best Practices

Here are best practices that can help you use OSINT more effectively for cyber defense.

Distinguish Between Data and Intelligence

Open source data (OSD) is raw, unfiltered information available from public sources. This is the input of OSINT, but in itself, it is not useful. Open source intelligence (OSINT) is a structured, packaged form of OSD which can be used for security activity.

To successfully practice OSINT, you should not focus on collecting as much data as possible. Focus on identifying the data needed for a specific investigation, and refine your search to retrieve only the relevant information. This will let you derive useful insights at lower cost and with less effort.

Consider Compliance Requirements

Most organizations are covered by the General Data Protection Regulation (GDPR) or other privacy regulations. OSINT very commonly collects personal data, which can be defined as personally identifiable information (PII). Collecting, storing, and processing this data can create a compliance risk for your organization.

In addition, if you discover criminal intent in an OSINT investigation, there may be specific legal requirements for exposing this data. For example, in the UK, exposing information that can tip off an individual under investigation for money laundering can lead to unlimited fines and prison time.

Be Ethical

OSINT relies on publicly accessible data, but the use of this data can impact people, both in your organization and outside it. When you collect data, do not only consider your investigative needs, but also the ethical and regulatory impact of the data. Limit data collection to a minimum that can help you meet your goals without violating the rights of employees or others.

Letting technology collect data or scan systems “on autopilot” will often result in unethical or illegal data collection. A key part of ethical OSINT is to ensure data collection is controlled by humans, with effective collaboration between all stakeholders. Everyone involved in the OSINT project should understand ethical and legal constraints, and should work together to avoid privacy issues and other ethical concerns.

Imperva Application Protection Powered by Threat Intelligence

Imperva provides comprehensive protection for applications, APIs, and microservices, which builds on multiple threat intelligence sources including OSINT:

Web Application Firewall – Prevent attacks with world-class analysis of web traffic to your applications.

Runtime Application Self-Protection (RASP) – Real-time attack detection and prevention from your application runtime environment goes wherever your applications go. Stop external attacks and injections and reduce your vulnerability backlog.

API Security – Automated API protection ensures your API endpoints are protected as they are published, shielding your applications from exploitation.

Advanced Bot Protection – Prevent business logic attacks from all access points – websites, mobile apps and APIs. Gain seamless visibility and control over bot traffic to stop online fraud through account takeover or competitive price scraping.

DDoS Protection – Block attack traffic at the edge to ensure business continuity with guaranteed uptime and no performance impact. Secure your on premises or cloud-based assets – whether you’re hosted in AWS, Microsoft Azure, or Google Public Cloud.

Attack Analytics – Ensures complete visibility with machine learning and domain expertise across the application security stack to reveal patterns in the noise and detect application attacks, enabling you to isolate and prevent attack campaigns.

Client-Side Protection – Gain visibility and control over third-party JavaScript code to reduce the risk of supply chain fraud, prevent data breaches, and client-side attacks.