Google Hacking involves an attacker submitting queries to Google’s search engine with the intention of finding sensitive information residing on Web pages that have been indexed by Google, or finding information that reveals vulnerabilities in applications indexed by Google. Google Hacking is by no means confined to the Google search engine; it can be applied to any of the major search engines.
As search engines crawl through web applications to index their content, they stumble upon sensitive information. The more robust and sophisticated these crawlers become, the more of a web-exposed server they cover. Thus any information that is accidentally accessible through a web server or web application will quickly be picked up by a search engine. Sensitive information may be personal, such as social security numbers, credit card numbers, and passwords, but it also includes technical and corporate information such as client files, the company’s human resources files, or secret formulas accidentally placed on a server. Additionally, the search engine picks up information that may expose application vulnerabilities, such as error messages contained in the server’s reply to the search engine’s request, directory listings, and so on. All this sensitive information is available to anyone who uses the appropriate search terms.
Although the coined term highlights the giant search engine Google, we consider the domain of this attack to include all available search engines, including Yahoo!, Ask.com, LiveSearch and others.
Real-life examples of data leaking onto the Web and found by Google include SUNY Stony Brook, where the personal information of 90,000 people was jeopardized when it was mistakenly put on the Web; Jax Federal Credit Union (JFCU), where information was picked up by Google from a Web site belonging to JFCU’s print service provider; and the Newcastle-upon-Tyne city council, which exposed the personal details of several thousand residents.
Several resources provide effective search terms for Google Hacking. Probably the best known is Johnny’s I Hack Stuff Google Hacking Database, which contains a comprehensive list of terms for finding files that contain authentication credentials or error codes, for locating vulnerable files and servers, and even for identifying Web server software.
Furthermore, Google Hacking can also be used to spread malicious code rapidly. The well-known SantyWorm defaced Web sites by exploiting a vulnerability in phpBB, a PHP-based bulletin board application. It found vulnerable machines by searching Google for them and then infected them.
Search Engine Hacking Prevention:
Unfortunately, once sensitive information is on the Web, and thus available through a search engine, a professional information-digger will most likely get hold of it. However, a few measures can easily be applied to prevent search-engine-related incidents. Prevention includes making sure that search engines do not index sensitive information. An effective Web Application Firewall should offer this as a configurable feature: the ability to correlate search engines’ user-agents or IP address ranges with request and reply patterns that hint at sensitive information, such as non-public directory names like “/etc” or strings that look like credit card numbers, and to block replies when leakage is likely. Pattern lists can also be found among Johnny’s I Hack Stuff resources.
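The following Python sketch illustrates the kind of reply filtering a Web Application Firewall might apply. It is a minimal illustration under stated assumptions, not any product's actual rule set: the crawler user-agent markers, the “/etc/...” path pattern, and the Luhn-checked card-number heuristic are all hypothetical choices. Because the User-Agent header is trivially spoofed, a real deployment would also verify search-engine IP ranges, which this sketch omits.

```python
import re

# Hypothetical user-agent substrings used to recognize search-engine crawlers.
# A real WAF would also verify the client IP range; headers are easy to spoof.
CRAWLER_UA_MARKERS = ("googlebot", "bingbot", "slurp")

# Hypothetical pattern: references to files under the non-public /etc directory.
SENSITIVE_PATH = re.compile(r"/etc/\w+")

# Crude candidate for card numbers: 13-19 digits, optionally separated by
# spaces or dashes. Candidates are confirmed with a Luhn check below.
CARD_CANDIDATE = re.compile(r"\b(?:\d[ -]?){13,19}\b")


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def is_crawler(user_agent: str) -> bool:
    """Heuristically decide whether the request comes from a search crawler."""
    ua = user_agent.lower()
    return any(marker in ua for marker in CRAWLER_UA_MARKERS)


def looks_sensitive(body: str) -> bool:
    """Return True if the reply body matches any sensitive-data pattern."""
    if SENSITIVE_PATH.search(body):
        return True
    for match in CARD_CANDIDATE.finditer(body):
        digits = re.sub(r"\D", "", match.group())
        if luhn_valid(digits):
            return True
    return False


def filter_response(user_agent: str, body: str) -> tuple[int, str]:
    """Block the reply (HTTP 403) when a crawler would receive sensitive data."""
    if is_crawler(user_agent) and looks_sensitive(body):
        return 403, "Forbidden"
    return 200, body


if __name__ == "__main__":
    print(filter_response("Googlebot/2.1", "Card: 4111 1111 1111 1111"))
```

Running the script prints `(403, 'Forbidden')`, because 4111 1111 1111 1111 is a Luhn-valid test card number and the request claims to come from Googlebot; the same body requested with an ordinary browser user-agent is passed through unchanged. The Luhn step keeps arbitrary long digit runs (order numbers, timestamps) from triggering blocks, at the cost of occasionally missing a card number embedded in a longer run of digits.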
Detecting sensitive data that appears in web search results includes periodically checking Google to see whether information has leaked. Tools built for exactly this task, such as GooScan and the Goolag Scanner, are available on the Internet.
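As a rough sketch of such a periodic self-check, the Python snippet below restricts a list of search terms to one's own domain with Google's site: operator and turns each query into a search URL for review. The domain and the example terms are hypothetical placeholders; a real list would more likely be drawn from the Google Hacking Database. The sketch only builds URLs and does not fetch results, since automated querying of a search engine may be restricted by its terms of service.

```python
from urllib.parse import urlencode

# Hypothetical, illustrative search terms; replace with a curated list,
# for example one drawn from the Google Hacking Database.
EXAMPLE_TERMS = [
    'intitle:"index of" "parent directory"',
    'filetype:sql "insert into"',
    'filetype:log "password"',
]


def build_queries(domain: str, terms: list[str]) -> list[str]:
    """Restrict each search term to the given domain with the site: operator."""
    return [f"site:{domain} {term}" for term in terms]


def to_search_url(query: str) -> str:
    """Encode a query as a Google search URL (spaces become '+')."""
    return "https://www.google.com/search?" + urlencode({"q": query})


if __name__ == "__main__":
    # "example.com" is a placeholder for the organization's own domain.
    for q in build_queries("example.com", EXAMPLE_TERMS):
        print(to_search_url(q))
```

Scheduling the script (for instance, with cron) and reviewing any results that appear turns this into the periodic check described above.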