Manta stops web scrapers, eliminates brownouts and simplifies their move to the cloud with Imperva

Overview

Born as an online directory for small businesses, Manta has become the most highly trafficked destination to find services and products from local businesses across the U.S. About 30,000 business owners join Manta each month to gain visibility and showcase what they have to offer.

According to Russell Garrison, DevOps Engineer at Manta, protecting the data of its small business customers is always top of mind. “Our problem is that we hold a lot of valuable information in one place, which is great for our members, but a pain point for IT,” he said. “We’ve migrated our entire platform to the cloud. But keeping information safe and preventing bad actors from damaging the site or stealing the data we store is always a big concern.”

The Challenges

Online Directories Entice Web Scrapers

Like many online directories, Manta is an attractive target to web scrapers due to the quality and quantity of the data it stores on U.S. businesses. To help combat web scraping by malicious bots, Manta’s IT team had been using an appliance from FireBlade (formerly SiteBlackBox), which ran on a virtual machine in its datacenter. “In the course of migrating our web infrastructure to Amazon, we encountered a lot of transitional issues. The appliance wasn’t in-line, and we were very concerned about site availability. Eventually we decided to shed in-house responsibility for maintaining anti-scraping infrastructure,” said Garrison. The team began evaluating cloud-based solutions to replace the appliance.

According to Garrison, the FireBlade solution didn’t align with company goals. “Because we were responsible for the host hardware and VM it ran on, we could only promise the machine’s availability to a certain level,” he said. “We couldn’t cluster members for availability, and we had a more complicated implementation.” Garrison said that when the web app submitted a request to the FireBlade appliance, the request would time out if the appliance didn’t respond promptly, and the web app would fall back to its default action, which was to allow the request. “If the appliance was bogged down, our site could suffer performance issues,” he said. “And, a DDoS attack on the site would render it unavailable or unresponsive.”
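The timeout behavior Garrison describes, where a slow answer from the appliance results in the request being let through, is commonly called failing open. The Python sketch below illustrates the general pattern only; it is not FireBlade's or Manta's code. The names check_with_fail_open, fast_classifier and slow_classifier, and the 50 ms timeout, are hypothetical and chosen for illustration.

```python
import concurrent.futures
import time


def check_with_fail_open(classify, request, timeout_s=0.05):
    """Ask a classifier for a verdict; allow the request if no answer arrives in time.

    `classify` stands in for the call to an external anti-scraping service
    (hypothetical here). It should return "allow" or "block".
    """
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(classify, request)
    try:
        verdict = future.result(timeout=timeout_s)
    except concurrent.futures.TimeoutError:
        # No answer in time: fail open so visitors are not held up.
        verdict = "allow"
    except Exception:
        # The classifier itself failed: also fail open.
        verdict = "allow"
    finally:
        # Do not wait for a slow classifier call to finish.
        pool.shutdown(wait=False)
    return verdict


def fast_classifier(request):
    """Hypothetical classifier that answers immediately."""
    return "block"


def slow_classifier(request):
    """Hypothetical classifier that is bogged down."""
    time.sleep(0.3)
    return "block"
```

With these stand-ins, check_with_fail_open(fast_classifier, {}) returns the classifier's verdict, while check_with_fail_open(slow_classifier, {}) returns "allow" because the answer arrives after the timeout. The trade-off is the one the case study points to: failing open keeps the site responsive, but an overloaded classifier lets scrapers through.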

Few Anti-Scraping Solution Providers

Garrison’s team investigated multiple security and CDN providers, but their services did not address his problem. “Specifically, we wanted anti-scraping protection,” he said. “We talked to Limelight, Neustar and a couple of other CDN vendors, but they all focused on DDoS protection. They didn’t understand the need to protect content from automated bots. Imperva (formerly Distil Networks) was the only serious candidate that focused on web scraping.”

The Requirements

Online Directories Deploy Increasingly Complex Web Infrastructure

“Now when you run a site, everything is very interdependent. There are a lot of enabling technologies and services which add multiple layers of complexity, and even more parties involved,” noted Garrison.

As Garrison and his team at Manta narrowed their short list of anti-scraping solutions, they refined their priorities:

  • Maintain high availability and performance of the site
  • Protect business information from theft
  • Easily integrate with new, cloud-based web infrastructure
  • Outsource management of the problem

Imperva Bot Management (formerly Distil Networks), which offers both an on-premises appliance and a cloud-based solution, came out at the top of Manta’s list. Since Manta was looking for a solution that would work seamlessly with its Amazon cloud platform, Imperva’s Cloud CDN was an ideal fit.

Why Imperva?

Easy setup and no need to change the underlying web infrastructure

The free trial setup was straightforward and quick, and within weeks, Manta went through a full implementation. Manta easily configured the settings, allowing them to accurately distinguish human visitors, good bots and bad bots. They were also able to maintain their desired URL structure, which had been optimized for SEO.

Outstanding customer support and product that worked as promised

“The offering from Imperva was unique and the human aspect was huge,” said Garrison. “We had trouble with our previous CDN vendor, Limelight, fitting our case correctly, and because we weren’t a big, ‘sexy’ customer, we weren’t getting great service. With Imperva, we got quick response from Support Engineers who were eager to help us through a fast and successful implementation.”

Additionally, trust came into play. “When we selected Limelight for CDN, they promised us vaporware with features that they later abandoned,” he said. “Those features, notably anti-web scraping, were important to us. Imperva provided them out-of-the-box.”

Accurate bot detection and complete control over how to use the service

Imperva’s solution began blocking malicious bots immediately, using several layers of proactive detection. It then learned behavior patterns unique to Manta.com and identified deviations from a typical visitor profile. Imperva Bot Management’s unique fingerprinting distinguished good bots (like Google and Bing) from bad ones and blocked the bad ones. Imperva provides Manta with complete information about what bots are doing on the site and which organizations are behind them, giving Manta the option to monitor each bot request, present an unblock form, present a CAPTCHA, or drop the request.
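The choice of response per bot request can be pictured as a small policy function. The Python sketch below is purely illustrative and does not represent Imperva's API or detection logic. The names Action, GOOD_BOT_TOKENS and choose_action, the anomaly score input and the 0.8 threshold are all assumptions made for this example.

```python
from enum import Enum


class Action(Enum):
    """Responses named in the case study, plus a plain ALLOW."""
    ALLOW = "allow"
    MONITOR = "monitor"
    UNBLOCK_FORM = "unblock_form"
    CAPTCHA = "captcha"
    DROP = "drop"


# Hypothetical allow-list. User-agent strings are easy to spoof, so a real
# service would verify good bots by stronger means than this substring check.
GOOD_BOT_TOKENS = ("googlebot", "bingbot")


def choose_action(user_agent, anomaly_score,
                  bad_bot_action=Action.CAPTCHA, threshold=0.8):
    """Pick a response for one request.

    anomaly_score is an assumed value in [0, 1] describing how far the
    request deviates from a typical visitor profile.
    """
    ua = user_agent.lower()
    if any(token in ua for token in GOOD_BOT_TOKENS):
        return Action.ALLOW      # good bots keep crawling
    if anomaly_score >= threshold:
        return bad_bot_action    # operator-selected response to bad bots
    return Action.ALLOW          # looks like a normal visitor
```

For example, choose_action("python-requests/2.31", 0.95) returns Action.CAPTCHA under the defaults, and passing bad_bot_action=Action.DROP switches the response to dropping the request. Keeping the response configurable mirrors the operator control described above.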

The Results

Blocked 99.9% of web scraping bots with no impact on legitimate users

Imperva Bot Management prevents web scraping bots from stealing content. Manta protects both its business and the 29 million small businesses that rely on their services every day. Legitimate visitors and content contributors have not been impacted, allowing Manta to maintain the trust and positive brand impression of those users. Imperva’s solution is more effective than the previous system, and overall costs have been reduced significantly. According to Garrison, “With Imperva, we can have our cake and eat it, too. We’re protecting our content from thieves and we reduced our infrastructure costs at the same time. For DevOps, this was like pressing the Easy Button.”

Slashed CDN and anti-scraping infrastructure costs by over $60K annually

Since replacing the Limelight CDN and FireBlade appliance with Imperva, Manta’s DevOps team has reduced the costs of content delivery and anti-scraping protection by over $60K per year. Also, Manta was able to eliminate the cost of infrastructure maintenance. The team now has more control over the solution with less effort and IT spend.

Increased site speed, performance and good bot traffic

“We completely migrated our application to our new cloud platform and new processes, making the site itself much more durable. It can handle a lot more traffic. At the same time, we still had the same concerns about bots damaging the availability and performance of the site, and stealing our competitive advantage,” Garrison said. “With Imperva’s Cloud CDN Deployment, our site is faster, we have high availability, and now we’re getting crawled by good bots more than ever before. Google’s hitting us really hard. If you look at the Imperva dashboard, you can see the good bots, and there’s a ton of them.”

Reduced IT overhead and increased collaboration with upper management

Since Manta is no longer battling bots in-house, they have saved the equivalent of half of a full-time IT employee in overhead costs. Additionally, the simplicity of the user interface reduces the need for highly skilled tech professionals. “The Imperva console includes a lot of powerful tools and reports that provide real-time insights into our traffic. The console is easy to understand, and provides management with a clear picture of our site traffic,” said Garrison.

Facilitated move to the cloud and improved business agility

Imperva’s infrastructure-agnostic service fit well with Manta’s move to the cloud, increasing their operational flexibility. Imperva offloads server strain and accelerates website performance by serving traffic through 17 global locations, enabling a high degree of scalability and an optimum end-user experience. “Imperva is completely distributed, and you can manage it through a single pane of glass, making it really easy to protect a site,” said Garrison.

No more service orders or long lead times

With Imperva, Garrison’s team is more agile and can quickly respond to business needs. For example, making changes to the configuration is faster and easier. “We’ve been moving domains around, and Imperva has made that an easy process,” said Garrison. “With our previous vendor, there were a lot of service orders and long lead times.”

“What I really like about Imperva is that they focus entirely on malicious bots. It’s what they do. And that lets me sleep at night knowing our site is protected from web scraping bots,” said Garrison.