GuideStar is the world’s largest source of information on nonprofit organizations. The service gathers and disseminates information about every IRS-registered nonprofit organization — information about the nonprofit’s mission, legitimacy, impact, reputation, finances, programs, transparency, governance and more. GuideStar aggregates data which is provided by the nonprofits themselves, as well as information from other sources.
A key concern for GuideStar is protecting its valuable data from malicious bots and scrapers. The company’s success and reputation as the leading source for data on nonprofit organizations depend on it.
Increased bot and scraper activity was causing brownouts on GuideStar.org
As GuideStar grew and fed its extensive database, IT noticed nefarious bots and scrapers were impacting users and the GuideStar platform, and in some cases, bringing it down. “We were getting lots of alarms and alerts, and the site was going down,” said Shane T. Ward, Data Architect at GuideStar.
“We have a 99.7% uptime guarantee which means you can only be down a few hours each year,” said Ward. Over time, the web scraping spiraled out of control. Ward learned of several specific instances in which people posted advertisements on freelance.com and similar sites looking to hire scrapers to specifically target the GuideStar site and its data. “It takes a lot of time and effort to curate and disseminate our data, and we needed to take immediate action to protect it,” he said.
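For context, the downtime budget implied by an uptime percentage is simple arithmetic. The sketch below is illustrative only: the function name is hypothetical, and it assumes a 365-day year with uptime measured across all hours, which may not match how GuideStar's guarantee is actually defined.

```python
HOURS_PER_YEAR = 365 * 24  # assumption: non-leap year, all hours counted


def downtime_budget_hours(uptime_percent: float) -> float:
    """Return the hours per year a service may be down at a given uptime target."""
    return HOURS_PER_YEAR * (100.0 - uptime_percent) / 100.0


budget = downtime_budget_hours(99.7)
print(f"{budget:.1f} hours per year")
```

Under these assumptions, a 99.7% target leaves roughly 26 hours of allowable downtime per year, so recurring scraper-driven outages can consume that margin quickly.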
Existing bot prevention appliance was insufficient for blocking bad actors
According to Ward, GuideStar had installed a SilverSky appliance in the company’s datacenter in Richmond, VA. “We worked with them for several months saying, ‘Hey, we’re still getting scraped.’ We would have outages depending upon the load the scraper was creating on us. SilverSky thought that they could manage it, but it wasn’t working.”
Ward said brownouts were still occurring about once a month. Clearly, the sledgehammer method of blocking IP addresses simply wasn’t adequate. “The appliance was only doing packet inspection, looking for code takeovers and SQL injections,” he said. “We could block the bad IP address, but at that point, you’re just chasing ghosts.”
IT was spending 20 hours a month fighting bots
The GuideStar IT group was distracted by fighting bots, and often spent 20 hours or more on the problem each month. Still, they were unable to contain the bad bots. Ward remembered, “The technologist in me was getting up, responding to alerts in the middle of the night, to find out some automated scraper was pounding our site, and our servers could no longer respond.”
Developers were frustrated, and losing sleep
At one point, a big spike of automated bots came in. “We had to reroute the apps to parse logs and identify IP addresses, so we could block them either at the IIS server or through the SilverSky appliance,” said Ward. “Frustration bubbled up, and one of GuideStar’s lead developers wrote a very compelling letter to upper management insisting action be taken to stop the bad bots.”
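The manual process Ward describes, combing server logs for the IP addresses generating the most requests, might look something like the sketch below. It is not GuideStar's actual tooling. It assumes logs in the W3C extended format (which IIS can write, with a `#Fields:` header naming the columns), and the sample data, function names and threshold are all hypothetical.

```python
from collections import Counter

# Hypothetical sample log; real IIS logs usually carry more fields.
SAMPLE_LOG = """\
#Fields: date time c-ip cs-method cs-uri-stem sc-status
2024-01-01 00:00:01 203.0.113.7 GET /profile/1 200
2024-01-01 00:00:01 203.0.113.7 GET /profile/2 200
2024-01-01 00:00:02 198.51.100.4 GET /search 200
2024-01-01 00:00:02 203.0.113.7 GET /profile/3 200
"""


def requests_per_ip(log_text: str) -> Counter:
    """Count requests per client IP, locating the c-ip column via #Fields."""
    counts: Counter = Counter()
    ip_index = None
    for line in log_text.splitlines():
        if line.startswith("#Fields:"):
            # Column names follow the directive; find where c-ip sits.
            ip_index = line.split()[1:].index("c-ip")
            continue
        if not line or line.startswith("#") or ip_index is None:
            continue  # skip blanks, other directives, and rows before the header
        parts = line.split()
        if len(parts) > ip_index:
            counts[parts[ip_index]] += 1
    return counts


def heavy_hitters(counts: Counter, threshold: int) -> list:
    """Return the IPs with at least `threshold` requests, sorted."""
    return sorted(ip for ip, n in counts.items() if n >= threshold)


counts = requests_per_ip(SAMPLE_LOG)
suspects = heavy_hitters(counts, threshold=3)
print(suspects)
```

A list like this only identifies addresses after the traffic has already hit the servers, which is why the team found the approach so time-consuming.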
The bot problem was a threat to GuideStar’s strategic model. “I wanted to make sure the time, effort and money we’ve spent curating and creating GuideStar’s tools and data was being used for the purpose we intended, and not being stolen and used in competing products,” Ward said.
“We needed the ability to block not just by IP address, but by the signature, as well,” said Ward. Once the signature is identified, the bot can be stopped, but other people from the same IP address who aren’t using an automated script can still have access. “There may still be legitimate traffic from the IP address, so we don’t want to block it entirely,” he said. Any solution would need to both block bad bots, and enable good bots and humans to get through.
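The difference between blocking an IP address and blocking a signature can be sketched as follows. This is a minimal illustration, not how Imperva Bot Management works internally: the `Request` fields, the `fingerprint` function and the `SignatureBlocklist` class are all hypothetical, and a real product would combine far more signals than a user agent and cookie behavior.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Request:
    ip: str
    user_agent: str
    accepts_cookies: bool


def fingerprint(req: Request) -> tuple:
    """Hypothetical client signature built from two simple signals."""
    return (req.user_agent, req.accepts_cookies)


@dataclass
class SignatureBlocklist:
    """Blocks (IP, signature) pairs rather than whole IP addresses."""
    blocked: set = field(default_factory=set)

    def block(self, req: Request) -> None:
        self.blocked.add((req.ip, fingerprint(req)))

    def allows(self, req: Request) -> bool:
        return (req.ip, fingerprint(req)) not in self.blocked


# Two clients share one IP address, e.g. behind a campus network.
scraper = Request("203.0.113.7", "python-requests/2.31", False)
student = Request("203.0.113.7", "Mozilla/5.0", True)

blocklist = SignatureBlocklist()
blocklist.block(scraper)
print(blocklist.allows(scraper), blocklist.allows(student))
```

Because the blocklist is keyed on the pair rather than the address alone, the scripted client is stopped while the browser user on the same IP address still gets through.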
Ward was also looking for a vendor that offered a self-optimizing solution. “We wanted to eliminate delays and costs associated with updating rules,” he said. “Real-time updates are critical; if we have to wait, it’s not going to work. The horse is out of the barn, as they say.”
Finally, ease of implementation was important to the team. “We didn’t want to have to go through hoops installing appliances,” he said. “We needed to be able to start up the service quickly and painlessly, and enable real-time administration through a simple, intuitive user interface.”
Cloud CDN deployment option reduces overhead and implementation time
Ward said Imperva’s cloud-based solution not only reduced the cost of buying, installing and maintaining an on-premises appliance, but it also provided better bot prevention. “One appliance-based solution we evaluated essentially funneled traffic to a datacenter, then ran an analysis and applied rules 24 hours later,” he said. “It addressed the bot attacks after the fact. Imperva, on the other hand, sits in front of the traffic, and blocks bots from entering the site in the first place. This all happens in real time, with no delay.”
“We definitely like Imperva’s reverse proxy method. We like that the CDN was part of the solution,” said Ward. Implementation was fast and seamless, because there was no need to install hardware. “Turning on Imperva Bot Management was a simple DNS change, and it was up and running, monitoring our traffic.”
Easy to administer and provide access to different departments
Ward likes the cloud-based portal, which provides a user-friendly interface and easy administration. “Through the portal, I set up an accounting profile so they can go get the invoices when they want. Business intelligence enables users to pull down whatever metrics they want. That’s how easy it is. And if we have any questions, the support team is right there, backing us up. It’s been a fantastic experience.”
Imperva’s partnership was key to success
Ward wanted to work with a vendor that would be a partner to GuideStar in support of its mission and business model. “We’re partners in the same game,” he said. “The support we received right out of the gate from Imperva confirmed that we’d made the right decision. They’re with us every step of the way. Their passion aligns with ours as a nonprofit organization—we’re both driven toward our missions.”
Website stabilizes and recovers 99.7% uptime
After Imperva Bot Management was implemented, website performance improved immediately and the GuideStar team was able to get more sleep. Ward reported, “We’re boasting 99.7% uptime on all of our products and services now, and we have a handle on what was actually going on. We knew we were being scraped, but we didn’t know the extent to which we were being scraped. It has really, really opened our eyes — WOW, there was a lot going on! We had no idea how pervasive the problem was until we started blocking all the threats.”
Bad bots are blocked while good traffic gets through unhindered
“Imperva Bot Management protects us against any type of automated means of accessing and potentially stealing data off the website,” he said. “With Imperva, we can drill down on the IP addresses affected, issue a CAPTCHA and shut the bot down. We can fingerprint the bot and stop it, without blocking the entire IP address.” The company recently caught Mobius365 trying to scrape data to create feeds for other people’s websites.
More visibility brings peace of mind and enables intended customer use
Ward said the team now has more visibility into which traffic is legitimate and which is bot-generated. “We definitely get more sleep around here,” said Ward. “Before Imperva, we knew we were being scraped, but we didn’t know the extent to which we were being scraped. We have a handle on that now. And, we know how to prevent it.”
Imperva Bot Management also provides Ward with insight into threats by organization. “We have a way to validate who’s using the service in the way it was intended to be used, and we can monitor how people are using the information.” For instance, universities often provide students with access to the data. Now, GuideStar can grant them academic access. “Imperva gives us the insight we need to serve our users’ needs, without getting into trouble with bad actors,” he said.
Time spent parsing logs and fighting bots decreased by 20 hours a month
Downtime has decreased significantly, according to Ward, now that Imperva Bot Management is protecting the site. He has seen a drastic reduction in time spent parsing logs and fighting bots, as well. “We are saving about 20 hours a month, using Imperva,” he said.
Ward values his team’s relationship with Imperva, viewing the company not as a vendor but as a partner that helps GuideStar fulfill its mission. “GuideStar is a unique nonprofit, because we’re technology focused, and we aggregate data about other nonprofits to help revolutionize philanthropy and advance transparency,” he said. “When we look for vendors, we look for that same passion and alignment with our vision. I can’t say enough about how Imperva aligns with our mission, our vision and our values.”