Overview

Funda is the largest property portal in the Netherlands, carrying over 200,000 listings—some 95% of all homes for sale—at any given time. More than 30 million people visit the site each month, and over 4,500 affiliated brokers leverage Funda listings to connect buyers with marketed properties. Based in Amsterdam, the company is majority-owned by NVM, the country’s largest real estate association. Funda has been in business for 15 years and has over 100 employees, more than half of whom perform technical functions to keep the site operating at
peak performance.

As the go-to site for Netherlands real estate listings, Funda was regularly targeted by web scrapers looking to benefit from its data without paying for it. The company has invested heavily in becoming the leading authority on real estate market data in the country, and it places a high value on its trust relationship with both NVM and its affiliated broker network.

The Challenges

Being an authoritative source is a key element of Funda’s brand equity—one that plays heavily into the ways in which its data can be used. Koen Peeperkorn, Director of Product Development, is fine with financial institutions and the media freely using Funda’s data in their work, provided the source is properly credited. What disturbs him is unethical market researchers helping themselves to Funda’s data and publishing it under their own names.

Funda also needs to keep a close eye on, and block, companies whose business models are built around the relocation process. Peeperkorn cites the example of movers scraping lead data related to listed properties within their region, then sending marketing emails to those prospective customers. He notes that the practice is getting more sophisticated as the online property market matures.

“I came across a website that is trying to strengthen its position in specific neighborhoods. They scrape our site, show properties that are for sale in their target areas, and then sell advertising to mortgage brokers and other
vendors. This particular website is the digital publishing arm of a major insurance company.”

Historically, Funda has taken a two-pronged approach to protecting its brand against nefarious web scrapers: legal action and IP blocking.

Protecting the brand in the courts

Taking legal action was relatively simple in the small, tightly controlled Netherlands real estate market, where all property sales go through an authorized broker.

But Funda competitors were attempting to break into the market by scraping its listing data. While scraping is permitted under Dutch law, the law limits scrapers to displaying a single picture in a very small format with 55 characters of text, and a link to the originating site must be included. Funda successfully fought the scrapers through the courts, setting a precedent that continues to stand the company in good stead today.

Homegrown IP blocking technology not effective

On the technical side, the company’s IP blocking system was homegrown. According to the 2016 Bad Bot Landscape Report, IPs are no longer a good vector for protection, since 74% of bots now distribute their attacks over two or more IP addresses. Funda’s homegrown solution also operated at the application level only—not at the server level—so there was little useful data available to help track and block specific types of attacks. Peeperkorn realized that as increasingly sophisticated scrapers entered the market, proactive, targeted blocking would become essential.
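The weakness of per-IP blocking can be sketched in a few lines. The threshold, log shape, and addresses below are purely illustrative and are not drawn from Funda’s actual system; the point is simply that a single-IP counter never fires once a scraper spreads the same workload across a handful of addresses.

```python
from collections import Counter

# Hypothetical threshold: block any single IP exceeding this many requests.
THRESHOLD = 100

def blocked_ips(requests, threshold=THRESHOLD):
    """Return the set of IPs whose request count exceeds the threshold.

    `requests` is a sequence of (ip, path) tuples, a stand-in for an
    access log.
    """
    counts = Counter(ip for ip, _path in requests)
    return {ip for ip, n in counts.items() if n > threshold}

# A scraper making 300 requests from one IP is caught...
single_ip = [("203.0.113.7", f"/listing/{i}") for i in range(300)]
assert blocked_ips(single_ip) == {"203.0.113.7"}

# ...but the same 300 requests spread over four IPs (75 each) slip
# under the threshold entirely.
distributed = [(f"203.0.113.{i % 4}", f"/listing/{i}") for i in range(300)]
assert blocked_ips(distributed) == set()
```

Raising the threshold or lowering it only shifts the trade-off; once attacks are distributed, blocking has to key on something other than the source address, which is the gap the report describes.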

The Results

Choosing Imperva for its proactive and accurate approach

Peeperkorn realized that Funda needed to understand more about the source and nature of its site scraping attacks. The company initially
considered a monitoring system using Splunk. However, the limitations of having to manually set up notifications and alerts, coupled with its inability to even do basic IP blocking, made that solution a non-starter. So he turned his attention to solutions that not only offered monitoring, but also proactive defense mechanisms.

As he began to evaluate potential solutions, what stood out for him was Imperva’s ability to determine whether a site visitor is an automated browser or a human being. “It’s really important, and it’s getting harder to do,” he noted.

Imperva integration with F5 Networks and Amazon CloudFront

Simplicity of integration with Funda’s existing infrastructure was also an important consideration. Funda’s web architecture is a hybrid model: a hosted data center houses the bare-metal servers that run the site, to which the company has added some Amazon servers—primarily for load balancing. CloudFront serves the site content.

As a result, two Imperva virtual appliances are now installed in the data center. Everything is managed by F5 load balancers and integrated
with AWS.

Imperva Bot Management blocks competitors and unsavory web scrapers

“We initially ran in monitor mode only,” said Peeperkorn. “We can see what kinds of providers are hitting us, and then based on that information make blocking decisions using our access control list and other resources. Imperva’s ability to determine whether automated browsers are being used is a big help in making that call. And its ‘block by organization’ feature makes it easy to thwart known competitors and unsavory web scrapers.”

He adds, “If we ever move forward with other lawsuits against attackers, the data Imperva provides will likely be very valuable. Being able to use it to connect the dots helps a lot, as does demonstrating to NVM that we’re taking serious measures to protect our data.”

Filtering out the noise to make better decisions

Funda is currently rolling out a new site redesign and is leveraging Imperva for more accurate site analytics and decision-making. “In a lot of cases, analytics data is polluted by bot traffic. Imperva helps us filter out any traffic that sends a header classifying itself as a bot, which in turn helps us measure the success of our new site.”
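The simplest form of this filtering—catching clients that declare themselves in the User-Agent header—can be sketched as follows. The token list and sample hits are illustrative assumptions, not Funda’s or Imperva’s actual rules, and real bot management goes far beyond self-declared traffic.

```python
# Illustrative tokens that self-identifying bots commonly include in
# their User-Agent header (hypothetical list, not a product rule set).
BOT_TOKENS = ("bot", "crawler", "spider")

def is_self_declared_bot(user_agent: str) -> bool:
    """True if the User-Agent header classifies the client as a bot."""
    ua = user_agent.lower()
    return any(token in ua for token in BOT_TOKENS)

hits = [
    {"ua": "Mozilla/5.0 (Windows NT 10.0) Chrome/120.0", "path": "/koop/amsterdam"},
    {"ua": "Googlebot/2.1 (+http://www.google.com/bot.html)", "path": "/koop/utrecht"},
]

# Keep only traffic that does not declare itself as a bot before it
# reaches the analytics pipeline.
human_hits = [h for h in hits if not is_self_declared_bot(h["ua"])]
assert len(human_hits) == 1
```

Bots that lie about their User-Agent pass straight through a filter like this, which is why the harder automated-browser detection Peeperkorn mentions matters.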

According to the 2016 Bad Bot Landscape Report, 53% of bots can now load JavaScript. Many analytics tools, such as Google Analytics, function via a JavaScript code snippet, so bots that can load these resources end up skewing the data and throwing off key business and operational metrics. “Because we’ll be filtering out the noise, we’ll be better able to understand how real users are interacting with the site, thereby enabling us to make better decisions.”