In today’s global information economy an ever-increasing amount of sensitive data is collected, used, exchanged, analyzed, and retained. And with that comes an ever-increasing number of accidental or intentional data breaches. Identifying inappropriate access to data is paramount in stopping a breach.
User behavior analytics (UBA) solutions are often used to detect insider threats and targeted attacks by analyzing behavior to find anomalies that indicate a potential threat. While viable in many circumstances, traditional UBA solutions can miss the mark when it comes to understanding the complexity of data access to identify not only insider threats from compromised users, but also those stemming from careless and malicious ones.
Drew Schuil, Vice President of Global Product Strategy, is back again for Whiteboard Wednesday. This time he’s talking about different approaches to UBA, the challenges of using traditional UBA solutions to detect and protect against inappropriate access to sensitive data, and a recommended best practice architecture we’ve seen work with customers around the world. Watch to learn more.
Welcome to Whiteboard Wednesday. My name is Drew Schuil, Vice President of Global Product Strategy at Imperva. Today’s topic is, “Where UBA Falls Short,” and specifically we’re talking about sensitive data access. Let’s get started.
Common Approaches (and Challenges) to UBA
I’m going to start with UBA approaches. I think it helps to frame the conversation a little bit.
UBA Approach: UBA On Top of SIEM
The most common UBA approach in terms of deployment and type that I’m hearing about is basically UBA—or user behavior analytics, or as Gartner calls it, UEBA (user and entity behavior analytics)—sitting on top of a SIEM. Essentially, the use case here is that SIEM has failed to deliver on being able to accurately provide actionable data and alerts when looking for things like insider threats, and APT (advanced persistent threats), and those types of issues that organizations are dealing with.
UBA has come along basically applying machine learning and data science to automate and provide much more granular correlation in an automated fashion across all these different variables. Again, the most common use case I see is UBA sitting on top of SIEM to fix SIEM or do what SIEM was not able to do natively.
Broad Machine Learning Algorithms Don’t Provide Deep-Level Context
[What I often hear] is, “Look, we want to be able to implement one UBA that can solve the problem across all of our different feeds coming from our firewall, our IPS, our antivirus, our various APT and endpoint solutions. Maybe monitor our active directory for log in, log out, and lateral movement type detection, then taking logs from all different sources, from web servers, from database servers, from everything else, and feeding that in this SIEM and then having UBA do its thing on top of it.”
The challenge with this is frankly UBA and machine learning are not magic. When we look at solving [the problem] in this [broad] way, we’re basically an inch deep and a mile wide. We’re basically saying that the machine learning needs to be taught or tuned. The algorithms need to be such that they can understand not only this firewall, so maybe it’s Palo Alto, or Check Point, or Juniper, [but also] all the different variations of firewalls and those logs, and how they produce data, and what they see. Again, multiply that across all these different log sets.
The next objection or thing I hear as well is Splunk for example…it has a common dataset that is solving that problem. Now you’ve solved one problem, which is inconsistent data, but you still don’t have the context, and the deep understanding of what that data is.
Where I see UBA solutions winning is where they try to tackle one problem. They’ll say, “Look, we’re going to really focus on lateral movement and understanding account takeover type situations.” Or, “We’re going to look at outbound traffic to look for data exfiltration.” Basically, the horse is already out of the barn and is leaving the yard.
Database Performance and Availability Issues
What we’re going to focus on today is more in this section [logs], is looking at sensitive data…really drilling down into understanding behavioral analytics and how users are interacting with sensitive data. Taking this one step further, looking at the challenges of a UBA sitting on top of the SIEM as it relates to sensitive data, let’s say database logs, file logs, cloud access logs, the number one issue—and I’m going to focus on databases because this is where most organizations are starting—availability.
Databases are mission-critical systems and the DBA’s job is to make them go fast, to be highly available all the time, and basically to not impact the business. When we start implementing logs to feed into this type of an architecture, the challenge is that we’re impacting those database systems. If you’ve ever gone to the DBAs and asked for logs, they’re going to start with giving you a very small amount of logs. The more you ask for, the more resistance you’re going to get.
The most common issue is those logs are native logs. They’re logs that are being created and stored on the database system. The more logs you ask for, the more hit you’re going to see on the database [performance]. We see sometimes up to a 15% or 20% hit on the database itself.
The other issue is that you’re not going to get the full data you need. It’s not going to be the rich type of data that you’re going to need in order to really leverage the full power of machine learning in UBA. This is part of the problem we see in this type of architecture, again, when we’re trying to solve this sensitive data access problem.
The second issue is cost. Many SIEMs, I’ll take Splunk as an example because we see it a lot, Splunk charges by data volume. The more data that I throw over the fence to the SIEM, the higher my cost is going to be. That leads to cost predictability issues, particularly when you’re budgeting for the year ahead. The other issue is that it creates a lot of volume, so now we’ve got storage, additional volumes of storage that may have sensitive data that I also need to look at protecting. It creates some issues not only in terms of cost, but also complexity.
Tuning Takes Resources
The next section is tuning. You may have heard of supervised and unsupervised learning if you’ve looked at the behavioral learning space. Supervised learning basically means that someone’s going to come in and they’re going to do tuning. A data scientist, either from the vendor, or from your internal staff, is going to come in and they’re going to apply their expertise to tune the machine learning algorithms to make sense of all this data. If you think about it—and you’ve got an organization with lots of different variety and lots of different data feeds—you can see why it would be required to have a professional data scientist come in and do that tuning. It puts us back into where we are with SIEM, where we’re writing advanced correlation rules, and [having] to do supervised learning. While maybe it’s a little bit better, you still have some oversight.
Unsupervised learning, on the other hand, is basically where the vendor, let’s say Imperva, is creating the machine learning and able to [automatically] tune those algorithms down very tightly so you don’t have to have professional services, or additional data scientists, or additional tuning come in, because we’re [using unsupervised learning] to tune those algorithms very tightly to get the expected outcome. As you can imagine, in doing so, it helps to have one use case that you’re trying to solve for so that you’re very focused on that, so you don’t get spread too thin and too wide.
No Blocking Capability
The last thing that I hear is no blocking. In this architecture, basically we’ve got passive logs, sometimes real-time, typically near real-time, or looked at later, fed into a SIEM, and then those logs are fed into a UBA. Naturally, this is not a real-time blocking type of architecture. The other issue is now I’ve got to write scripts, or I’ve got to write follow-up actions for each one of these blocking devices, which creates additional customizations, which need to be maintained and tuned over time. Again, this architecture is the most common that I see in the customers that I talk to. These are some of the challenges.
Recommended Best Practice Architecture
This is just an illustration of a best practice architecture that we would recommend instead. Essentially, what we’re doing is capturing that data in a much more elegant way. In a way, basically using very lightweight agents or network monitoring on the database itself, something that we’ve been doing for auditing and security purposes for 14 years.
The idea is instead of a 15% to 20% hit on that database, you’re going to see less than 1%. Very low percentages. You’ve got pressure-release valves, if you will. If the database becomes too hot or overburdened, we can turn off the monitoring so that performance and availability take priority. The idea is to be able to monitor not just bits and pieces, but to be able to monitor all the database traffic without impacting that system, and to be able to send that large amount of data in such a way that, again, I’m not creating volume issues. I’m not creating storage issues. We’re doing compression. We’re taking large amounts of database data and converting that into metadata that we can use to do the analytics on.
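To make the idea of converting raw database activity into metadata more concrete, here is a minimal sketch in Python. It is not a description of any vendor's implementation. The `AuditEvent` fields and the `summarize` function are hypothetical names chosen for illustration, and real audit records carry far more detail.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class AuditEvent:
    # Hypothetical record shape, assumed for this sketch only.
    user: str
    table: str
    operation: str
    rows: int


def summarize(events):
    """Collapse raw events into one summary per (user, table, operation)."""
    summary = defaultdict(lambda: {"queries": 0, "rows": 0})
    for e in events:
        key = (e.user, e.table, e.operation)
        summary[key]["queries"] += 1
        summary[key]["rows"] += e.rows
    return dict(summary)


# Three raw events collapse into two summary records.
events = [
    AuditEvent("alice", "customers", "SELECT", 10),
    AuditEvent("alice", "customers", "SELECT", 15),
    AuditEvent("svc_batch", "orders", "UPDATE", 200),
]
metadata = summarize(events)
```

The point of the sketch is only that analytics can run on compact per-user summaries rather than on every raw log line, which is what keeps volume and storage under control.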
The end result is to apply machine learning to the data that we monitor from the database and send that to the SIEM, instead of sending thousands and thousands, or millions, of logs…so now we’re talking about maybe five or six actionable alerts per week. We’re seeing this in very, very large environments, so that when the SOC team looks at this, and tries to make sense of it, and maybe write some additional correlation rules, they can do it in a more effective manner.
UBA Approach: Build In-house
Let’s talk about a couple of the other approaches that I hear. We’ll skip to number three, build in-house. Build versus buy, right? I’ve heard from some organizations that have data scientists on staff, or would like to bring them on staff and do some of this in-house. The main issue with this approach is pretty obvious—it’s time-to-value. Buying an off-the-shelf solution that’s already been through and been tested in other enterprises is going to save you a lot of time, especially if you’ve got tons of other priorities and other projects on your plate. While I do see this come up quite a bit with organizations that have a lot of people, that have a lot of expertise within the data scientist world, it’s really not practical for organizations that need to move quickly to protect their sensitive data.
UBA Approach: Focus on Compromised Accounts
Let’s come back to number two, talking about compromised accounts. I hear a lot from organizations that are focused on compromised accounts and they’re implementing a variety of different types of solutions, whether it be an APT-type solution, or advanced antivirus, a next-generation antivirus. At the end of the day, the problem with this is it misses two key categories of users that have access to sensitive data, and that’s malicious and careless [users].
Security Gap – Careless and Malicious Users
If you look at careless users, these are careless or negligent, privileged accounts, or users that have access to things like databases. They’re borrowing service accounts. They’re going around the normal change control in order to do their job, which can lead to risks and weaknesses within the organization. To uncover careless behavior you really have to have a deep understanding of, for example, database logs. If you’re trying to go too far and too thin, [careless user activity] is one of the categories that you’re not going to understand because it doesn’t show up as something that’s an attack. It’s something that’s careless, that needs to be addressed as part of good security hygiene.
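As a rough illustration of why careless behavior only shows up with data-level context, consider a borrowed service account. The sketch below assumes each service account normally connects from a known set of application hosts. The account names, host names, and the `is_borrowed_service_account` helper are all hypothetical.

```python
# Hypothetical mapping of service accounts to the hosts they normally use.
SERVICE_ACCOUNT_HOSTS = {
    "svc_billing": {"app-billing-01", "app-billing-02"},
}


def is_borrowed_service_account(user, client_host, allowed=SERVICE_ACCOUNT_HOSTS):
    """Flag a service account that connects from an unexpected host."""
    if user not in allowed:
        # Not a known service account, so this check does not apply.
        return False
    return client_host not in allowed[user]


# A service account used from a DBA's workstation is flagged.
flagged = is_borrowed_service_account("svc_billing", "dba-laptop-17")
# The same account used from its application server is not.
normal = is_borrowed_service_account("svc_billing", "app-billing-01")
```

Nothing in this activity looks like an attack at the firewall or in Active Directory. It only stands out when you know who is supposed to be using that account against the database, and from where.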
The last category, malicious, is probably the hardest because [malicious] users know where all the traps are. They know how to get around them. An advanced, malicious user is very difficult to detect. That requires this deep knowledge, being able to go past the inch deep and mile wide, to really have a deep understanding of the problem that you’re trying to solve.
The UBA Data Breach Kill Chain
When we’re talking about sensitive data, and we’re talking about database systems, it helps to look at the kill chain. I’ll briefly explain this UBA data breach kill chain where we’ve got reconnaissance, lateral movement, data access, and exfiltration. The idea here is if you can detect and stop a data breach at any one of these sections, then it’s a success.
Reconnaissance and Lateral Movement
If you look at the types of user profiles I just talked about, most of the security industry is focused here on recon and lateral movement. For example, they’re focused on monitoring, let’s say, log in and log out to active directory, and then what happens after that, so lateral movement within the organization.
I see this very commonly with UBA discussions. They’re focused on lateral movement. The problem is that careless and malicious users are not going to be detected in either of these stages because they’re already legitimate users on the network. They don’t need to do reconnaissance. They’re not necessarily going to use account takeover or account [credential] borrowing in a way that triggers an alert around lateral movement. We really need to have a deep understanding of data access in order to address these two other user categories of careless and malicious, which, by the way, will help us also address compromised users.
If you’re focused here on exfiltration, again, the horse has already left the barn. It’s leaving the yard. Most exfiltration type approaches with UBA are focused on data that’s going out of the firewall, and are looking at flows, and not necessarily the context of the actual data itself. If they’re taking a hundred rows out of the database per day, it’s very low volume. It’s not going to trigger something that’s looking for exfiltration at the firewall, for example.
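The hundred-rows-per-day example can be sketched in a few lines of Python. This is an illustration under assumed numbers, not a real detection algorithm: the flat threshold, the z-score cutoff, and the sample history are all made up for the example.

```python
from statistics import mean, pstdev


def exceeds_volume_threshold(rows_today, threshold=100_000):
    """Flow-style check: only a large volume triggers an alert."""
    return rows_today > threshold


def deviates_from_baseline(history, rows_today, z_cutoff=3.0):
    """Data-access check: compare today with this user's own history."""
    mu = mean(history)
    sigma = pstdev(history)
    if sigma == 0:
        return rows_today != mu
    return abs(rows_today - mu) / sigma > z_cutoff


# Hypothetical user who normally reads only a few rows a day
# from a sensitive table.
history = [0, 2, 1, 0, 3, 1, 2]
rows_today = 100

missed_by_volume = not exceeds_volume_threshold(rows_today)
caught_by_baseline = deviates_from_baseline(history, rows_today)
```

A hundred rows is far below any volume threshold, yet it is a large departure from what this particular user normally touches. That contrast is the reason to look at the data access layer rather than at outbound flows.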
The argument here is that if you’re focused on sensitive data protection and using UBA to solve this problem, you really need to focus on the data access layer. You need to focus on protecting the data where it lives, having a deep understanding of how users interact with the databases, the file systems, cloud repositories, big data, these other places where you’ve got huge data repositories. You really need to have a deep understanding of data.
If you’re looking to say, “Well, which UBA should I invest with first?”, because chances are you’re going to have to have a layered approach and multiple solutions, our argument is you need to start with data access. Because you can address all three use cases here, whereas you’re not going to be able to address careless and malicious users if you’re only focused on compromised accounts.
I hope this session was useful. Please join us for another on Whiteboard Wednesdays.