[Webcast Transcript] Data Mining in Incident Response: Managing Risk and Spend through an Effective Evidence-Based Approach

HaystackID Blog | September 8, 2022

Editor’s Note: On August 31, 2022, HaystackID shared an educational webcast on the topic of data mining in data breach incident response. As data mining has increasingly become one of the largest expenses during a cyber incident, it often leaves claims professionals with blown budgets and insured clients in high-risk scenarios when assumptions about their data (and what may lurk within) dictate the operationalized response toward providing notice to affected parties after a breach.

This session was developed and shared by a team of cybersecurity, information governance, and legal discovery experts and highlights the risks of over and under notice of impacted individuals in a cyber incident. The experts also described and discussed key checkpoints ahead of and throughout the data mining process with the goal of giving claims professionals a new understanding of the types of levers they should be pulling internally at their organizations, and with their partners, to optimize, manage, and establish repeatable data mining incident response processes.

While the entire recorded presentation is available for on-demand viewing, provided for your convenience is a complete transcript of the presentation.

Presenting Experts

+ Mike Sarlo
Chief Innovation Officer, President of Global Investigations and Cyber Incident Response Services, HaystackID

+ Susana Medeiros
Associate, Norton Rose Fulbright

+ Sam Sessler
Assistant Director, Global eDiscovery Services, Norton Rose Fulbright

+ Anya Korolyov
Vice President, Cyber Incident Response and Custom Solutions, HaystackID

Presentation Transcript

Moderator

Hello, and welcome to today’s webinar. We have a great presentation lined up for you today, but before we get started, there are just a few general admin points to cover.

First and foremost, please use the online question tool to post any questions that you have, and we will share them with our speakers. Second, if you experience any technical difficulties today, please let us know in that same question tool, and we will do our best to resolve them. And finally, just to note, the recording of this session and accompanying slides will be shared via email in the coming days.

So, without further ado, I’d like to hand it over to our speakers.

Michael Sarlo

Thanks so much.

Hi, everybody, and I hope you’re having a great week. My name is Mike Sarlo, and on behalf of the entire team at HaystackID and Norton Rose Fulbright, I would like to thank you for attending today’s presentation and discussion titled “Data Mining and Incident Response: Managing Risk and Spend through an Evidence-Based Approach”.

Today’s webcast is part of HaystackID’s regular series of educational presentations, developed to ensure listeners are proactively prepared to achieve their cybersecurity, information governance, and eDiscovery goals. Our expert presenters for today’s webcast include individuals and experts with a deep understanding of cybersecurity, incident response, information governance and cyber discovery, and data mining.

I’d like to start off by introducing myself. My name is Michael Sarlo. I’m the Chief Innovation Officer at HaystackID. I’m a forensics examiner by trade as well, and I also oversee our cyber incident response services division, and also our global investigations and eDiscovery division, so I’m the president of both. I’m going to kick it off to Susana Medeiros to introduce herself.

Susana Medeiros

Hi everyone, Susana Medeiros here. I’m an associate at the law firm of Norton Rose Fulbright and a member of our Information Governance, Privacy, and Cybersecurity team. Our practice primarily focuses on assisting our clients, either as breach counsel in ongoing active cyber investigations, or providing pre-incident privacy compliance work, and for some of our clients, if they are unfortunately the victim of a cyber-attack, we assist them with post-event remediation and compliance work. So, we’ve really done the A through to Z of working with our clients who either want to better prepare themselves for a cyber event, who are currently facing an event, or who are trying to remediate one.

Michael Sarlo

Thanks, Susana.

Sam Sessler

Hi, everyone. I’m Sam Sessler, also part of Norton Rose Fulbright. I help manage the firm’s eDiscovery and cyber incident response teams. So, my background is more technical by trade, and I work hand in hand with Susana’s team and other legal teams across the firm.

Michael Sarlo

Thanks, Sam. I’d start by saying that Susana and Sam are two of the best people in this field, and we’re super lucky to have them today. We work with many different outside counsels, and Norton Rose is really leading the pack on really innovative solutions in this field, so really happy to have them, and of course, my colleague and one of the smartest people I know, Anya Korolyov, you’re up next.

Anya Korolyov

Thanks, Mike. Hi, everybody, Anya Korolyov. I’m the Vice President of Cyber Incident Response here at HaystackID. I’m an attorney and also a Relativity Master, and I have a technical and legal background that allows me to work very closely with the breach coaches and corporations on their post-incident response, as well as on information governance prior to a potential cyber incident.

Michael Sarlo

Thanks, Anya. So, our agenda today is going to start with a little bit of a small history lesson and update on where we were at in 2020 and 2021 as far as the cyber risk landscape in general, as opposed to where we’re at in 2022 and where we appear to be going, which looks like a very busy final quarter here. We’re going to dig in a little bit to some of the trends and statistics, just to give you guys a sense of the size and scale of this problem. We’re going to jump in a bit by sector fairly quickly, and then just close out, I think, with some discussion points from folks in the weeds who have been living this, who I think have their finger a bit on the pulse of the marketplace, and when I say marketplace, I really mean the threat actor marketplace and the way we interact with them, and the statistics they might actually convey.

We’re going to switch over then and start to talk about what triggers an event from being a simple network perimeter compromise, as opposed to something that actually requires significantly more action on the legal front as it relates to assessing risk. We’re going to get into a little bit about how geography matters. It’s a big question, I think, that comes up on the carrier side, on the client side, and even for law firms who are new, who are dabbling in this space.

We’re going to really start to talk about then, at that point, why an evidence-based approach to data mining, and for some of you folks who are born and bred in eDiscovery, data mining is really the term that is used more in the incident response and cyber insurance world as it relates to human review, for collation and extraction of sensitive data points, getting towards a notification list.

And then we’re going to really get into the nitty-gritty of the technology, the workflows, and really taking you through an end-to-end investigation and response, talking about the tools we use, the technologies, what’s the difference between an AI-enabled approach or how we vet those.

And then finally, we’ll close out with how you measure success in these matters. Because the bar can be all over the place matter by matter, and it’s really important to be able to wrap things up, and to communicate those, as many of our clients are really looking to get more of a program as it relates to handling these.

So, what we are seeing actually in Q2 and in the second half of the year is a significant decline in the number of victims as opposed to 2021. When I say victims, I mean somebody whose personal information is compromised. So, in Q2 alone, we had about 413 reported compromises, and “reported” is a key note here, as I think that everybody here would probably agree that a lot of organizations just aren’t reporting when they’re having an incident. They may be smaller, they may be a non-regulated entity, they may be operating in a country where the laws aren’t as defined and there isn’t the same regulatory risk. So, these statistics can be a little bit misleading, but there were about 404 known data breaches in the sectors that do report in North America, in the EU and UK, and in APAC for developed nations. And then we actually saw six data exposures, and it’s very embarrassing to be the IT guy who accidentally posts everybody’s passwords on the internet. About 817 total, a fairly even number, but we’re significantly down in 2022, in some ways. And what we saw in 2021 was that almost 300 million folks had their personal data compromised, just a staggering number. That’s actually a little bit down from 2020, and still even slightly down from some of the mega breaches that we saw with the credit reporting agencies, and those events, which resulted in a massive amount of sensitive data being leaked out on the internet. I’m sure many of you have iPhones. I know that my passwords tend to get compromised on a weekly basis because I always get a little alert that they’ve popped up there. So, a huge problem.

What’s happened a little bit here is we’ve seen a shift in 2021 and 2022. The targets started to be focused on what we would call third parties in the supply chain. We started to see financial services organizations. We started to see Big Four consultancies. We started to see law firms get hit. The risk effect in this scenario, where so many customers are involved, typically results in a much higher ransom, but it’s not all about a ransom payment. What we were also seeing was the rise of more types of extortion and exfiltration of data. Large organizations, which were the big game that tended to be hunted in 2021 (and we’ll talk about that because that’s shifted a bit), tend to have the systems in place to actually restore their networks and their critical infrastructure rather quickly. The IT landscape is so much different even than it was three years ago, as it relates to accessible backup. So, very large-scale organizations are usually doing pretty well there, and as a result, threat actors really started to operate in two phases. They’d encrypt your data, but they would also exfiltrate your data, and if you don’t pay them, they’re going to post that data on the dark web in highly accessible forums, or they’re going to monetize your network even further to the threat actor community, which is a highly developed global crime business. It’s probably one of the most developed ever, when we look at global crime gangs from a sophistication standpoint.

In general, we talk a lot about ransom. We saw things shift at the start of 2022, and one of the primary influencers here, I think, was the start of the Ukraine war, and I’m going to try to put this back one slide for us, and some major events that occurred in the ransom marketplace. In 2020, we saw this gang Conti, which is a highly Eastern European-focused cyber gang, and when we talk about what we’re seeing in the cyber world, it’s cyber warfare. For many of these highly developed threat actors, be it publicly known or not, there’s reason to believe that they’re sponsored by nation-states, and there’s a high concentration in Eastern Europe with a direct hand from Russia. We oftentimes see attacks coming from North Korea and from China, though in Asia it’s oftentimes focused on stealing intellectual property, where maybe even a ransom event or cyber compromise isn’t necessarily as much about monetization.

So, what we saw at the start of the Ukraine war was a fracture inside of the Conti group. There was a massive leak when Conti came out in support of Russia. They had a lot of folks in Ukraine and other Eastern European nations. We saw a massive data leak of all of their chats, and you saw the gangs start to fracture into smaller parts, and part of that came to a head where we started to see a shift in resources of Conti-branded ransomware now being used against the Russian state, and a major focus of threat actors, or we’ll call them hackers in this scenario, taking a side. The risk for global miscalculation on the warfare front also increased. At the end of 2021, we had some major attacks on our supply chains on major businesses here in the US – food, oil, healthcare here in the EU, and in the UK, and in the US. It’s important to note that healthcare organizations always tend to face the highest cost in this scenario. The average cost of a stolen record, as of 2021, for a healthcare entity was about $458, as opposed to a non-healthcare organization at about $150, and they’re especially vulnerable to third-party supply chain issues with the myriad of suppliers who are involved in the delivery of healthcare services.

What we then saw, really, around May 16th, was a massive attack by Conti on Costa Rica’s government. I think they made their way into over 27 government institutions, truly an act of war. They really didn’t get much out of it. It was a publicity stunt to announce the disbandment of Conti, and what we saw next was somewhat of a calm in the ransom world, which has now picked up fairly significantly over the past 30 days, as we’re starting to see a couple of new groups appear, HiveLeaks and Black Basta, which is a rebranding of some of the key players in Conti, and some new ransomware that has taken front and center from a software standpoint, called LockBit 3.0, with a highly sophisticated development team. What we saw before that was a lot of individual threat actors who were operating on their own smaller gigs. What we had in 2021 was at least repeatability as far as our adversaries were concerned. We knew that if we paid them, more often than not, we would get our data back. You’d also even get a nice little report that would show you how they got in. That started to evolve in 2022. As new threat actors entered into the marketplace, organizations didn’t have as much of a playbook as far as how these would proceed.

Sam and Susana, you guys are obviously in the weeds on this, you have a massive global practice. Do these stats and statistics and the overall story make sense and sound true as far as what you experienced in 2021, as opposed to where the first half of 2022 ended up?

Susana Medeiros

Yes, I completely agree, Mike. I think that one thing that we’ve definitely noticed is less emphasis on ransomware, on threat actors locking down your systems and preventing system access until you pay the ransom. We’re seeing a lot more financially-motivated hackers who are extorting companies, so they are taking data, but not actually locking down any of your systems. They’re taking data and threatening to publish or otherwise release that data pending payment.

Now, I think that’s partly because that is an easier activity, and they still feel confident that they will receive payment, even when data only leaves the building, and there’s no system interruptions, and I think that’s because companies still have obligations all around the world, under data breach notification laws, to potentially notify regulators or even individuals once data leaves the building, and so I think hackers have wised up to this and realize that they don’t have to lock down your systems to be able to get a good payout.

So, I think that dovetails nicely into the next question: unfortunately, you’ve experienced a cyber event, so what are the things that we’re going to need to consider and assess? And I want to be very clear, as a group, we’re going to be walking through in this presentation evidence-based decision making around how you identify the data that’s important, that you have a legal obligation to do something with or a strategic interest in, and how you do that quickly after an event. But let’s be clear. On day one of an event, you do not know very much. You may know some information about the threat actor’s attack path. You may have received an extortion demand, but still not know exactly what data may be impacted, and it can often be weeks before you’re able to even begin to assess what data has been impacted, and in the case of ransomware, you may also be dealing with business service interruptions that are further distracting your IT team as they try to stand your IT systems back up.

So, many times in the beginning weeks, we have to assess potential regulatory and contractual risks before we actually know what data is involved, and so we’re able to start making some of those assessments based on the particular industry that’s involved and where you’re incorporated in the world. Oftentimes we’re brought in as breach counsel very early in an event, partly because you can take advantage of having the investigation conducted under attorney-client privilege, and oftentimes, even before you know what data is impacted, you may need to make regulatory notifications to various entities around the world. So, if you are a global entity with primary operations in Europe, that might include the ICO and your lead supervisory authority. In the US, if you’re a healthcare company, it may include HHS or the New York Department of Financial Services, and so you may be making initial regulatory notifications before you know whether you actually have a legal notification obligation to any individuals.

But what I’ve been noticing in terms of trends is the specter of the question: is personal data of individuals impacted? Will I have notification obligations under various data breach notification laws? That’s a key component of what’s motivating this investigation, to ensure you’re complying with your obligations, but we’re also seeing increased interest in what your obligations are to your business customers. Regardless of any personal data that may be implicated, what business-sensitive materials are at issue? Do you have a contractual obligation or strategic interest to notify customers, and at what stage of the process are you going to do that, especially when you have very little data at the front end? And so really, the role we’ve often played in guiding our clients is to interrogate the data as quickly and defensibly as possible, so that we can provide these much-needed answers that underlie potential notifications and communications to external parties, to further support those obligations.

Sam Sessler

I did want to add that the timelines and pressures that are triggered, depending on the assessment that’s being made and the information that you find, can be intense, and it’s really important to lean on counsel, such as Susana, in regard to what your obligations are from a timeline perspective. When you’re dealing with an incident, it’s obviously a very stressful situation. That will be compounded by the fact that you’re now potentially under obligation to respond within specific timelines, oftentimes compressed ones.

Michael Sarlo

What we find too is that many organizations have solid incident response playbooks that are an IT InfoSec document, but when a real event happens, the legal element is oftentimes not playbooked and not documented. Susana and Sam, I know you guys spend a lot of time with clients on playbook design and stuff like that. Now, it’s a hugely important element for our corporations to be considering how much spend will go into the legal response, how much effort, and how that can be even more disruptive than a technology break in many situations. And also, what I always tell people is that the size of the data does not matter. We have cases, sometimes, where we have 20 gigabytes that’s taken, and it’s so rich and so dense, and there’s so much risk that is contained in that data. You just never know what you’re walking into. And Sam, I’ve worked many cases that are like that, and from an eDiscovery practitioner standpoint, as a technologist, it can really knock you back on your heels if you’re thinking you might be dealing with something small and all of a sudden, it’s really big. So, it’s so important to have confident breach counsel to help manage that scenario.

Sorry to cut you off, Susana. Please go ahead.

Anya Korolyov

I wanted to follow up and ask Susana and Sam about the timelines that Sam just talked about. They’re always tied, of course, to PII because you want to notify the people and there are regulatory obligations, but for the business-sensitive data that you were talking about, are those also as time-sensitive, if not more so?

Susana Medeiros

That’s such a great question, and I think it all depends on how public the event is. So, when it comes to your customers and external parties, you may not have a contractual obligation to notify, but if the event is very public, if there have been system interruptions, then you will be asked these questions. When your biggest client comes to ask you, “Was my data impacted? How am I affected by this event?”, you want to be prepared. And there is this expectation even though, a lot of the time, this data is difficult to interrogate: you may know which systems are impacted, but not the specific data within those systems that was impacted, or at least you don’t know that right away, until your forensics investigator is able to more deeply determine that. And so, even if there isn’t a regulatory timeline motivating the speed of the investigation, you may need to answer to external parties who want more information, and in order to minimize business loss, you, of course, want to be able to provide them with that information as quickly as possible, and you want it to be accurate. So, a lot of the time we say, on day one, I may not know if any external third parties are impacted as a result of an organization’s event, but on day 10, I might know that a little bit of data was impacted for a particular third party, and maybe not until day 20, 30, or 40 do I know the entire universe of data that was impacted. Events, depending on the amount of data and the density of that data, are measured in days, weeks, and months, and so being able to set expectations for a defensible process is really important.

Sam Sessler

And I’ll add just quickly, the timeline mentioned when it comes to sensitive data, we’re not just talking about PII, but to the point of talking about sensitive business information, certain trade secrets, there could be an internal threat actor for all we know. We oftentimes have matters where there is an insider acting and stealing business trade secrets, and so while there’s not necessarily a regulatory obligation yet, the executives want to know, within days, what sensitive business secrets have been stolen or accessed. So, a whole nother ball of wax when you’re dealing with insider threat actors, as well as sensitive trade secret information. The timeline is always a factor, and so we’ll talk more about this later, but applying an evidence-based scientific approach is important, but also the need to respond immediately is where we come in.

Michael Sarlo

We don’t have the same standards that we have in eDiscovery and just a body of precedent for how data and technology should be used. So, it’s so important to do things, I think, in a systematic way that somewhat relies on that experience when you’re [inaudible].

One thing, too, that I want to raise that’s really important and makes it very difficult, especially in the world we live in right now with a rapid sense of deglobalization, is that when you make a payment, it can be incredibly high risk. There’s quite a bit that goes into the process from a forensic side, from an investigation side, to assign attribution to who you actually may be making a Bitcoin payment to, because there are massive OFAC issues. And I think that we oftentimes see these articles saying you shouldn’t make payments, you shouldn’t make payments, but that’s just totally implausible in almost all cases. From some of the stats we didn’t mention, it looks like about 68% of organizations that had reported a breach actually paid the ransom, based on what was able to be found on the internet from what was then publicly posted.

The global issues, I think, is where I think there’s some confusion, and Susana, we’d love to hear from you a little bit on that topic. How does geography matter? Is it the company? Is it where the data subjects are located? Is it where you’re doing business? What changes in a global event? How do you manage a response in the US and in the EU, or in the UK?

Susana Medeiros

Yes, such a great question because so many companies may have their headquarters in one location, but they really are global operators, and so that means that an event may affect systems that are not localized to a particular area of the world. For example, if an event ends up impacting your HR system, and you have personnel located all over the world, then the fact that you are headquartered in Ireland does not mean that that is the limit and scope of your potential regulatory obligations. And so, you likely have obligations that you’re going to need to consider in the country where you’re headquartered, as well as in the countries where any impacted individuals are located, if they have qualifying personal data that meets regulatory requirements in those countries.

And so, we have advised global clients who end up having notification obligations around the world. We recently worked with a global client who had notification obligations on four continents. And it wasn’t that they did all of their business in all of those regions, but rather that they had impacted individuals in, say, Brazil or Japan or China or the Netherlands, and there were notification obligations in those countries that would bring those individuals in scope and require you to notify both regulators and the individuals themselves.

So, I would really say it’s very country dependent, and so it’s very, very important that when you’re interrogating the data that you take a global approach. And you’re able to do that through conversations with the client. Where are your primary business operations? What are going to be the most likely regulatory obligations that you have? Where are individuals most likely to be located to the extent we identify personal data in the data set? And then we’re able to build a review framework that specifically identifies information in compliance with those primary laws.

And so, much of the time, applying an EU standard will meet most regulatory obligations to identify personal information. So, that’s commonly our approach for global clients. And then there may be extra considerations if your primary business operations are in, say, Singapore, where we’re going to be looking to the PDPA, or Latin America, et cetera.

And so, that all impacts your review strategy and where you may have notification obligations.
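
Editor’s Note: The global approach described above can be illustrated with a minimal sketch that tallies identified data subjects by country, so counsel can see which jurisdictions beyond the headquarters country may need to be assessed. The record schema, the function name `tally_by_country`, and the country codes are assumptions made for illustration only. The sketch does not decide whether any law applies; it only shows where individuals appear to be located.

```python
from collections import Counter


def tally_by_country(subjects, headquarters_country):
    """Count identified data subjects per country of residence.

    `subjects` is a list of dicts with an optional "country" key
    (a hypothetical schema, not a real review-platform export).
    """
    counts = Counter()
    unknown = 0
    for subject in subjects:
        country = (subject.get("country") or "").strip().upper()
        if country:
            counts[country] += 1
        else:
            unknown += 1  # location still to be determined during review
    # Countries other than the headquarters where individuals were found
    outside_hq = sorted(c for c in counts if c != headquarters_country.upper())
    return counts, outside_hq, unknown
```

Records with no known location are counted separately rather than assumed to sit in the headquarters country, in keeping with the point that assumptions about where individuals are located should be tested against the data.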

Anya Korolyov

I think also to bring it back a little bit to eDiscovery, I always think geography in cyber is kind of like language identification in eDiscovery. Everybody always thinks, “Oh, we don’t have anything in another language”. There are a lot of clients that think, “Oh, we only operate in the US, we’re a college, we’re a doctor, we don’t expect anything”. And every time we go in there, we do find people from other jurisdictions, other continents, other countries that do rise to that level. So, it’s always important to not make that statement to say, “No, we definitely don’t have anything”.

Susana Medeiros

That’s right. We can’t make too many assumptions – sorry, go ahead, Sam.

Sam Sessler

No, I was just going to be in agreement there. And we’ve also found that you may not be a global company, but you have employees that have personal data within your network that does not belong to the company. And it can trigger all sorts of privacy issues because they may have someone else’s data within your network that has been exfiltrated.

So, it’s not just limited to the landscape of your company or your organization. You are sometimes at risk of what your employees are bringing into your organization as well.

Michael Sarlo

And I think it’s also important, I think we’re in a much better privacy landscape as far as being able to interpret the GDPR, being able to interpret Privacy Shield 2.0, and obligations as far as getting global teams in place. I think even a couple of years ago, a lot of these matters would have been localized to the EU or the UK, now we’re seeing that our clients are more sophisticated, law firms are more sophisticated as far as getting the right framework in place to allow a set of global experts to act on this data. It’s really something that leaves your corporation on the line. There are ways to allow teams in different geographies that may have the expertise to be a part of the project team that supports these matters.

Because, I think, Sam, I’m sure you’re dealing with a slew of different vendors, especially when you get into these global events. What are some of the challenges that you face working in different geographies, even in your own firm, with the vendors, that type of stuff?

Sam Sessler

Yes, you’re exactly right. I would say in the past 60 days, 80% of the incidents that we are assisting with are cross-border global incidents. And some of the challenges are finding a provider like HaystackID that can support us globally. And it’s very important to understand that and understand not only what the vendor is capable of from a people, process, and technology standpoint, but also where can they host your data. Where do they have people? Where do they have brick-and-mortar labs and review operations located? All of that is very important and we are dependent on that from a service provider standpoint. We can’t bring data into the US, oftentimes, if it’s hosted outside of the country. So, all of those factors are very important. So, I’m glad you asked that.

Michael Sarlo

And so, now, we’re going to switch it a little bit here. There are, I think, a couple of different camps as far as how you do this. Much of the world, I learned this, is run by insurance, all the way up to Lloyd’s of London. And I think that what we’ve seen a little bit, even more so than in eDiscovery, is a commoditization of the service stream, be it of the forensics offerings or of the data mining offering, and all the other vendors that actually operate between there. And Sam, you should maybe talk about that in the tech section, about just even some of the non-technology vendors.

But certainly, I’d love to hear from you guys, why does this need to go right? Why is it not just, “Hey, we had a breach, we notify everybody”? Why is it so dangerous to under-notify? What types of risks are out there for organizations?

Susana Medeiros

I’m happy to start us off and then I really want to tee it up for Anya, because we’re often shoulder to shoulder on these issues.

So, I think from a legal perspective, under-notification is, of course, bad because then you’re potentially not meeting your regulatory obligations. And I have been finding in the past few years that regulators and customers are not only more honed in on cyber events and more interested in learning about various incidents, but they’re also asking the right questions, and they will interrogate your process. They will ask you, “Well, how do you know that there’s nothing else?”

So, you’re saying that you’re notifying these people, what else was there? What else did you find? Did you also find this but you’re not reporting it?

And so, under-notification is – obviously, we want to comply with our obligations, we want to meet the expectations of third parties who may be requesting information. But also, it’s not tenable in this day and age, because you will be asked the right questions to learn more about the process.

And for over-notice, it’s really reputational. Notifying too many people is, one, not necessarily regulatorily required. It can have you take a hit to your reputation and make an event seem worse than it really needs to be, and I think it attracts more third-party litigation attention.

But Anya, from more of the technological perspective as we’re mining the data, what are your thoughts here?

Anya Korolyov

Yes, I think this is one of the trickiest parts of this whole process because we do have great technology at hand to help us identify where the pockets of that information are. We have the help of attorneys to go and pull that information out. But unfortunately, nobody keeps perfect records, no company keeps perfect records, and no employee or patient or client appears only once. Nor is all the information perfectly preserved in a nice little packet.

So, we end up with sometimes millions of data subjects that we have identified and in some documents, they have a social security number, in others they have an address. And the trickiest part for us is to tie it all together, and make sure that, in the notification list, we don’t accidentally drop somebody.

I think, at some point – you always see a name and you think, “Well, there’s only one person by that name”. I think at some point I actually went and looked online how many people have my name, which I do think is kind of sort of unique. It was quite a lot of people. So, you can never make an assumption that just name alone is enough or even an address. We have people that are juniors and seniors, so you have to be very careful that you don’t under-notify and you don’t misrepresent a smaller number in that sense. But then you also don’t want to send 10 letters to the same person. Obviously, nobody loves to get that letter. We all, I think, have gotten it at this point, or if we haven’t, we will sooner or later. And getting five of them at the same time would probably be way more upsetting.

So, you have to be very careful and you have to build the rules of how you’re going to deduplicate, so to speak, all this information that you have acquired and build your notification letters. And for us, that is one of the biggest challenges where Susana and I would work closest together, and run it several times to make sure that the result that we’re getting not only satisfies the legal requirement but satisfies the technical side of it as well.

Michael Sarlo

I think that’s a really good segue, and I always like to start with this when we start talking about technology with any audience.

A lot of the terminology we use here is a little bit different. When we talk about deduplication in eDiscovery, we’re usually talking about deduplicating documents at the family level by MD5 hash. When we talk about deduplication here, we’re really talking about all of the data points about impacted individuals or entities that’s collated and extracted out of documents, sometimes it can be millions of rows of data that needs to be deduplicated, organized, and rolled up.

So, maybe the 10 different versions of Mike Sarlo in a data set finally only appear once with my proper address and a listing of everything that’s been compromised. It’s an incredible technical process that requires really big data handling.
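
Editor’s Note: The roll-up described above can be illustrated with a minimal sketch. It is not HaystackID’s actual workflow; the field names (`doc_id`, `name`, `ssn`, `address`, `elements`) and the linking rules are assumptions chosen for illustration. Rows are merged only when they share a strong identifier (an SSN) or a matching name and address together, so a name alone never merges two rows, in line with the caution voiced earlier in the session.

```python
from collections import defaultdict


def normalize(text):
    """Lowercase and strip punctuation so trivial variants compare equal."""
    cleaned = "".join(c if c.isalnum() else " " for c in text.lower())
    return " ".join(cleaned.split())


def link_keys(record):
    """Keys allowed to merge two rows. A name alone is deliberately not a key."""
    keys = []
    if record.get("ssn"):
        digits = "".join(ch for ch in record["ssn"] if ch.isdigit())
        keys.append(("ssn", digits))
    if record.get("name") and record.get("address"):
        keys.append(("name+address",
                     normalize(record["name"]),
                     normalize(record["address"])))
    return keys


def roll_up(records):
    """Group extracted rows into one entry per presumed individual."""
    parent = list(range(len(records)))

    def find(i):
        # Union-find lookup with path compression.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    first_seen = {}
    for idx, rec in enumerate(records):
        for key in link_keys(rec):
            if key in first_seen:
                parent[find(idx)] = find(first_seen[key])
            else:
                first_seen[key] = idx

    groups = defaultdict(list)
    for idx, rec in enumerate(records):
        groups[find(idx)].append(rec)

    # Each rolled-up entry keeps its source documents as an audit trail.
    return [
        {
            "names": sorted({r["name"] for r in rows if r.get("name")}),
            "elements": sorted({e for r in rows for e in r.get("elements", [])}),
            "sources": sorted({r["doc_id"] for r in rows}),
        }
        for rows in groups.values()
    ]


# Hypothetical extracted rows; the SSN is a deliberately invalid placeholder.
rows = [
    {"doc_id": "DOC-1", "name": "Mike Sarlo", "ssn": "000-12-3456", "elements": ["ssn"]},
    {"doc_id": "DOC-2", "name": "Mike Sarlo", "ssn": "000123456",
     "address": "1 Main St.", "elements": ["ssn", "address"]},
    {"doc_id": "DOC-3", "name": "MIKE SARLO", "address": "1 Main St", "elements": ["address"]},
    {"doc_id": "DOC-4", "name": "Mike Sarlo", "elements": ["name"]},
]
people = roll_up(rows)
```

In this sketch the first three rows collapse into one entry, while the name-only row stays separate and would be left for a reviewer to resolve rather than being merged on an assumption.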

And I think, Sam, that’s a great segue for you here. You’ve been in the weeds at the vendors. You started at PricewaterhouseCoopers. You’re a major leader, overseeing Norton Rose’s entire eDiscovery offering. I think you’ve seen it all. You’ve got a great sense of the technology.

What are some of the differences here between the standard eDiscovery process and how has the tech changed, and how have your needs changed to respond to these matters even over the past 24 months?

Sam Sessler

Yes, a really good question and good points. There are a lot of disciplines that translate over from eDiscovery to cyber, the same with forensics. And I think the technologies are starting to catch up in terms of what is required for incident response and data breach.

The market could still use some innovation in this area. I’ll have to admit that it can always be better. But I think the last half-year has been really impressive what the Haystacks of the world and other providers as well have started to develop for this particular niche.

And some of these components listed here on the slide, AI, search terms, analytics, RegEx, all of these things are translatable from eDiscovery, but they have been tailored uniquely now for data breach. For example, density analysis, it’s not just what type of PII or PHI you’re finding, but how much is it? And where do you focus your efforts from a prioritization standpoint?

So, this technology is starting to focus on what is my impact overall. And then how dense is it? And how do I focus on the most important data first? How do I prioritize? So, there’s a lot of good things coming into the market, Haystack’s done an excellent job. I think in combination with Norton Rose Fulbright, we’ve been able to really refine the process.
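
Editor’s Note: As a rough illustration of density-based prioritization, the sketch below ranks documents by PII hits per 1,000 characters. The metric, the field names (`id`, `hits`, `chars`), and the tie-break rule are assumptions for illustration, not a description of any vendor’s tool.

```python
def density(hit_count, char_count):
    """PII hits per 1,000 characters; empty documents score zero."""
    if char_count == 0:
        return 0.0
    return 1000 * hit_count / char_count


def review_order(docs):
    """Densest documents first; ties broken by raw hit count."""
    return sorted(
        docs,
        key=lambda d: (density(d["hits"], d["chars"]), d["hits"]),
        reverse=True,
    )


# Hypothetical hit counts per document.
docs = [
    {"id": "A", "hits": 2, "chars": 20000},
    {"id": "B", "hits": 50, "chars": 5000},
    {"id": "C", "hits": 0, "chars": 0},
]
ordered_ids = [d["id"] for d in review_order(docs)]
```

Here document B, with many hits in a short file, would go to the review team ahead of the long, sparse document A.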

And going back to what Susana and Anya were describing earlier, notifying appropriately without under-notifying, and doing it in a way that is cost-effective is the key. And these technologies that are emerging, some of these have become a little more mature now, but they are the key.

On top of that, I’ll say that the people behind the wheel are also very important. There’s a lot of experts that still are required to run this process. It’s not fully automated yet, and I don’t know if it ever will be. But having the people like Anya, like Mike to execute the process and know the end goal, it’s not just about getting to the finish line, but it’s about getting there effectively using an evidence-based approach, and doing it in a way that is not extremely expensive.

Michael Sarlo

I think from the tech side, where it’s so important is repeatability. We (HaystackID) went out to market and evaluated every tool that was claiming to be in this space, and what we found was really rather a black box, especially when it came to deduplication or having workflow built in to audit, validate, and quality control the efforts of the really large teams that support these matters. Sometimes we have 200, 300 people working on it, and it’s never perfect, so you need to get that audit trail.

And I think that’s getting better, and I think what really is important as it relates to the defensibility piece is just making sure it’s repeatable. It’s the old classic Daubert standard that, even in digital forensics, we try to bring forward in everything we do in eDiscovery so that we can reproduce that result.

And I think a lot of that – Anya, you deal all day, you’re one of our biggest analytics experts, a data scientist, so to speak. How are you vetting the process?

Anya Korolyov

Not quite a data scientist, but I think it’s very important to kind of have the distinction of we have the tools and we know how to use them.

Again, to go back to regular eDiscovery that everybody is used to, “Oh, we ran search terms, we identified 100,000, we’re going to review, we have this deadline, great, we’re all happy, we move on”. Here, it’s way trickier. Just because you ran certain search terms, and even the regular expressions, again, there’s definitely AI tools out there that are new and shiny and exciting to use for all of us. But search terms and regular expressions – to go back to defensibility – they’re tried and true. We know what we’re doing. We can calibrate the search terms a little bit and make sure that we’re not, for example, just picking any document with a nine-digit number when we have a company that runs financial spreadsheets over and over, and then we’re wasting time looking at them.
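
Editor’s Note: The calibration point about nine-digit numbers can be sketched as follows. This is an illustrative pattern, not the panel’s actual search logic. It accepts hyphenated SSN-shaped values that pass the basic structural rules (area not 000, 666, or 900 to 999; group not 00; serial not 0000), and accepts a bare nine-digit number only when a context term appears nearby. The context terms and the 40-character window are assumptions that would be tuned by sampling.

```python
import re

# Hyphenated SSN shape with structurally invalid ranges excluded.
SSN_FORMATTED = re.compile(
    r"\b(?!000|666|9\d\d)\d{3}-(?!00)\d{2}-(?!0000)\d{4}\b"
)
# Bare nine-digit runs are too common in financial data to accept alone.
NINE_DIGITS = re.compile(r"\b\d{9}\b")
# Assumed context terms; a real matter would calibrate this list.
CONTEXT = re.compile(r"\b(?:ssn|social security)\b", re.IGNORECASE)


def ssn_hits(text, window=40):
    """Return likely SSN strings, requiring context for unformatted numbers."""
    hits = [m.group() for m in SSN_FORMATTED.finditer(text)]
    for m in NINE_DIGITS.finditer(text):
        nearby = text[max(0, m.start() - window): m.end() + window]
        if CONTEXT.search(nearby):
            hits.append(m.group())
    return hits


spreadsheet_cell = ssn_hits("Invoice total 123456789 for Q3")
labeled_value = ssn_hits("SSN: 123456789")
formatted_value = ssn_hits("Employee 123-45-6789 on file")
```

Under these assumptions the invoice figure is ignored, while the labeled and the hyphenated values are both flagged for review.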

So, I think that is very important not only to have the tools at your disposal but to know how to use them and to know how to use them in a way that you can also later explain to the regulators. You don’t just walk up to the regulator and say, “Well, we used AI, we’re good”. It has to be a little bit more than that. It has to be a little bit more “This is what we found, these are the decisions that we made based on sampling a lot of documents”. People keep all kinds of things. They keep non-searchable PDFs from 1987 and why are they keeping them? Nobody knows, but they do. And so, we cannot search those documents, no search terms, nothing. We can OCR, they could be handwritten. Many, many different scenarios.

So, we have to make sure that we don’t leave anything on the table, that we sample things, and we use these tools appropriately to get us in the best possible situation where we can (a) defend it and (b) not spend all the money out there just to look at every single document that [inaudible].

Michael Sarlo

So, Sam, you guys at Norton Rose are the – I think the accuracy master. What are some of the metrics when we talk about sampling and search term calibration that create a defensible process so that you just didn’t run a report? How did you actually prove that those were accurate, and you’re doing a good job? This is really important because nobody does this really out in the marketplace. And there’s so much unnecessary spend looking at data points that we know. Doing a little bit of sampling, doing some calibration, you could just completely knock that out of the population. So, I’m always in favor of a more high-touch approach working with high-touch breach coaches, like you guys who really have a handle on the process.

Sam Sessler

Yes, well, you mentioned earlier, Mike, that making the process repeatable is key. And it’s really hard to find providers that document the workflow in a way, like Haystack does, to where you can tie back to the source information and how you found that baseball card that creates the entity.

And so, having a repeatable process is key, and sampling is an approach that we have used to assess our impact. And Anya’s team, they are a team of rock stars that understand the sampling approaches to data, so that you can make assessments not only about the richness of PII, or the richness of the sensitive data you’re trying to find in the population, but also about how to find that information as effectively as possible, either through traditional methods or through using technology or AI.

But the key there is being able to track the evidence, the statistics that come out of that sampling process. And then, at the end of the day, I can say we expect to find that 10% of this population has PII. And not just that, but we have found customer information in 5% of this population, and employee information in the rest. These are just hypotheticals, but they are some examples of the type of information we’re measuring through the sampling process. And so, it allows us to project what’s there, and then we can use our process and technology to get there.

More often than not, we can project the cost before we even launch into data assessment just after a sampling process.
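
Editor’s Note: A projection of this kind can be sketched with a standard Wilson score interval for a sampled proportion. The sample size, hit count, and population figure below are hypothetical, in the same spirit as the 10% example above, and the function names are illustrative rather than taken from any tool discussed in the session.

```python
import math


def wilson_interval(hits, sample_size, z=1.96):
    """Wilson score interval for a proportion (z=1.96 is roughly 95%)."""
    p = hits / sample_size
    denom = 1 + z ** 2 / sample_size
    center = (p + z ** 2 / (2 * sample_size)) / denom
    half = z * math.sqrt(
        p * (1 - p) / sample_size + z ** 2 / (4 * sample_size ** 2)
    ) / denom
    return center - half, center + half


def project(hits, sample_size, population):
    """Scale the sampled rate and its interval up to the full population."""
    low, high = wilson_interval(hits, sample_size)
    return {
        "point": round(population * hits / sample_size),
        "low": math.floor(population * low),
        "high": math.ceil(population * high),
    }


# Hypothetical: 40 of 400 sampled documents contain PII, out of 1,000,000.
estimate = project(hits=40, sample_size=400, population=1_000_000)
```

The point estimate gives the expected number of documents needing review, and the low and high bounds give a range that could feed a cost projection before the full assessment starts.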

Michael Sarlo

It’s critical. And Anya, I know that you spend quite a bit of time structuring these reviews based on that. And we have a few minutes left, so we’ll probably breeze through, I think, some of the last few slides to make sure we touch on client success metrics.

But what are you doing here as far as the large data extractions? How are you utilizing that sampling to segment that population? What are some of the other just basic tools you use to narrow a data set using AI or analytics?

Anya Korolyov

Definitely, yes. The goal is to get through all the documents as quickly as possible. The faster we get there, the better off everybody is going to be. So, we have all these tools at our disposal. Even something as simple as a textual near duplicate. If something in a document is similar, it would probably be best to extract the fuller document first and then go to the next one, and that way you can just confirm everything, then extract it, and you can move on.
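
Editor’s Note: The “fuller document first” idea can be sketched with a simple word-shingle Jaccard similarity. Commercial near-duplicate engines work differently; the shingle size, the 0.4 threshold, and the greedy grouping here are assumptions chosen only to show the ordering.

```python
def shingles(text, k=3):
    """Set of overlapping k-word sequences from the text."""
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(max(1, len(words) - k + 1))}


def jaccard(a, b):
    """Share of shingles the two texts have in common."""
    sa, sb = shingles(a), shingles(b)
    return len(sa & sb) / len(sa | sb)


def fuller_first(docs, threshold=0.4):
    """Greedy grouping: the longest unassigned text becomes the pivot,
    and similar shorter texts are queued behind it."""
    order = sorted(docs, key=lambda doc_id: len(docs[doc_id]), reverse=True)
    assigned = set()
    groups = []
    for pivot in order:
        if pivot in assigned:
            continue
        assigned.add(pivot)
        near = [
            other for other in order
            if other not in assigned
            and jaccard(docs[pivot], docs[other]) >= threshold
        ]
        assigned.update(near)
        groups.append((pivot, near))
    return groups


# Hypothetical document texts.
docs = {
    "full": "employee roster name jane doe ssn on file address 1 main st dept finance",
    "short": "employee roster name jane doe ssn on file",
    "other": "quarterly revenue forecast for the finance team",
}
groups = fuller_first(docs)
```

In this sketch a reviewer would extract from `full` first, then only confirm `short` against it, while `other` stands on its own.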

We do have a tool at Haystack that identifies the density of search terms and AI results. So, it always makes sense to give those documents to the review team first to go after those documents to start building your library as fast as possible, and that way you can just link the other documents as you come across these people.

So, it is an important part. It is a part that we work very closely with the breach coaches on to structure the review in a way that lets us finish even faster.

Michael Sarlo

And of course, having clean work product. As we get down to the deduplication of data subjects, that’s probably where we find that we’re spending more and more time cleaning that up. It’s so important as well to have a clear line of sight when we actually go to validate these addresses. And working with your clients again. You might be working with the corporate client to actually get a clear picture of where your employees are located. What’s going on here? Integrating data into different systems.

And where this I think ends up is quite a bit of tail work. Let’s say we get to the point where we’re kind of done in this work, Susana, Anya, what type of follow-up work is going on here? Where are you providing value beyond the breach?

Susana Medeiros

Yes, that’s right. So, even after you’ve notified regulators and, potentially, individuals of personal data or you’ve notified third parties of business sensitive information, you need really great reporting so that you can always be able to answer any requests for information that you might receive subsequently from regulators, from other third parties.

We did have an event – and you have to make sure that data is very good and can last a long time. We did have an event where it was two years after the event where we received a request for additional information from a regulator. And so, you need to make sure your final reporting can be revisited and is your gold standard.

But an event is also an opportunity to know more about your company’s data and its risks. You will learn throughout the course of a cyber incident what teams or departments are not anonymizing personal data, and where can there be greater improvements made from a compliance perspective. Where could we do better at encrypting data or disposing of data that no longer has any value to the organization? And so, we often assist clients with preparing heat maps of where all of the PII we located during the event is located.

And then, unfortunately, a lot of times as a result of an event, you do see a spike in data subject access requests as people want to know more about what information you have on them across the organization. We often assist with that. But no matter how good your entire process was, there may also be litigation resulting from the event, and so you want to make sure you have that defensible process so that the litigation doesn’t end up being about your investigation. You want that to be golden.

Michael Sarlo

That’s great. And I think that kind of ties us out. When we start to think about success metrics in these incidents, it could be all over the place. A successful outcome for an organization may be they had limited PII. Or a successful outcome may be you had a ton of PII and PHI, but we were able to do it and manage that risk for you at a reduced cost. Are your corporate clients or insurance clients, are they looking to get programs in place to help them prepare for this?

There are many organizations – very large, they are repeat offenders, especially as it relates to phishing attempts. I don’t think any organization is immune to their next breach, especially as they get larger.

Susana Medeiros

Yes, so true. And I know we’re at time, so I’ll just say briefly we are seeing more organizations engage in tabletop exercises and trying to live through in a day or two-day exercise what it would be like to respond to an event. And I think that’s very useful to, again, identify vulnerabilities.

But also, throughout this process, should an event, unfortunately, occur, you will want to make sure you have a defensible process in place, not only for regulators but also because your insurer is expecting you to do what needs to be done as efficiently as possible. And so, we often meet with insurers throughout this entire process. I think a successful event is one where you, as the insured, get paid. And so, a defensible process makes sure that that happens.

Michael Sarlo

So critical. All right, guys, well, I’m not seeing any questions. If anybody wants to reach out to us, this presentation will be available online, via email, and on the HaystackID website. I really can’t thank enough the rock star panel we had today.

I just want to give thanks to the entire team for the information and insight. We also wanted to thank everyone who took the time out of their schedule to attend today’s webcast. We know your time is valuable. Appreciate you sharing it with us today.

Again, today’s webcast will be available on-demand on our website beginning tomorrow, and a full transcript of the webcast will be available early next week.

We also hope that you will have an opportunity to attend our September webcast, currently scheduled for September 28th. This upcoming webcast will feature an expert presentation and discussion on the state of US privacy law and what to expect in 2023 and beyond, led by the amazing Chris Wall, who is the leader of our Global Advisory Practice in Privacy and Compliance.

You can learn more about us and register for upcoming HaystackID webcasts and review our extensive library of on-demand webcasts at haystackid.com.

Thank you, again, for attending. Have a great day.

About HaystackID®

HaystackID is a specialized eDiscovery services firm that helps corporations and law firms securely find, understand, and learn from data when facing complex, data-intensive investigations and litigation. HaystackID mobilizes industry-leading cyber discovery services, enterprise solutions, and legal discovery offerings to serve more than 500 of the world’s leading corporations and law firms in North America and Europe. Serving nearly half of the Fortune 100, HaystackID is an alternative cyber and legal services provider that combines expertise and technical excellence with a culture of white-glove customer service.

HaystackID is a trusted partner to corporations, law firms, and governmental agencies, supporting targeted audits, complex investigations, and civil litigation matters with a dedicated global team of cybersecurity, information governance, and legal discovery experts. Applying the potential of artificial intelligence, the precision of data science, the power of machine learning, and the practicality of expertly trained and managed reviewers, HaystackID provides clients with secure, defensible, and flexible capabilities to address the most complex and time-sensitive eDiscovery-centric challenges.

In addition to consistently being ranked by Chambers USA, the company was recently named a worldwide leader in eDiscovery Services by IDC MarketScape, a representative vendor in the Gartner Market Guide for E-Discovery Solutions, and a Legal Technology Trailblazer by The National Law Journal. Further, HaystackID has achieved SOC 2 Type II attestation in the five trust service areas of security, availability, processing integrity, confidentiality, and privacy. For more information about its suite of services, including programs and solutions for unique legal enterprise needs, go to HaystackID.com.
