[Webcast Transcript] Today’s Privacy Reality: AI, Assessments, Breach and DSARs
Editor’s Note: On February 15, 2023, HaystackID shared an educational webcast to provide valuable insight into the ways in which AI is being used to address key issues in the realm of privacy and cybersecurity. The expert panel discussed the benefits and challenges of using AI in this context and offered practical advice for organizations looking to incorporate AI into their privacy and cybersecurity strategies.
While the entire recorded presentation is available for on-demand viewing, a complete transcript of the presentation is provided for your convenience.
+ Christopher Wall
DPO and Special Counsel for Global Privacy and Forensics, HaystackID
+ Kamran Salour
Partner, Troutman Pepper
+ David Wallack
Lead Privacy and Security Counsel, Motive
+ Michael Sarlo
Chief Innovation Officer and President of Global Investigations and Cyber Incident Response Services, HaystackID
Hello, and welcome to today’s webinar. We have a great presentation for you today, but before we start, there are just a few general admin points to cover.
First and foremost, please use the online question tool to post any questions you have, and we will share them with our speakers. Second, if you experience any technical difficulties today, please let us know in that same question tool and we will do our best to resolve them. And finally, just to note, the recording of this session will be shared via email in the coming days.
So, without further ado, I’d like to hand it over to our speakers.
Good afternoon and welcome from HaystackID. And I hope everyone is having a great week so far. In the next hour with us, we’ll make that week even better.
My name is Chris Wall and on behalf of the entire team here at HaystackID, we thank you for attending today’s presentation, and a discussion for the next hour titled Today’s Privacy Reality: AI, Assessments, Breach, and DSARs. And we’ve clearly tried to pack all of those key buzzwords into one title. If you look at it, we have privacy, we have AI, we have breach, and then the latecomer here, which is still a big draw, DSARs. The only thing we left out was maybe puppies. So, maybe next time we’ll talk about puppies too. We’ve also got assessments built in here and we all know that’s a pretty sexy topic, probably almost as exciting as an audit.
But all kidding aside, we do have a fantastic program lined up. And our discussion today is teed up to cover a lot of really important and timely topics. Today’s webcast, I’ll note, is part of HaystackID’s regular series of educational presentations developed to ensure that our listeners are proactively prepared to see realized all of their cybersecurity, information governance, and eDiscovery objectives. This webcast is being recorded for future on-demand consumption, and we expect that the recording and a complete presentation transcript will be available on the HaystackID website soon after we complete today’s live presentation.
So, our presenters for today’s webcast include experts with a very keen understanding of emerging regulations and practices in the area of data protection and cybersecurity, with particular insight into the intersection of privacy, AI, and cybersecurity.
So, joining us today, we have three very distinguished panelists, and I’ll let each one of them introduce themselves now. Mike, how about if you lead us off?
Sure. Thanks so much for that introduction, Chris. Hi guys, my name is Mike Sarlo. I’m HaystackID’s Chief Innovation Officer and President of Global Investigations and Cyber Incident Response Services. I’ve been in the space for almost 12 years now and have a background in information technology. I started out as a digital forensics examiner from an operational background, led eDiscovery teams, and now I remain heavily involved in our day-to-day digital forensics operations, as well as our cyber incident response offering, both for incidents and, on the general litigation side, for more investigative matters.
Happy to be here. Thanks, Chris.
Thank you. Happy to be here as well. My name is Kamran Salour. I’m a Data Privacy and Cybersecurity Attorney at Troutman Pepper, and I lead our incident response group. I help organizations respond to all types of data security incidents, whether it’s a lost laptop or a ransomware attack. And I try to bring a business focus to those responses: doing what’s best for the business and minimizing its risk, both pre-breach and during the incident response process.
Thanks, Kamran. David.
Thanks, Chris. Happy to be here. Thanks for inviting me. David Wallack, I’m Lead Privacy and Security Counsel at Motive Technologies. We are a hardware manufacturer and SaaS provider in the logistics industry. I’m part of our in-house legal team. So, really I work with people like Kamran on a day-to-day basis on all of our policies, procedures, infrastructure for breach response, how we handle data, how we keep it secure, and how we obtain coverages for all of our data security activities. And then what would really happen in the event of having a data breach and having to mitigate it?
As an in-house lawyer, the only person I think on the panel here that’s an in-house lawyer, I do have to say that all of my opinions today are mine and don’t represent that of the company. But I will try to give it as plain as I can.
Thanks, David. Thank you. And finally, I’ll introduce myself. My name is Chris Wall and I’m DPO and in-house counsel, chair of our privacy and advisory practice here at HaystackID. And HaystackID, of course, is an eDiscovery, privacy, data security, and forensic investigations firm.
So, my job at HaystackID is to guide our clients through the privacy and data protection thicket as cyber investigations, information governance, and traditional discovery lead our clients down paths, or into jurisdictions, less trod.
So, with that, we plan to cover today, five primary areas. And first, we’re going to talk about privacy and where that interrelates with AI. And then we’ll talk a little bit about how we assess risk and how we address data subject access requests, or what we’re going to refer to here – at least for the rest of today’s webinar – simply as DSARs. And then we’ll turn to cyber breach where we’ll talk about using AI to prevent data loss. And then we’ll wrap up with some discussion about how using AI – how we can use AI with breach response.
And I should mention too here that our discussion today really should be a discussion. You have – each one of you attending here this afternoon – you have an opportunity to ask questions. And all of the panelists and I will be watching as those questions come in, and we’ll try to address them as we have this little conversation over the next hour. So, I encourage you to ask questions.
And to that point, we do have a formal polling question. We just talked about what our agenda is going to be for today, but we recognize that each one of you came to this session with your own unique agenda. And we appreciate each one of you making time to do that. And so, as you engage with us and as we engage with you in the discussion today, I encourage you to ask those questions.
The first one here that we’re going to ask of you, though, so that we can understand how to tailor our remarks maybe a little bit is “What are the privacy issues that you personally face within your organization?”
This feedback gives us a little direction on where to focus during the next hour. Because I know that we have law firm, consulting, and in-house perspectives on this panel, and we suspect that we have a similar composition of attendees in the audience today.
All right, so clearly, the responses are in. Mostly “Changing use of personal data”, closely followed by “Data minimization and cross-border transfers”. So clearly, we’ll have to talk about that assessment piece significantly here, given the changing use of personal data within the workplace.
So, with that foundation, let’s get rolling with AI and privacy broadly just to lay a little bit of a foundation. In terms of privacy laws and regulations that we’ll be talking about here, there’s obviously an increasing number of regulations around the world that require companies to disclose what data they have and how they use it.
At last count, according to the UN anyway, that’s about 137 countries worldwide. Privacy regulation is not going away anytime soon. But the one thing that really brought privacy to the fore and into the public discourse, of course, was the EU’s GDPR. And so, a lot of what we’re going to be talking about here will sound GDPR-centric, but it’s really not; the GDPR is simply the most obvious of these data protection laws. I think it’s safe to say that we have the Europeans to thank for this modern privacy push. Europe, of course, has a fraught history with the use of personal information, whether it was race, religion, ethnicity, or national origin, along with other things like personal and real property ownership.
There are few places, if any, in the world that can claim a more problematic historical use of large aggregations of data than Europe, because that unfettered collection and use of data led to seizures and deportations and concentration camps. So, understandably, post-World War II there was a general European mistrust of large aggregations of data by any government or private entity. That led us eventually to the 1995 Privacy Directive; there were many steps between 1950 and 1995 that got us to the Privacy Directive. And then, in May of 2018, we got the GDPR.
And the reason why we mention that is because the GDPR really did start a wave around the world. With the international spread of data protection efforts, we can see similar regulations in the US, for instance, led by the CCPA and now the CPRA as of January. And here – I’m a Virginia resident – we have Virginia’s privacy law. And later this year, we’ll see similar privacy laws take effect in Colorado, Connecticut, and Utah.
And in the US, of course, we take a sectoral industry-specific approach to traditional large stores of information that we have in US society, in banking, education, and of course, in healthcare. And those large aggregations of data in the US give us Gramm-Leach-Bliley, PCI/DSS, FERPA, and of course, HIPAA.
So, when we talk about privacy, we’re going to be referencing some of these laws. Again, we’re just talking about European and US privacy laws. We will also reference other privacy laws around the world, because we are certainly not saying these are the only privacy or data protection laws in town, or that we have to be mindful of.
So, we talked about privacy. We wanted to mention a little bit about AI and what we’re talking about when we’re speaking about AI itself.
And I think it’s important to level set what we’re talking about when we discuss AI today. Because we’re not talking about The Matrix, we’re not talking about The Terminator, or even Free Guy. I think we need to be careful about how we use the term “AI” since it connotes that there’s actual intelligence or thinking going on when, in fact, there isn’t really. There’s no real supreme intelligence making decisions for us, and most of what we’re talking about when we talk about AI is not strong AI – that is, self-aware AI.
Most of what we’re talking about here in terms of AI is either rule-based AI, which involves explicit rules and logic – think Quicken, TurboTax, or other if/then applications – or machine learning, which involves pattern-based learning. Think Siri, Alexa, Cortana, Google, or ChatGPT; spam filters; or even today’s versions of self-driving cars, to some extent.
And in the legal technology space, this is what we’re talking about when we talk about AI-driven contract term identification in CLMs, or TAR 2.0, or even TAR 3.0. The main difference between rule-based and machine learning AI is that rule-based systems are deterministic, and they don’t scale well. That said, rule-based AI is often going to work better with smaller sets of data, where machine learning simply doesn’t have enough to train on.
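[Editor’s note: as an illustration of the rule-based/deterministic distinction described above – this sketch is not from the webcast, and the function name, rules, and threshold are hypothetical – a rule-based “AI” in the Quicken/TurboTax sense is just explicit if/then logic, with no training data involved:]

```python
# A hypothetical rule-based classifier: fixed if/then logic, no learned model.
def categorize_expense(description: str, amount: float) -> str:
    """Classify an expense using fixed rules; the same input always yields the same output."""
    desc = description.lower()
    if "uber" in desc or "lyft" in desc:
        return "travel"
    if amount > 1000:
        # Rules don't scale well: every edge case must be hand-written.
        return "flag-for-review"
    return "general"

# Deterministic: repeated calls with identical inputs never diverge,
# unlike a machine learning model whose answer can change after retraining.
assert categorize_expense("Uber trip", 24.0) == "travel"
assert categorize_expense("Office chairs", 1500.0) == "flag-for-review"
```

The determinism is the point: such a system is easy to audit but brittle, which is why it suits small, well-understood data sets better than large ones.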
So, with that preface, there are certainly benefits that AI can provide. And I want to turn to our panelists and talk about what some of those benefits can involve. And let’s start there with – well, we’ll start with David.
David, you play in this space on a day-to-day basis. So, maybe we can start there by talking about the practical benefits of AI in our lives.
Sure. Well, the benefits of AI, I think, are enormously broad across industries. And most AI, whether we want to define it as actual AI or simply just sort of rule-based decision-making, is created because there’s a market need for it.
In our case, we provide AI-based technologies that help keep the roads safer. That’s what we want to do. We want to make sure that truckers are safer, that deliveries are getting to where they’re supposed to be when they’re supposed to be there. That people are actually driving the way that they’re supposed to be driving and aren’t distracted. So, we create all of these technologies that help fleets get from Point A to Point B in a safer, more efficient way.
So, there are, I think, a lot of benevolent uses of AI, but increasingly, there are certainly privacy concerns that stem from the use of AI and the misuse of information.
And going back to the previous slide, I think one of the areas we could expand upon as far as the discussion goes is that in the US, we come at this from a different standpoint. We don’t look at privacy in the US as a fundamental right the way that the EU does. We look at it as a right embedded in contract. We tried to find it in the Constitution and were unable to. So now, we really look at it more as almost a commercial right.
And so, you are starting to see the FTC wedge its way into this space as the privacy watchdog. And really, what they’re saying about the uses of AI is that they don’t want to see dark patterns in the use of AI, and they don’t want to see deceptive trade practices, which has been the rubric of the FTC for as long as it has existed. They monitor the marketplace for deceptive trade practices to protect consumers.
And so, now, they’re using privacy as the wedge in which they’re going to enter this market, and they’re going to regulate using deceptive trade practices and dark patterns in the uses of AI.
So, I think that’s where you start to see companies, whether or not you are actually regulated directly by any of these laws. In the US, it’s becoming a little bit inescapable that we all have to play by the same set of rules.
And I think that’s a great point. Kamran, do you want to add anything here?
I think, to your point, AI is going to be under increased scrutiny, just in terms of how it’s used, where it’s used, and why it’s used. But I think the negatives that you often hear about AI really shouldn’t be the focus. The focus should be on all the great things AI can do – not only along the lines of keeping roads safer, but from a privacy and data security standpoint. All the things AI can do to help secure your data, help you organize your data, and help you know what data you have and where it is – all of those things are just so much easier with the use of AI.
And so, I often think AI gets a bad rap in the media. And so, I’m hopeful that that is a narrative that will soon go away.
Well, Kamran, that’s a perfect segue to our second polling question here before we get into a discussion about where we see AI going.
But this is our second polling question for our audience. “In what light do you see the use of AI in today’s working environments?”
As a net positive, maybe it presents a great opportunity in the workplace to streamline workflows and potentially increase accuracy and efficiency. Net neutral: it’s not going to have any effect one way or another on the workplace. Net negative: it will have a deleterious effect on the workplace and make things worse for workers overall. Or “D”, apocalyptic – hello HAL, Skynet, Ultron, V’Ger, Westworld – which is it going to be?
The votes are coming in. I don’t see any “Ds”. All right, the votes are in: 69% “Net positive”, and I think that’s probably reflective of our panelists here as well. I think all of us are taking a bullish approach to the advantages of AI.
To Kamran’s point, I think the press really does give a bad rap to AI a lot of times. But I don’t think it’s just the press. We’re going to talk about some of those laws here in just a second.
There are a lot of benefits, obviously, but we do need to proceed with caution. So, if we move on beyond the broad discussion of AI’s interrelation with privacy and its associated pros and cons, let’s talk about the risks because I think that’s really where that dark side comes in.
And we’re going to start here under the GDPR. Again, I mentioned we’ll be GDPR-centric here, whether that’s the UK or the EU GDPR. They call out the use of automated decision-making, and the profiling that automated processing can result in. And there’s an obvious unease in the EU and in the UK about AI, which is often what that automated decision-making or automated processing is considered to mean.
But I think that’s a risk in and of itself. It gives the casual reader – of Article 22 here – the sense that AI is inherently bad. But clearly, based on that last poll, none of us at this webinar is the casual observer of all things AI. Again, I think we’re all pretty bullish about where AI is going and the advantages it can have on society. And that’s why we’re here.
Article 22 specifically addresses what we’ll call AI here, and it’s important to note that this type of AI application does not involve human intervention. That’s what Article 22, I think, is trying to call out as the risk. And the area specifically called out under Article 22 as having significant effect include decisions involving finance, health, someone’s reputation, employment, someone’s behavior or lifestyle choices.
So, for example, if someone applies online for a loan using an online website and that website then uses algorithms and automated credit searching to provide an immediate yes or no decision on that loan application, there’s no human intervention and that is automated decision-making. If the applicant’s info were simply aggregated and presented to a human loan officer for review with the suggested score, and then the loan officer has the freedom to make a decision using the AI score as one of several metrics, then Article 22 would likely not apply.
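[Editor’s note: the Article 22 distinction in the loan example above can be illustrated with a small sketch. This is not from the webcast; the function names, score threshold, and scoring formula are all hypothetical, and real underwriting logic is far more involved:]

```python
# A fully automated path: the system itself issues the yes/no decision,
# with no human intervention -- the scenario Article 22 addresses.
def automated_decision(credit_score: int) -> str:
    return "approved" if credit_score >= 650 else "denied"

# A decision-support path: the system only aggregates inputs and suggests
# a score; a human loan officer makes the actual decision, so Article 22
# would likely not apply.
def decision_support(credit_score: int, income: float, debts: float) -> dict:
    suggested = credit_score - int(100 * debts / max(income, 1))
    return {"suggested_score": suggested, "final_decision": "pending human review"}
```

The difference between the two functions is not sophistication but where the final decision sits – in code, or with a person who treats the output as one metric among several.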
That’s for a significant decision. Significance is important. That’s what GDPR Article 22 is designed to address, those significant decisions. I don’t think Netflix’s movie recommendations for me and my wife this weekend based on past viewing habits is likely to be considered significant. It just means that instead of watching something with explosions, we’ll end up watching a movie where love is a life-or-death proposition.
So, again, I think it’s important that we focus on the fact that Article 22 of the GDPR is not a prohibition on AI. If it’s used correctly, automated decision-making can streamline things and can help make decisions fairly and consistently.
David, anything you want to add here, or Kamran or Mike, anything you want to add here from a US standpoint or from elsewhere around the world specifically with respect to this approach to automated decision-making?
I think one thing to point out with Article 22 is that it’s a very easily avoidable prohibition, because it applies only to decisions based solely on automated processing. So, organizations should know that you can have an automated process, add a person at the end – human intervention – and basically circumvent the application of Article 22. Still, I think it’s important for organizations to take what Article 22 is intended to do – which is, I think, to avoid machine bias and unfair treatment or discrimination – and keep that in mind when they’re writing their privacy policies or just making decisions. Because even outside the scope of Article 22, an organization could run afoul of certain laws in the US when it comes to decision-making that’s solely automated.
Yes, I think that Kamran is right on point with that. I think that the concepts that are articulated in Article 22 have probably a broader scope than just under the GDPR. It will probably affect companies in more ways than which they realize.
Kamran’s right to point out that you could have technologies that maybe don’t actually make a decision but suggest one. They could still be discriminatory. And really, the purpose of this is for companies to start looking at the ways in which they’re using AI and making sure they’re eliminating bias. In addition, all of this is really going to start to fall under the ESG umbrella. So, in your due diligence, whether with investors or with large clients that are giving you their TPRE or [inaudible], you’re going to have to be able to answer all of these questions about how your AI works, how it profiles people, and whether you’ve done audits for the elimination of bias. It’s all going to fall under either revenue enablement or being able to obtain funds in the marketplace from investors.
So, I think that the scope of this really sort of hangs over the marketplace in a much broader way than just Article 22.
And I think you’re right, both of you, and I appreciate that input. Because Article 22 is just the starting point, actually. If we look at where it’s going even today – and again, this reflects the broader concern over AI and what AI means. If we look at the proposed EU AI Act, again, it’s a proposed European law on AI and assigns risk categories to the application of AI.
And to Kamran and David’s points, Article 22 is just where it’s starting. I think we need to look at where this is going to progress.
So, this AI Act, again, is proposed, but similar approaches have already been adopted in other places around the world; Brazil adopted one last year. What this proposed act does in the EU is assign risk categories to each application of AI. Applications and systems that create an unacceptable risk – such as the government-run social scoring of the type used in China, for instance – are banned outright. It would also ban subliminal or manipulative systems, though it’s still a little unclear what “manipulative systems” means exactly. Other uses of AI that would exploit the vulnerabilities of certain groups in some negative way are also banned outright.
The second category would be high-risk applications, such as resume scanning to rank job applicants, or gathering financial info to provide a credit score, for instance. Those are not banned, those are just subject to additional legal requirements.
And then third, essentially, anything that’s not explicitly banned or listed as high risk is pretty much left unregulated under this proposed act.
And of course, once this act takes effect, it will involve the creation of a new regulatory body; right now, it’s being called the European Artificial Intelligence Board. But again, similar regimes have taken effect elsewhere around the world, and this act is already being looked at as the model for handling AI going forward – the Brussels Effect, as they call it.
Anyone else want to add something there before we move on?
I would say that this is good for AI. I know that for organizations, certainly, there’s a burden, but the net positive effect from what will become a global regulatory mechanism around the use of AI – so that companies operating, at a minimum, in Europe are compliant – is going to result in much more transparent and smarter AI.
So, I think folks should view this, generally, as a positive thing when they think about AI. And certainly, I think the US is going to want to tackle this much faster than the slow progress we’ve seen toward a Federal privacy mandate. We have a TikTok effect, and we have other global forces heavily in focus because of AI and the negative wartime consequences and uses of AI.
So, I think this is going to get regulated here as well fairly rapidly.
And I think we’ll see that as AI gains greater acceptance, and as it becomes more normalized in day-to-day society and in day-to-day use.
So, despite the AI angst that’s reflected in some of this AI regulation, it presents a lot of opportunity for us, obviously, as individuals and as companies. And as we’ve already talked about, I think it can help companies comply with data protection regulations.
And so, David, I’m going to turn to you first as looking at it as in-house counsel anyway. Your organization relies on, or at least intends to rely significantly on, AI going forward for a lot of what your business does. Where do you see those advantages there? And then also from a legal operations standpoint.
Well, from the assessing privacy risk standpoint, I think that it’s pretty clear that if – we’re talking about all the great uses of AI, but if AI is going to be viewed as problematic, the problem is also the solution in this case.
It’s really very difficult for any company that is, essentially, in the data business, which is increasingly more and more of the world, to get a handle on their data without using technologies like the ones that we’re discussing today. There’s just so much of it. It needs to be classified and categorized. It needs to be secured.
In the event of a security incident – well, first of all, even before the incident, you need to be monitoring the data – security event monitoring – to figure out whether there’s been any malicious pattern or bulk exfiltration of data, so that you even know there’s been a breach.
Yes – we need these tools in order to get our arms around how to keep data secure, where it lives, who it’s going to, and who we’re sharing it with. And in the case of DSARs – which I know we’re not to yet – how we’re going to actually comply with the access rights of individuals.
It’s such a tangled, jumbled mess in most companies that without these technologies, the thought of trying to do it manually is really completely overwhelming. I don’t know that many companies that are sophisticated with their data collection practices are going to be able to handle their responsibilities and responses without the use of these technologies.
What about the use of AI itself in conducting assessments, right? So, you might be required or just as a best practice, or a leading practice you, as an organization, conduct assessments of your use of this data, do you use AI to assist in conducting those assessments?
Yes, we do. We use AI in all of our [privacy and impact] assessments. And we use AI to respond to risk evaluations from our larger customers. We use AI in a multitude of enterprise-level ways to help the business make money.
So, from a basic revenue enablement standpoint, not even just the products that we’re building and selling, but how it is that we address our security and privacy posture publicly, AI is essential for us.
Kamran, as outside counsel, as you advise your clients on their data risk and assessing what data risk they have, how are you advising them in terms of their use of AI?
In a couple of ways, Chris. I think, certainly, AI is very helpful in what I like to loosely call the “data mapping”, which is, in essence, knowing what data you have and where that data is stored. Because that really comes into play if an organization is experiencing a data security incident and data has been exfiltrated by the threat actor. I can’t tell you how many times I have dealt with clients that if we say that Server X has signs of exfiltration, and we ask the client “Well, what’s on Server X”, they have no idea.
And so, that creates a lot of problems in terms of how you are going to respond to an incident. It’s a much easier message to send to clients and customers and employees if you can unequivocally state that we have data exfiltration only on Server X, and Server X holds just public domain images – something completely innocuous that is not going to have any privacy or security impact.
And so, I think the process of really knowing what data you have and where, and maintaining that, I think, is much more facilitated with the use of AI than doing so manually.
So, I think AI plays a crucial role in that compartmentalizing or categorizing where your data is.
Now, even on the security side, I always suggest using AI-related tools. As we mentioned earlier, one of the issues when we talk about data security is detection – knowing when the threat actor has gotten into your environment. Sometimes you don’t detect that until there’s a ransom note, and the ransom note is obviously not the first time the threat actor was in your environment. So, using AI to detect atypical logins, an atypical amount of data leaving the environment, or a large number of search queries – the types of things that traditional antivirus may not pick up on – helps identify threats through anomalous behavior, and that just puts you in a much better position to reduce the impact of a security incident going forward.
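[Editor’s note: the anomaly-detection idea described above – flagging “an atypical amount of data leaving the environment” – can be sketched with a toy statistical check. This is an editorial illustration, not a product recommendation or the panelists’ method; the function names, baseline figures, and threshold are hypothetical, and real detection tooling is far more sophisticated:]

```python
from statistics import mean, stdev

def is_anomalous(daily_egress_gb: list, today_gb: float, z_threshold: float = 3.0) -> bool:
    """Flag today's outbound data volume if it sits more than z_threshold
    standard deviations above the historical mean."""
    mu, sigma = mean(daily_egress_gb), stdev(daily_egress_gb)
    if sigma == 0:
        return today_gb > mu
    return (today_gb - mu) / sigma > z_threshold

# Hypothetical baseline: normal outbound volume in GB/day for one server.
history = [2.1, 1.8, 2.4, 2.0, 2.2, 1.9, 2.3]
assert is_anomalous(history, 40.0) is True    # bulk exfiltration stands out
assert is_anomalous(history, 2.2) is False    # a normal day does not
```

The point of the sketch is the workflow, not the math: learn what “normal” looks like from the environment itself, then alert on departures that signature-based tools would miss.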
Yes, we’re going to talk about the use of AI and preventing data loss here in a second. But look, I don’t think we can overstate the power that AI can have in other areas of risk here.
From an information governance standpoint, every data protection and data privacy law out there has, as one of its foundational principles, the idea of data minimization. And that’s both for straight-up privacy reasons and from a security standpoint: if you don’t have the data around – if you only have a minimal amount of personal information within your organization – then the risk of breach, data loss, or data misuse goes down proportionally.
Mike, do you want to talk a little bit about how we can use that AI in an information governance way to help reduce or help to minimize the amount of data that we have subject to – potentially subject to loss or misuse?
Yes. I would start by saying that I think although there’s a great array of tools that have come into the market to assist organizations with what I will call even “privacy-based information governance”, I don’t find that organizations tend to have a great handle on it or just aren’t doing it and they want to start doing it.
And to be honest with you, at first blush it’s really about the basics, and I think organizations still aren’t there yet. We don’t even see appropriate usage of deduplication of files across any form of an enterprise. That’s a really great place to start to understand how much data you can easily throw away from a duplicative standpoint.
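[Editor’s note: the deduplication starting point described above is commonly implemented by hashing file contents and keeping one copy per digest. The sketch below is an editorial illustration rather than HaystackID’s workflow; the function name and file paths are hypothetical:]

```python
import hashlib
from collections import defaultdict

def find_duplicates(files: dict) -> dict:
    """Group file paths by the SHA-256 digest of their contents.
    Any digest with two or more paths marks a set of exact duplicates."""
    by_digest = defaultdict(list)
    for path, contents in files.items():
        by_digest[hashlib.sha256(contents).hexdigest()].append(path)
    return {d: paths for d, paths in by_digest.items() if len(paths) > 1}

# Hypothetical mini-corpus: two byte-identical copies and one unique file.
corpus = {
    "share/report.docx": b"Q3 figures",
    "home/alice/report-copy.docx": b"Q3 figures",
    "home/bob/notes.txt": b"unique content",
}
dupes = find_duplicates(corpus)
assert sum(len(paths) for paths in dupes.values()) == 2  # two paths share one digest
```

Content hashing catches only byte-identical copies; near-duplicate detection (for example, the same document saved in different formats) requires the more advanced analytics discussed later in the panel.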
When we start thinking more about structured analytics and just simple rules-based workflow, being able to analyze metadata involving dates and actually being able to create author maps and identify key stakeholders in the organization, be it any member of the executive team, the legal team, lower level employees and the type of work product that they generate. A lot of that exercise can be conducted through more of a broad strokes metadata analysis.
Now, it’s only at that point that we really get to our target data population. Going through these steps – I would call them more roles-based steps – we’ll usually remove somewhere between 30 and 50% of the data. That’s when we start to really think about how we can use AI to help organizations. Usually, it’s a combination of tooling that can generate visualizations that allow teams to act on data; often, we’re trying to cluster data at the same time, so that those visualizations are more actionable. And the big piece is how you validate – how you create a defensible process when using AI. I think having an aspect of repeatability with any AI system is so important for any organization embarking on a software or services journey that involves AI: a repeatable process where, given the same inputs, you should always get the same results for the state of the model at that time.
Obviously, models tend to improve with use for certain types of AI. But it's really important, and we spend a lot of time working with organizations on this as well, to create documented workstreams that allow them to shift decision-making authority, in some ways, to the AI through a well-documented, defensible, and repeatable process.
Thanks, Mike. A big part of that is classification of your data, so let's take that idea of using AI to classify your data, through visualizations or whatever other process you're using. And one of the drivers for that classification is data subject access requests, or DSARs.
And I think maybe just for the benefit of some of our attendees here who maybe are unfamiliar with what a DSAR is, and our apologies to those of you who are, unfortunately, too familiar with what a DSAR is, we will just lay a little foundation here.
The GDPR, both in the EU and the UK, and the CCPA, the Virginia CDPA, and the three other state laws coming into effect here in the US over the next year have all been hailed as giving individuals far greater control over their data, because we're a much more data-aware society. And DSARs provide individuals with the ability to request all the information an organization holds on them: for access, for portability, for deletion, and in some instances for correction.
And under the GDPR, the time to respond to one of these requests from an individual is critical: 30 days, or you otherwise face the GDPR's higher fine tier of €20 million or up to 4% of global annual turnover, whichever is greater. The various US state laws provide specific acknowledgment and response requirements as well. Obviously, they don't carry that GDPR fine, but they do have specific acknowledgment requirements.
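As a rough sketch of the deadline math, using the 30-day figure cited above: note that GDPR Article 12 actually phrases the window as "one month", extendable by two further months for complex requests, and the 10-day acknowledgment window below is an assumed internal SLA, not a statutory one, so real tracking should follow counsel:

```python
from datetime import date, timedelta

def dsar_deadlines(received: date, extended: bool = False) -> dict:
    """Illustrative DSAR clock: 30 days to respond, +60 if extended."""
    base = received + timedelta(days=30)
    return {
        "acknowledge_by": received + timedelta(days=10),  # assumed internal SLA
        "respond_by": base + timedelta(days=60) if extended else base,
    }
```

Simple as it is, putting every inbound request on a clock like this is what makes the state-law acknowledgment requirements trackable at scale.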
So, for those of you who may not have encountered DSARs, we should just say: give it time. The number of DSARs is expected to increase both as the number of jurisdictions that grant this right increases, and as the number of individuals who seek to exercise their rights begins to do so, if for no other reason than that they can.
In early 2022, 71% of surveyed companies reported an increase in the number of employee DSARs since the GDPR took effect in May of 2018. Employee access requests further spiked, not surprisingly, during the COVID lockdowns. People had nothing else to do, so they figured, hey, let's submit a DSAR to any company that might have data on me.
Anything you want to add here, Kamran, David, or Mike about what DSARs are just kind of as a foundational element?
I would just add that if you’re not necessarily subject to them or you’re not seeing a lot of them, it doesn’t mean that your supply chain or your customers are not subject to them. And if you’re linked in that chain and you can’t help your partners find the data that they need to respond to these, then that can be a significant strain on business relationships.
So, again, there’s just a market reality that is looming over all companies to get their house in order and to be able to comply with these because even if they are not directly imposed on you, somewhere in the supply chain, they may have a direct effect on you.
That is a great, great point. Thanks, David. But before we even get to putting together a DSAR response, I think AI can help us with the eventuality of receiving a DSAR. And this is to Kamran and David’s point earlier, and Mike, you touched on this also, knowing what data you have, knowing where that data is, that’s classification, that’s data mapping, and knowing what data you have and where it is for a specific purpose.
Kamran, do you want to talk about how you would use AI specifically in a DSAR response context? You mentioned earlier that for years the holy grail has been the ability to classify and categorize documents and records at the time of inception, from the moment a user clicks Save or hits Send, and various software and archiving platforms over the years have sought to do just that.
So, where do we sit today?
I think, based on my experience with my clients, we still have a long way to go with respect to using AI for this process because, as we've all discussed thus far, data mapping or data classification is a step that is either ignored or not centralized. You may have one division of the company with a certain way of storing and classifying documents, and another section of the company with its own method. So, there's a lack of communication, I think, across the board.
So, I think it’s a combination of communicating as well as knowing that there are tools out there to make the process much more streamlined and much less of a burden.
And I know these DSAR responses are a relatively new phenomenon in the US, so not all states have these options and rights for consumers yet, but I think it is something that is going to spread across all the states sooner rather than later.
And so, I think it is something that organizations need to really start thinking about. Because if you are ill-equipped to handle simple data requests, it's going to create not only a burden on your day-to-day business, but it could also put you at risk of regulatory scrutiny, and you can run afoul of certain provisions. It's a silly reason to be under scrutiny, because you couldn't fulfill a basic request when there are plenty of tools out there that can help you do so.
Amen to that. David or Mike, anything you want to add here?
I do think it’s really important for organizations to have a defined process, especially when they don’t have a good handle on their data around the most critical elements that might be required as far as their customer data to respond to a DSAR.
I see some organizations take a much too granular approach which, in some ways, can be incredibly burdensome if you start to scale and see more requests than you were expecting. And I see some organizations take too broad an approach.
So, it’s really important to work with folks like Kamran to find that right balance. And then, again, it’s all about repeatability and being able to have a repeatable process here.
And defensible. Thanks, Mike. Let’s shift gears a little bit, and let’s talk about AI and breach. Talk about data loss more broadly than just breach.
And we’ve got our third polling question here. And the question here is “Do you have automated processes in place within your organization to identify breaches when they occur?”
It's a yes/no question.
This is looking good. All right, answers are in. The vast majority of our attendees today have automated processes in place to identify breaches when they occur. So, we’re going to talk about some of those approaches and we’re going to talk about here how AI can play into those approaches (that automation).
So, let’s start with some metrics. About 90% of all data breaches reported between 2017 and early 2019 were the result of human error and not cyber criminals. How can AI help there?
And 51% of data breaches were caused by cyberattacks, 25% came from negligent employees, no surprise there, and 24% were caused by glitches. That's a technical term that includes IT and internal process failures.
So, the big question – and I’ll put it to Mike first – can AI do it better given the fact that so many of these data loss incidents were based on human error?
AI is great for detection, and it's great for response, but employee training is still so critical. There are various vendors, like KnowBe4, and even Google has free cybersecurity training that organizations can bring in. It's critical, and that's where the human error component really ramps up. Folks still get spoofed, they get phished. Some of the phishing campaigns are so realistic and so sophisticated.
I've been getting invited to speak on panels by some type of threat actor for the past five years, and the invitations look real. Threat actors will go very far to create really complex scams that involve multiple actors. We've really seen such an uptick in social engineering as it relates to data breach: somebody takes over an email account, emails somebody on the finance team, and somebody then wires some money; or it's more of a spoofing of customers.
So, AI is helpful in that domain, but unfortunately it's not the be-all and end-all. You still need training. You still need a well-documented data protection and cybersecurity strategy, and you always must be prepared for the eventuality that there will be a breach. Everybody will get one [off microphone].
We do also, beyond just that [off microphone], thinking about AI [off microphone] detection and response, this is where we're seeing the use of AI in breach. Certainly, behavioral analytics is critical.
I work with some customers that are recording eight terabytes of data in their SIEM, their security information and event management log, across their enterprise. That is raw text. There is no way a human could ever go through that data on their own. It's a combination of rules-based workflows that elevate certain types of events, but AI also goes a long way toward identifying outliers in the day-to-day activity of a user, or in the way they interact with resources on their network. Or, vice versa, when a resource starts to misbehave and do something it shouldn't. [Off microphone] great job [off microphone] really great tooling out in the marketplace.
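A toy version of that outlier logic would flag users whose event volume deviates sharply from the population norm. Real UEBA tooling models each user's own baseline over time; the snapshot input and the z-score threshold here are illustrative assumptions:

```python
import statistics

def flag_outlier_users(daily_event_counts, z_threshold=3.0):
    """Flag users whose event volume sits far outside the population norm.

    `daily_event_counts` is a hypothetical {user: count} snapshot,
    as might be exported from a SIEM.
    """
    counts = list(daily_event_counts.values())
    mean = statistics.fmean(counts)
    stdev = statistics.pstdev(counts)
    if stdev == 0:
        return []  # everyone identical; nothing stands out
    return [user for user, n in daily_event_counts.items()
            if abs(n - mean) / stdev > z_threshold]
```

This is the elevate-and-score idea in miniature: rules decide what counts as an event, and the statistics decide which actors look anomalous enough to surface.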
AI, again, but let me go back for a minute: having a good understanding of your data map, where your most important data points are, how your network actually operates, and who has access to what is critical to enabling a successful AI [off microphone]. If you don't have that, the event management can't elevate the proper events, score them, and rank them from a risk-tier standpoint.
Now, some of the basics too: data loss prevention. At Haystack, we have our own suite of AI tooling, used by many of our customers in the eDiscovery or post-data-breach response paradigm, that will detect various types of PII and PHI, and that's really great for a post-breach event. We also use that same technology for customers who are looking to go through a privacy-based information governance exercise.
Or even a DSAR, right, Mike?
Or even a DSAR and, again, that can be a great mechanism to start down that categorization process. If they need to respond to a DSAR from an ad hoc standpoint, great, but if they want to start to categorize their data from a privacy standpoint and, oftentimes, more from a conceptual standpoint if they’re doing a complete information governance exercise, AI is the big enabler here for sure.
And it's much better than running regular expressions or search terms. And a lot of organizations don't necessarily [off microphone]; a lot of this tooling is native in their Office 365 or their G Suite, where they can at least get some immediate protection with a small amount of consulting, as it relates to some of the more typical data loss [off microphone] to an email, a massive customer list that has all of their account numbers and their social security numbers.
Using our AI in particular, because the data landscape is so large and so robust, we try to really focus on prioritization: we try to identify the most dense folders, the most dense servers, and, in general, the most dense documents. And when I say dense, we use a density score that describes how much PII or PHI is in a data set or an area. And this is great, at that point, for understanding, "Oh, hey, do we have some odd data-saving habits in our organization that we didn't know about? Or, hey, let's go and remediate this". And I think that remediation is key as well, where we can use AI in that construct too.
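A hedged sketch of a density score in the spirit Mike describes: pattern matches per hundred words, averaged per folder. The regexes and the weighting are illustrative assumptions, not HaystackID's actual model, which uses AI rather than pure pattern matching:

```python
import re

# Illustrative PII patterns only; a real detector covers far more types.
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
EMAIL = re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b")

def pii_density(text: str) -> float:
    """PII hits per hundred words of text."""
    words = len(text.split()) or 1
    hits = len(SSN.findall(text)) + len(EMAIL.findall(text))
    return 100.0 * hits / words

def densest_folders(docs_by_folder):
    """Rank folders by the mean density of their documents, densest first."""
    scores = {folder: sum(map(pii_density, docs)) / len(docs)
              for folder, docs in docs_by_folder.items()}
    return sorted(scores, key=scores.get, reverse=True)
```

Ranking folders this way is what turns "we have PII somewhere" into a prioritized remediation queue.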
Well, let’s talk about the remediation. So, let’s say you’ve used AI as effectively as you can, and you’ve taken all the other steps that you should take to prevent data loss. But they say that there are only two types of companies in the world, those that have experienced a breach and those that will.
So, let’s talk about what you do and how you can use AI – this is our last discussion point here – in the event of a breach. And you touched on this already Mike, and Kamran, I’m going to ask both of you to talk about here how can we address that.
Mike's touched on some of it already, identifying the PII that's been part of a data loss incident. But maybe you can talk to us about, broadly speaking, how AI can help us in that breach situation. Kamran, let's go with you first.
Absolutely. I think when we're talking about a breach response, time is critical for a number of reasons. And when I say timing is critical: depending on the type of incident, the type of organization you are, and your customers, you may have to make some very quick decisions about certain things. If it's a ransomware attack, for example, whether you're going to negotiate or pay a ransom. Even outside of a ransomware context, you may have to make some very quick decisions on messaging. Are you going to tell your employees? Are you going to tell your customers? If so, what are you going to tell them?
And without the use of AI to help on both the data classification side and the security detection side, those decisions are much harder to make.
So, for instance, assume an organization does not have AI to help identify the anomalous activity that would indicate exfiltration or removal of data from its environment, and assume it also doesn't have a good data classification system. If that organization has to message third parties, it's not able to tell them that it detected the threat before it became a security incident, and it's not able to tell them right away that the data that was exfiltrated was an innocuous type of data.
And so, what that means is it puts a much larger strain on the organization if you’re messaging, “Hey, we had a security incident, our investigation is ongoing”. Often, those customers are going to come back and they’re going to say, “Well, what about my information?” And if you don’t know the answers, sometimes those customers will say, “Well, we are going to disconnect and we’re not going to communicate with you until we know more about our information and what steps you’re taking to increase your security”. So, it puts a huge toll on the organization. Because a lot of that time, you are spending – instead of focusing on containment, restoration, remediation, you’re focused on communicating with your customers, and they’re largely going to be unhappy customers. And then if you have customers that disconnect from your environment, now you have increased business loss. And then the time it takes for you to regain that trust and confidence from the customer to re-engage with your network environment, all that translates into real hard dollars lost for the organization.
And so, I think if you have that AI in place, you're going to get to those answers faster. And the sooner you know what data was impacted and whose data it was, the easier the process becomes, not only in minimizing risk, but also in minimizing business loss, because you don't have to have those difficult conversations with customers.
Just in closing here, I’ll go around the horn and see if there’s anything, Mike or David, you want to add.
I think Kamran hit it right on the head. Data volumes are too robust and there's too much at stake not to use technology to enhance any type of breach response. I do also think it's about taking a measured approach, trusting the use of AI while understanding that it's always being measured against a certain level of statistical significance, accuracy, and margin of error.
And taking those same concepts and applying them to validation, so that you can actually put eyes on a subset of documents via a statistical sample of a larger population and say, "Hey, there's nothing here", or, "Hey, there is something here", is a great approach to using the technology as a force multiplier for humans. And that's what AI is.
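That validation step, eyeballing a statistical sample of a larger population, can be sketched with the standard sample-size formula for a proportion, with a finite-population correction. The defaults (95% confidence, a 5% margin of error) are illustrative assumptions, not a statement of any particular firm's protocol:

```python
import math

def sample_size(population, confidence_z=1.96, margin=0.05, p=0.5):
    """Documents to review to validate an AI pass over `population` docs.

    Standard proportion formula n0 = z^2 * p * (1 - p) / e^2, then a
    finite-population correction; p = 0.5 is the most conservative choice.
    """
    n0 = (confidence_z ** 2) * p * (1 - p) / margin ** 2
    n = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n)
```

The point of the correction is that the required sample barely grows with population size: validating a million-document AI pass takes only a few hundred human-reviewed documents, which is exactly the force-multiplier effect described above.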
We’re not at a point where we can just ask ChatGPT to go out and find all the PII. It’s a force multiplier. And humans are so essential to the process.
That’s the advantage of AI. Thank you.
And we are at the top of the hour here. And we really appreciate Mike and Kamran, and David for sharing their experience and insight with us for the last hour or so. And we thank all of you who have logged in now or who are listening to this recording later, who took time out of your schedule to listen and to participate in today’s webcast. We know your time is valuable, and we appreciate you sharing it with us.
You can learn more about this webcast and you can register for any of HaystackID’s upcoming webcasts and review our extensive library of on-demand webcasts at haystackid.com.
HaystackID is a specialized eDiscovery services firm that supports law firms and corporate legal departments through its HaystackID Discovery Intelligence, HaystackID Core, and HaystackID Global Advisory offerings. In addition to increased offerings, HaystackID has expanded with five investments since 2018. Repeatedly recognized as a trusted service provider by prestigious publishers such as Chambers, Gartner, IDC MarketScape, and The National Law Journal, HaystackID implements innovative cyber discovery services, enterprise solutions, and legal discovery offerings to leading companies across North America and Europe, all while providing best-in-class customer service and prioritizing security, privacy, and integrity. For more information about its suite of services, including programs and solutions for unique legal enterprise needs, please visit HaystackID.com.