[Webcast Transcript] Proper Hygiene Matters: Three Cybersecurity, Privacy, and Data Protection Challenges and Solutions
Editor’s Note: HaystackID hosted an educational webcast on March 15, 2023, to discuss the growing importance of cyber and data hygiene in the context of privacy, cybersecurity, and data protection. The webcast focused on how individuals, organizations, and even nations are recognizing and prioritizing fundamental rights and freedoms in the face of common cybersecurity concerns.
The expert panel, led by Michael Sarlo, a seasoned professional in cybersecurity, legal discovery, and computer forensics investigations, addressed specific challenges to achieving proper cyber and data hygiene and presented three potential tools that can be used to augment current approaches.
While the entire recorded presentation is available for on-demand viewing, a complete transcript of the presentation is provided for your convenience.
+ Michael Sarlo, EnCE, CBE, RCA, CCLO, CCPA
HaystackID – Chief Innovation Officer and President of Global Investigation Services
+ Anya Korolyov
HaystackID – Vice President, Cyber Incident Response and Custom Solutions
+ John Brewer
HaystackID – Chief Data Scientist
+ Christopher Wall
HaystackID – Data Protection Officer and Special Counsel for Global Privacy and Forensics
Hello, and welcome to today’s webinar. We have a great presentation lined up for you today, but before we get started, there are just a few general admin points to cover. First and foremost, please use the online question tool to post any questions you have, and we will share them with our speakers. Second, if you experience any technical difficulties today, please let us know in that same question tool and we’ll do our best to resolve them. And finally, just to note, the recording of this session will be shared via email in the coming days. So, without further ado, I’d like to hand over to [Michael Sarlo].
Thanks so much, and hello and welcome from HaystackID. I hope you are having a great week. My name is Michael Sarlo, and on behalf of the entire team at HaystackID, I would like to thank you for attending today’s presentation and discussion titled “Proper Hygiene Matters: Three Cybersecurity, Privacy, and Data Protection Challenges and Solutions.” Today’s webcast is part of HaystackID’s regular series of educational presentations developed to ensure listeners are proactively prepared to achieve their cybersecurity, information governance, and eDiscovery objectives. Our expert presenters for today’s webcast include individuals with a deep understanding of, and practical experience with, cybersecurity, information governance, and data protection issues, considerations, and solutions. I’ll be somewhat your moderator today, although this is going to be a dynamic discussion and presentation from our presenters. I’m the Chief Innovation Officer and President of Global Investigations and Cyber Incident Response Services here at HaystackID. I work closely with clients on high-risk engagements, both on the investigations and general eDiscovery side of our business, and oversee the day-to-day operations, thought leadership, and management of our Cyber Incident Response Division.
I’ll turn it over to Anya Korolyov to introduce herself.
Thank you, Mike. Hi, everybody. I’m Anya Korolyov. I’m the Vice President of Cyber Incident Response and Custom Solutions at HaystackID. I’m a former practicing attorney. I have about 17 years in eDiscovery, working on investigative matters, including Cyber Incident Response, for the last few years. I’m a Relativity Master, and I have a thorough understanding of document review and the data side of these matters.
Thanks, Anya. I’m John Brewer. I’m the Chief Data Scientist at HaystackID. I’ve been in the data space since the late ’90s, so I’ve had an opportunity to watch the field grow into what it’s become today. During the 2000s, I was an engineer working in data migration for SAP and for the ERP world, and then I ended up with HaystackID about seven or eight years ago now. We’ve been on a bunch of different adventures together, working with the new and evolving ways that data science is impacting the eDiscovery space, the privacy space, and now the cyberspace, and I very much look forward to talking about it today.
Chris, do you want to introduce yourself?
Yes, hey, I’m Chris Wall. I’m DPO, in-house counsel, and chair of the privacy advisory practice at HaystackID, and my job at HaystackID is to guide both HaystackID and our clients through the privacy and data protection thicket as they engage in cyber investigations, information governance, and traditional discovery, and as our clients find themselves going down paths involving eDiscovery, data protection, and privacy issues in jurisdictions perhaps less trodden.
Back to you, Mike.
Thanks, Chris, and I love that saying, “the privacy and data protection thicket”. That’s really great, and that’s what we’re going to be venturing through today. This webcast is designed to give viewers a functional overview of three interrelated problems, and three solutions, that we deal with quite often. We’re going to start with a high-level understanding of trade secrets and misappropriation, as we talk about our departed employee and insider threat detection and remediation investigative capabilities and how we frame positive outcomes using forensic evidence in these types of matters. We’ll close that out with the types of protective measures (policy, processes, controls, and technology) that can be utilized to prevent and/or flag any type of misappropriation of trade secrets. As you’re looking at this, I would encourage you to think about policy the same way for customer data: how do we secure not just trade secrets, but also the data that any organization might be holding about individuals, which is very costly if there’s a data breach? That’s where we’ll take a deep dive into our AI-enabled approach to Cyber Incident Response, post-breach, with a particular focus on regulatory response and notification to impacted individuals. Then we’ll round things off with a discussion about our Global Advisory practice, with a special focus on privacy, which we believe is a fundamental concept in everything we do. It’s incredibly important for all eDiscovery practitioners to constantly be aware of their privacy obligations and the implications for the work they’re doing.
So, this is an older stat, but the trend has only grown, and what we want to point out is that since the ’90s and 2000s, the majority of assets that are really valuable to large corporations are intangible assets. These are things like source code, proprietary scientific formulas, engineering drawings; things that might not be an engine component sitting in a warehouse. So, as a result, organizations need to be very vigilant when it comes to protecting their trade secrets. This is really important because I find that oftentimes we deal with customers in the $10-40 million revenue range who have grown very rapidly, and they don’t necessarily have the controls in place that very large organizations have as it relates to securing trade secrets. They sometimes get themselves in trouble: we see that there’s a departed employee, they’ve stolen quite a bit of data, but lo and behold, the company really didn’t take any steps to safeguard those trade secrets. This is an important test, the six-factor test.
Yes, Mike, I might want to weigh in here, and it’s really constructive, I think, to start a discussion like this with the stats that you’ve shared at the outset. You’ve laid out here a six-factor test to determine whether something is really a trade secret, and I think that’s important, but I think it’s also risky, because we have to remember that trade secrets are really broadly defined. They are information: it could be a formula, a program, a device, a method, or a process that on its own simply has some economic value. You don’t even have to show that it has economic value, just that potentially it might, and obviously, it can’t be generally known to the public, as you’ve listed here in your factors. Because trade secrets are defined so broadly, every organization, every company, is going to have them, and they are everywhere.
So, that’s important, but also remember that defendants in any type of misappropriation case will tug on the heartstrings around an organization not actually safeguarding a trade secret. It’s important for organizations to think about how they secure their intellectual property, and certainly, “trade secret” is a very broad category in today’s economy for businesses.
So, what we do find, though, is that there are really two verticals where trade secrets are typically misappropriated. The first is departed and terminated employees, and the second is data breaches, and a data breach can also be an accident. You have an employee who accidentally posts something on the internet, or sends out a file that contains some type of trade secret to a party, that would be somewhat lumped in there as well, but also, as everybody’s seen in the news, with the rise of ransomware, and exfiltration of data, we’ve seen some very bad leaks for large corporations as it involves their trade secrets.
So, going back to a survey from the Ponemon Institute, about 50% of departed employees – and this is 2012 data; I’m sure it’s gone up in today’s remote world – have actually admitted to keeping confidential company data. 40% of them plan to use that data in their new job. In particular, software developers typically feel they have some type of ownership of source code, and this is something near and dear to my heart, because I’ve had source code stolen from me before. This is something we find organizations need to keep tight control over.
Go ahead, Chris.
Yes, I was just going to say, Mike, we call out here the two most reliable exfiltration vectors, via breach and via departing or separating employees, but interestingly, I read, I think during the pandemic, a report from the US DoJ that actually talked about the most common industries where trade secret theft occurs, and they were technology. Duh. I mean, that’s where you would expect it. But also in healthcare, in defense, and in manufacturing, and of course, according to that report, most of the trade secrets were lost from companies themselves, and I think it’s really important to note that that loss is either via an employee departure or through malfeasance, like a breach, or it could be lost through a third-party vendor or through a contractor, but I think it’s important to note that well over 50% of that loss is from insiders.
Sure, and I think another thing to touch on, because we’re really seeing it as a big focus for global regulatory agencies as well, is this concept of pixel trackers sucking in customer data that shouldn’t be going anywhere. In some ways, these trackers are stealing from organizations that may not even know they have them in their websites or in their code, with confidential bits of data flowing out of the company. So, it’s another in-focus pathway of exfiltration that’s a little bit different, but something that we’ve been working with healthcare companies, in particular, on remediating. Everybody should be aware of those pixel trackers.
So, we’re going to jump into some of the practicalities of a forensic exam for a departing employee investigation, and HaystackID offers this type of exam. You’re going to see several slides, and we’re going to breeze through them a little bit, because we could spend several hours on forensic investigation of departing employee events, but we just want to give a practical roadmap of the types of things we do, the types of evidence we look at, and how that’s leveraged in a forensic investigation. We typically offer this to organizations for a packaged fixed fee, which can be really great when you’re concerned about smoke but don’t see it yet and maybe just want to do a quick investigation; or, if there is smoke, we usually find there’s fire, and you want us to come in and validate that. We work with corporations to enhance their internal capabilities to perform these investigations and also offer it as an entirely outsourced service.
So, the first things that we always ask are: when did the employee depart from the company? What was their role? What did they have access to? What is the list of assets? Did they have a laptop? A cell phone? What endpoints did they have access to? This is really important for growing companies where you have folks who have been there for 15, 20 years, just going back to what you had mentioned, Chris. It’s certainly very easy for an organization to lose track of the cell phones, the computers, the laptops; folks, especially on the executive team, may somewhat exist out of the purview of evolving security landscapes. What’s the current status? Were the PIN codes and credentials collected from mobile devices? Companies that don’t have MDM (mobile device management) access via policy oftentimes find themselves in a very sticky situation when a departed employee turns in a mobile device and, lo and behold, they don’t have access to everything [inaudible]. What specific business systems did a departed employee have access to? It’s rarely just email these days, and we oftentimes find that there’s a broad swath of tooling that needs to be investigated.
Access logs are so critical, and we always ask as well: have you extended any type of logging that may be going on in any cloud-based system, or even on physical servers, to make sure that forensic evidence isn’t dying on the vine? We have some customers who generate literally terabytes of telemetry data a day, and that data is only held for a few weeks, and it’s not uncommon for these incidents to pop up two, three, four, five, six months after a departure.
And then we also find that folks might actually still have access to systems. We’ve had several cases where some of the primary [inaudible] that a departed employee was now at a competitor, and they were logging in to the prior employer’s systems, actively exfiltrating data and keeping tabs on them.
And then certainly we want to know who are the actors? Are there specific competitors? Are there documents and interests that are important, so we can take a look at that stuff?
Hey, I just want to mention one more thing there, Mike, and Anya can probably weigh in on this too, but keep in mind, the biggest struggle when we’re scoping, when we’re trying to figure out the answer to the scoping questions, is, who are the people and who are the stakeholders, frankly, who have those answers? And it’s far better to find out who those people are or where you can go for those answers before you have to go find those answers in an emergency. For instance, in a breach situation. So, it’s really good to know who has those answers now, when you’re not in crisis mode.
Correct. And of course, the richest repository from a forensic evidence standpoint, to begin with, is usually their laptop or their workstation. I don’t see too many workstations these days; it’s mostly laptops. We’re going to ask all the basic questions that are important to understand, such as whether or not that user was an admin on their workstation. We usually find, when we’re dealing with Mac users, that controls are more difficult to put in place from an administrative standpoint, and there’s more access to the inner workings of the system, so investigations can be lengthier and require deeper analysis. We want to know if USB ports were enabled so they could plug in hard drives. Did they have Bluetooth access? We’ve seen scenarios where data was actually exfiltrated over Bluetooth to a cell phone, or they just went and printed it, so we want to look at printing as well: did they have access to other avenues? And certainly, getting a hold of encryption is very important; make sure you collect those encryption keys early on and upfront.
And with the mobile devices, again, just to touch on it, we’ve got to know if there’s anything out of place. You want to see the logging around app usage. We want to be able to image that phone and to understand certainly what they were doing, who they were texting, what sites they were accessing, were there any secure messaging apps, things like that, and all of that is very important in a departing employee investigation. Don’t ever overlook the mobile devices. They’re so important.
We oftentimes also are lucky enough to find backups of mobile phones on company computers, and those can be a very rich repository. We’re going to want to make sure that in addition to email, we’re covering key messaging applications, we’re looking at cloud data, things like Dropbox, Box.com, Google Drive. Were they installed on a computer at some point? These are very common exfiltration methods for a bad actor. They just log into their Google Drive and start dragging and dropping things, or sometimes foolishly they use their company email to email themselves, or they might log into a personal email account and email things. That’s what we see oftentimes.
I’ve got to weigh in here as the privacy guy. From a privacy standpoint, aside from those technical scoping questions about where the data is, how it’s stored or transmitted, you really have to know the nature of the data as well. What kind of data is transferred? Classified data? And I’m not talking about that in terms of national security, but classified data in the privacy context. It means the types of personal data that might be in that collection. What’s going back and forth? Does it transmit personal, sensitive information, account information, personal identifiers, address, phone numbers? All of that information. You kind of need to know whether that’s in the collection, because that raises the risk to another level entirely.
Certainly, and we’ve seen cases, especially in a healthcare setting, where patient data, as an example, was taken by a doctor who was going to go start their own new practice. That raises significant privacy questions beyond just the exfiltration.
And so, there’s a lot of recoverable information. We sometimes see that bad actors try to cover their tracks. Over the past few years, there’s been a focus in the news on the capabilities to look at deleted emails, deleted files, and whatnot, so the average layperson is more aware, and oftentimes will try some basic anti-forensics; they will at least try to empty their recycle bin.
It’s not uncommon that you get a computer that might be wiped, and we have to do some data recovery there. But it’s going to be important to look at the recycle bin, and it’s going to be important to look at all the file and registry information, such as LNK (link) files, to see what folders they might have had open when they did something.
USB activity is usually the kiss of death, and that’s where, at the end of this, we really want to get a TRO in place to stop further harm. We’re looking for the most impactful evidence; you’re not really looking at every last bit to begin with. You want to get the evidence that’s going to be easy to find and result in a low-cost win.
But the USB device history is important. We always get the question: can you tell exactly what was copied? You usually can’t. There’s not a log that says, “Here are the files that were copied to a USB drive”. Usually, we see that a hard drive was plugged in; there is then a lot of file activity on the computer around the time of that hard drive being plugged in (files might be modified, they might be accessed), which could indicate they were copied, so to speak. And then we see the ejection of the hard drive.
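That kind of timeline correlation can be sketched in a few lines. This is an illustrative example only, not HaystackID’s actual tooling; the file paths, timestamps, and the five-minute slack window are all invented for the demonstration:

```python
from datetime import datetime, timedelta

def files_active_during_usb_window(usb_connect, usb_eject, file_events, slack_minutes=5):
    """Return file paths whose access/modify timestamps fall inside the USB
    connection window, padded by a small slack on either side."""
    pad = timedelta(minutes=slack_minutes)
    start, end = usb_connect - pad, usb_eject + pad
    return sorted({path for path, ts in file_events if start <= ts <= end})

# Hypothetical timeline: drive plugged in at 14:02, ejected at 14:19.
connect = datetime(2023, 3, 1, 14, 2)
eject = datetime(2023, 3, 1, 14, 19)
events = [
    ("C:/Users/jdoe/Designs/formula.xlsx", datetime(2023, 3, 1, 14, 5)),
    ("C:/Users/jdoe/Desktop/notes.txt", datetime(2023, 3, 1, 9, 30)),
    ("C:/Users/jdoe/Designs/source.zip", datetime(2023, 3, 1, 14, 18)),
]
suspect = files_active_during_usb_window(connect, eject, events)
print(suspect)  # ['C:/Users/jdoe/Designs/formula.xlsx', 'C:/Users/jdoe/Designs/source.zip']
```

The point of the sketch is the inference step the panel describes: the artifact record never says “these files were copied”, so the examiner flags files whose activity clusters inside the connection window and builds the narrative from there.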
It’s important to use all of this different evidence to really paint a picture of what happened, and that’s [inaudible] positive outcomes and where folks have different experiences with different forensics providers. We are always looking at the evidence and how it is going to be used in a matter, not just from a technical standpoint, and we try to advise in that direction.
And of course, there’s a lot of data that can be recovered from mobile devices. Location data can be incredibly important. There’s active Wi-Fi data, which we analyze using a platform called WiGLE. We’ve had cases where we’ve actually been able to track a person’s movement based on the Wi-Fi routers that their phone indexed as they were walking through a facility. So, certainly, there’s a lot going on with your mobile phone and there’s a lot of data available to us.
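The movement reconstruction described above boils down to mapping observed router BSSIDs to known locations and ordering them by time. Here is a minimal sketch of the idea; the BSSID-to-location table is invented for illustration, whereas in practice those coordinates would come from a wardriving database such as WiGLE:

```python
# Hypothetical lookup table: router BSSID -> (location name, lat/lon).
KNOWN_ROUTERS = {
    "aa:bb:cc:00:00:01": ("Lobby", (40.7128, -74.0060)),
    "aa:bb:cc:00:00:02": ("R&D Lab", (40.7129, -74.0058)),
    "aa:bb:cc:00:00:03": ("Server Room", (40.7130, -74.0057)),
}

def movement_path(observed):
    """observed: list of (ISO timestamp, BSSID) pairs from a phone's Wi-Fi
    scan log. Returns the sequence of named locations the device passed
    through, collapsing consecutive duplicates."""
    path = []
    for ts, bssid in sorted(observed):  # ISO timestamps sort chronologically
        hit = KNOWN_ROUTERS.get(bssid.lower())
        if hit and (not path or path[-1] != hit[0]):
            path.append(hit[0])
    return path

scans = [
    ("2023-03-01T14:00", "AA:BB:CC:00:00:01"),
    ("2023-03-01T14:05", "AA:BB:CC:00:00:03"),
    ("2023-03-01T14:02", "AA:BB:CC:00:00:02"),
]
print(movement_path(scans))  # ['Lobby', 'R&D Lab', 'Server Room']
```

Even this toy version shows why the artifact matters: the phone does not need to connect to a network for its scan log to place it near a known router.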
And again, on the cloud storage side, we’ve got to make sure we get those audit logs very quickly. We want to make sure we’re taking a really holistic approach to the logs going into a SIEM, but also going to the source. There’s a lot of log aggregation that corporations use these days, but it may only be [inaudible] certain identifiers. So, it’s really important during our forensic exam to make sure you’re grabbing the logging from the source and also from any type of centralized repository. And we want to see if there’s any type of broad file access, bulk downloads, or bulk deletions.
We’ve had situations where a departed employee locked the company out from an administrator standpoint: they locked out all the other admins and held the company hostage until, eventually, we got them to give everything back. You can certainly run into issues. Keep an eye on the folks who have the keys to the kingdom. That’s where we see the most harm.
And we also do offer – we’re one of the only companies in the private sector able to offer – mobile phone unlocks for a broad array of devices and operating systems, both Apple and Android. On top of that, we’re oftentimes also able to grab physical images; this is really more like government-grade technology that’s able to go much deeper in an investigation.
So, if you ever encounter a locked phone, if you’re in that kind of situation where you have something that was given to you locked, we do have capabilities here to enhance your investigative response.
And so, a lot of this can be avoided through protective measures. And certainly, policies are the foundation. Acceptable use policies, data retention, and data use policies.
I know Chris just lives and breathes DPAs and policy documents. Maybe you want to give some of the—
But to back up a little bit, all of these policies are important from a privacy standpoint. Just keep in mind that data retention, particularly in the form of data minimization, is a key tenet of every data privacy or data protection regulation out there, from Brazil’s LGPD to China’s PIPL, to the E.U.’s GDPR, and even all five of the U.S. state privacy laws that take effect in 2023. Every one of those data protection regimes has, as a core principle, data minimization: not keeping around data that you don’t absolutely need. It’s absolutely fundamental to sound privacy practices. And you can be fined if you do keep personal data around longer than your stated retention period.
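Operationally, enforcing a stated retention period comes down to a periodic sweep that compares each record’s age against its category’s limit. Here is a minimal sketch of that idea; the categories and periods are illustrative only and are not drawn from any specific regulation or policy:

```python
from datetime import date

# Hypothetical retention schedule, in days, per record category.
RETENTION_DAYS = {
    "hr_record": 7 * 365,       # e.g., seven years after creation
    "support_ticket": 2 * 365,
    "marketing_lead": 365,
}

def overdue_records(records, today):
    """records: list of (record_id, category, created_date) tuples.
    Returns the ids of records held past their retention period."""
    overdue = []
    for rec_id, category, created in records:
        limit = RETENTION_DAYS.get(category)
        if limit is not None and (today - created).days > limit:
            overdue.append(rec_id)
    return overdue

records = [
    ("A1", "marketing_lead", date(2020, 1, 1)),
    ("A2", "support_ticket", date(2022, 6, 1)),
]
print(overdue_records(records, today=date(2023, 3, 15)))  # ['A1']
```

A sweep like this is only as good as the classification feeding it, which is why the panel keeps returning to knowing what data you hold and where it lives.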
And one of the most painful [inaudible] ones is the combination of BYOD and not having an acceptable use policy that establishes domain over company data on a personal device. It’s been incredibly costly for organizations, and it’s really something that we still see out there, especially in growing companies; they experience this type of event and, oftentimes, they’re the most at risk.
Exit interviews and offboarding plans are so important, especially in these days of a remote world. We see too often that there’s a lack of protocols and SOPs in place once an employee departs an organization. Companies are doing a very poor job of keeping track of assets; an employee just leaves, and they don’t check them back in. So, there’s a lot of need here, and we have gone in and helped organizations develop better processes.
Then there’s not turning in physical access keys or revoking keycard access. In the land of remote work, some companies have just forgotten about their empty spaces. And of course, have an in-sync shutdown of company accounts, not necessarily deletion. We think it’s really a good idea to categorize employees, so organizations understand who is at risk during a departure and their data isn’t automatically just purged when they—
Mike, keep in mind that under some data protection laws like the GDPR, employees, they have the right to request deletion of their personal data once their employment has ended. And in those cases, the employer has to comply obviously, as long as there’s no legal obligation or some legitimate reason for keeping that data around. But regardless, if some employer is thinking about keeping a former employee’s data post-termination, that employer really needs to make sure that they’ve got a clear and a lawful basis for keeping that employee’s data around post-termination.
Great point. Yes, a very great point. Training, too, is critical. It’s critical for your team members and internal teams. We are constantly hammering on training as it relates to data security and data privacy, and it really requires employees to be doing this on a weekly basis.
And again, just have really good in-sync mechanisms between legal, HR, and IT, and think about enhancing your ability to investigate. Things like: if we have an issue, did we have logging? Actually go and look and make sure you get to a comfortable place, at least a 90-day lookback in some cases.
And then finally, just to conclude here: before an event occurs, implement policies and put systems in place to ensure that processes are enforced; technology is really great here.
When you have a key employee who is departing, make sure that company policies are followed and that proper offboarding steps are taken, and make sure you have an offboarding workflow that’s designed to catch and allow you access to data in the event you do need to investigate. When an event occurs, policies, again, are important: the way that you begin an investigation, the way you engage outside counsel. Too many times, we find IT has tried to perform their own investigation, and they spoil critical metadata and evidence.
And then, oftentimes, you’re working between two parties, and there’s always a route through the legal system to get a remediation protocol in place and have that data removed. Just because it’s been exfiltrated doesn’t mean that all is lost. Good employment counsel will be familiar with running through what we would call a “remediation protocol” with an opposing party. Oftentimes, there’s a neutral vendor, or a neutral is brought in, or they just want to use the same vendor anyway because they’re caught dead to rights. And we’ll go in and remediate any sensitive data that may have come out of the company, whether from a competitor’s systems or from personal devices and accounts.
And sometimes, this type of exfiltration can be considered a data breach. So, I’m going to kick it off to the amazing Anya Korolyov to bring us through some of the challenges that we see with data breaches and what we do with data mining.
And John Brewer will also talk about our AI-enabled approach here, which is so important for all the topics we’re covering.
Thank you, Mike. And I do want to jump back just a tiny little bit, I wasn’t quick enough when you were on that slide, to go back to the data retention policies.
Just from the Cyber Incident Response perspective, in so many of our cases, we have clients that have had their data taken and come to us saying, “We expect to find maybe a thousand data subjects, maybe 10,000”. And when we get through the data and tell them how many data subjects we have actually found, many companies are shocked at the numbers we provide to them.
And then we always find some documents that have been sitting on their networks or in somebody’s email box that go back years and years and really shouldn’t be sitting there. We’ve experienced this in quite a few cases, so I do want to highlight that it’s a very real thing, and good data retention will definitely help if your company is unfortunate enough to experience one of these events.
Not to put too fine a point on it, though, Anya, that’s really important: figuring out just how many individuals are affected, how many data subjects are affected. That affects both where breach notification goes and its timing, depending on the various laws you have to comply with.
Exactly, and how far the regulators will take it, and what fines are potentially in store. There are a lot of implications coming out of this, especially for people in HR or even financial departments, which is why data retention is really, really critical.
The fines are not getting any lighter, nor are the regulatory agencies any easier, especially in the U.S., where we haven’t had great privacy enforcement. But when it comes to data breach, they’re taking things very seriously from a notification standpoint.
Exactly. So, in the unfortunate event that the data got out there, for whatever reason, whether via a breach, a departed employee, or just carelessness, there are many instances in which a response is actually needed. An assessment of the data is always needed to figure out what is in the data.
Was there exfiltration, and what will the harm of it be? There are several factors to consider here. One of the most important, of course, is customers and contractual obligations. Jurisdiction comes into play here: what are the company’s obligations there? Is there any third-party data, and are we at risk of needing to notify some other company and their customers that the data was breached? As far as what is in the data, there are several possibilities: it may be PII or PHI, or maybe those elements are not present but business-sensitive information got out there, in which case it’s a very different review. Still, nevertheless, we need to look at the data and assess what the damage is.
And of course, there are the state, federal, and GDPR regulations, all the regulations out there that are constantly changing. They’re improving as far as protecting individuals, but companies really do need to stay on top of these regulations and figure out what their obligations are and, as Chris just mentioned, who they will need to notify based on the information found and the number of data subjects that were potentially exfiltrated within this data.
So, from the point an event happened to “let’s figure out what’s in the data and how we’re going to handle it”, there is usually a very short time period. There is not enough time to really lay your eyes on every document and figure this out, and that’s when the custodian interviews start to happen; that’s when we start working with the company to figure out what could potentially be in this data that they’re aware of.
Many times, companies do have very good knowledge of what’s potentially in the data: what kind of folder it is, what information it should contain, how far back it goes. That is one of the things we usually help them figure out, but at least they can point us to some of the folders and explain, if it’s email, what kind of data that custodian usually deals with and what we can find there.
But many times, we also deal with companies where the threat actor didn’t care what they took, they just took something, and we’re looking at file shares, at potentially terabytes and terabytes of data, on a very short timeline. And it’s our job to figure out what is actually in that data and how we go from there to getting something defensible in front of the breach coaches and in front of the client, so we can figure out what we’re going to do and how we’re going to handle it.
And as Mike already covered on the previous slide, the location of the data, of course, matters; whether the data is in the U.S. or in the E.U. plays a major role in our investigation. But there is a very good chance – nine times out of 10 – even with the client that says there is no chance that we’re going to have any EU data subjects. For example, “we’re a U.S.-based hospital, we don’t expect anything.” We always find there are patients who traveled to the United States and, unfortunately, ended up in a hospital, or employees who are actually citizens of the E.U. So, there’s always somebody [in the wires] who might have some GDPR compliance complications.
So, it’s very important to know your data, know where it originated, and know what potentially could be in there.
And one of the key features of our proprietary Protect Analytics is the ability to offer an early rapid geographic assessment, where we start, basically, pulling out addresses we see and entities and placing them on a map. It may not be a data subject at that point, but it starts to give you an early broad spectrum look at the potentiality around risk in this domain, and that’s something that’s—
That is the real value. Well, one of the great values of Protect Analytics is the fact that we know what jurisdictions we – or our clients – will need to comply with, as they deal with either a breach or, as we’ll talk about here in a bit, a DSAR.
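To make that geographic assessment idea concrete, here is a minimal, hypothetical sketch of tagging extracted addresses by jurisdiction. The `RULES` patterns, `tag_country` helper, and sample addresses are all illustrative assumptions, not HaystackID’s actual tooling; real geographic assessment relies on entity extraction and geocoding services rather than postal-code shapes.

```python
import re

# Toy country tagging by postal-code shape -- purely illustrative.
# UK is checked first because its postcodes are more distinctive.
RULES = [
    ("UK", re.compile(r"\b[A-Z]{1,2}\d[A-Z\d]? ?\d[A-Z]{2}\b")),  # UK postcode
    ("US", re.compile(r"\b\d{5}(?:-\d{4})?\b")),                  # US ZIP code
]

def tag_country(address: str) -> str:
    """Return the first jurisdiction whose pattern appears in the address."""
    for country, pattern in RULES:
        if pattern.search(address):
            return country
    return "unknown"

addresses = [
    "10 Downing St, London SW1A 2AA",
    "1600 Penn Ave, Washington, DC 20500",
]
countries = [tag_country(a) for a in addresses]
```

Counting the tagged jurisdictions across a corpus is what feeds the kind of early, broad-spectrum map view described above.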
So, it’s interesting, because when we are dealing with Protect Analytics, there’s kind of a two-sided coin here, in that Protect Analytics is the suite of tools that we use to analyze the data that we find during these breaches. But it’s also useful to point out that a lot of the tools that we’re using – facial recognition analytics and domain tracking in particular – are things that are being used throughout other organizations.
And one thing that I don’t think a lot of organizations realize is that if you have these technologies deployed in your organization for any reason – facial recognition for your security, or for your login system, or time tracking, or anything else that is becoming more and more popular – that is, in many cases, a form of PII, and in some cases even PHI. I think a lot of organizations – especially IT and security organizations – aren’t used to dealing with these sorts of regulations. They don’t really think of themselves as having particular privacy exposure in that realm, and they end up getting bitten by these.
But to talk a little bit about Protect Analytics and how that works: basically, we take the full weight of the latest advancements in machine learning and AI at HaystackID, we keep using our older technologies – our search terms, our regular expressions – and we synthesize all of those together into an integrated way of looking at these tremendous quantities of data. And it’s a big challenge.
It’s actually quite common for cybersecurity incidents to have data volumes in the terabytes, and we’re starting to see them reach even into the petabyte range. So, having all of your ducks in a row and knowing how you’re going to attack the data is of the utmost importance right out of the gate, because the timelines don’t care about how much data you’re working with.
Anya, about how long do we normally have from the time of detection to the time where we start having real obligations on reporting and notification?
The general timeline is about six weeks.
And so, when we’re dealing with tens of terabytes of data, just moving that around can become significant.
Fortunately, we live in an age where we have huge amounts of technology available to us. Especially during my time at HaystackID, we’ve really done a great job of putting these together into a multi-tiered approach where we are using keyword search techniques and regular expressions, both in our environment and even in the client environment, whether that’s on-prem or in the cloud. And we’re getting more of those capabilities all the time.
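As a rough illustration of the regular-expression layer described above, the following sketch counts a few common PII element types in a document. The `PATTERNS` dictionary and `scan_document` helper are hypothetical names of my own; production patterns would be far more extensive, with validation logic layered on top of the raw matches.

```python
import re

# Hypothetical element patterns -- a real deployment uses many more,
# plus validation (e.g., checking SSN area numbers, phone formats).
PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "phone": re.compile(r"\(?\d{3}\)?[-. ]?\d{3}[-. ]\d{4}\b"),
}

def scan_document(text: str) -> dict:
    """Count each PII element type found in a single document's text."""
    return {name: len(pattern.findall(text)) for name, pattern in PATTERNS.items()}

doc = "Contact John at john.smith@example.com or (555) 123-4567. SSN: 123-45-6789."
hits = scan_document(doc)
```

Summing these per-document counts is also what makes the density and heatmap reporting discussed later possible.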
Actually, that reminds me of one thing that I didn’t break in with when Mike was talking about intangible assets and source code review in particular. One of the new intangible assets that we’re starting to see circulating a great deal are AI models, not unlike the ones that we use at HaystackID, where they are trained up to do particular things: read documents, recognize individuals, or solve relatively complicated problems. And the actual data of those models is usually only a few gigabytes, so it’s relatively easy for somebody to exfiltrate if they are so inclined.
And I think it’s going to become one of the big risk factors in terms of trade secrets as we see AI and custom-trained models propagating through the industry, because even something like ChatGPT, which cost an enormous amount to develop – I understand that it took $1.2 million just in electricity to train the model – actually fits in about 800 gigabytes when you come right down to it. So, that’s something that fits on a regular USB drive, as Mike was talking about earlier. It’s this whole new realm of intangible, digital assets that we’re going to have to manage, and they are probably going to become major assets in these sorts of incidents going forward as they become the new high-value footballs in the business world.
But I think that pretty much covers that. Anya, do you want to talk a little bit about how we visualize this data once we have it?
Definitely, thank you, John. So, once we get the data and run it through our Protect Analytics analysis, we create an impact assessment report that basically lays out everything that we have found in the data. And it will be over-inclusive for a reason, depending on the industry and the client type; we will give them the full picture. That picture will include the domain analysis that we mentioned, and it will also include all the PII and PHI elements we have identified within the data. And that will be further broken down not only by data file types, as you see here, but also at the folder level, and potentially, where appropriate, at the document level for some of the densest documents – because another thing we’re able to calculate is the density of the PII within each document. So, we will provide a breakdown of all the elements.
What we have found extremely helpful is to not just provide the total numbers, but also assign risk tiers to the elements that we find. So, basically, we say anything that has a name, address, or phone number is in an identity risk tier, and anything that is medical-related is PHI. And then there is the high-risk tier – meaning, if you have found somebody’s name and a social security number, you are most likely obligated to notify that person, so that makes it high-risk. Something like a generic ID or an IP address, while it might matter for business-sensitive information, most likely will not rise to the level of notification for the purposes of PII.
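The tiering logic described above can be sketched roughly as follows. The element names, tier sets, and `risk_tier` function are illustrative assumptions of mine; real tier definitions vary by client, industry, and jurisdiction.

```python
# Hypothetical tier rules modeled on the description above.
IDENTITY = {"name", "address", "phone"}
PHI = {"diagnosis", "medical_record_number"}
HIGH_RISK = {"ssn", "drivers_license", "financial_account"}

def risk_tier(elements: set) -> str:
    """Assign a document the highest applicable risk tier."""
    if elements & HIGH_RISK and "name" in elements:
        return "high"      # name + SSN, etc. -> most likely notifiable
    if elements & PHI:
        return "phi"       # medical-related elements
    if elements & IDENTITY:
        return "identity"  # name, address, or phone number
    return "low"           # e.g., a bare IP address or generic ID

tier = risk_tier({"name", "ssn"})
```

Tagging every document this way is what lets the review focus on the much smaller high-risk and identity-tier populations mentioned next.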
And then we create these reports, and we say, okay, we found this many documents that contain some sort of element, but we also found a much smaller population of documents that contains both an identity risk tier and a high-risk tier element. And that is usually the population of documents that we suggest we lay our eyes on. Again, depending on further information – file type, location of the document, potentially file name – there are many instances where our analysis legitimately says there is PII, but the client might come back and say, “No, I know what these documents are. There are many doctors that go to conferences and get pamphlets that contain a lot of names of the people who attended and a lot of medical terms.” That doesn’t mean we’re actually going to find notifiable PII there, and given the amount of time we have before we have to notify everybody, the last thing we want to do is look at documents like that.
Our job here is to identify the smallest and the most precise population of documents that we’re going to actually review to confirm what the PII in there is and who are the people that we need to notify.
I just want to point out this tooling can be deployed pre-breach behind a client’s firewall. This is what we oftentimes use for clients who are looking to conduct privacy-based information governance. They want to clean up their data, they want to know where their data is, and they want to know where that high-risk data is. And it does a great job, especially from a prioritization standpoint: we start to look at density, and that’s where we start to draw heatmaps.
And Anya, maybe you want to talk a little bit about that.
Yes, I was going to talk about that next. That goes back to the information governance that we all keep talking about, and that is one of the greatest tools here: once we have this information – once we know which documents contain PII and where those documents are located – we can also tell you the date of these documents based on their metadata.
So, if I’m looking at the report and I’m finding documents that go back to 1992 and are located within an HR folder, that is a huge red flag for information governance.
So, that is extremely helpful. We create these heatmaps, again, based on all this information, the documents’ locations. We analyze the file names to see if we can pick out some sort of theme, for example, weekly, monthly reports that a company is running that contains a lot of PII, how far back do these reports go, where are they sitting on the network, things like that.
So, we provide that information for the company even before a breach happens, to say: this is where I need to concentrate, and this is where I need to figure out what to do with this data, so it doesn’t continue to sit there waiting for a potential breach.
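A toy version of that folder-and-age view might look like this. The `docs` inventory, the year cutoff, and the flagging rule are all invented for illustration; the real heatmaps are built from the metadata and PII-density analysis described above.

```python
from collections import Counter

# Hypothetical document inventory: (folder, last-modified year, PII hits)
docs = [
    ("HR/payroll", 1992, 40),
    ("HR/payroll", 2021, 12),
    ("Shared/reports", 2015, 30),
]

# Bucket PII density by folder and document year -- the heatmap grid.
heat = Counter()
for folder, year, hits in docs:
    heat[(folder, year)] += hits

# Flag folders holding old, PII-dense data for information governance.
CUTOFF = 2000  # illustrative retention threshold
flags = sorted({folder for (folder, year), n in heat.items()
                if year < CUTOFF and n > 0})
```

Here the 1992 HR folder would be flagged, which is exactly the kind of red flag mentioned a moment ago.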
Then there is the post-review extraction of the actual data subjects. Once we confirm that there is PII here – and John Smith, unfortunately, does have a social security number somewhere in there – we create jurisdictional and element reporting, meaning we provide a report that lists out all the states of the people we have found to be affected, and which elements are affected in those states. Not every state has the same regulations or notification requirements, and the GDPR, of course, has its own requirements. So, in Nebraska, you do not need to notify somebody whose only element is date of birth; we can potentially set that aside. But we do provide that full reporting, and we can provide it, if we need to, on-the-fly during the extraction process. It’s not quite as helpful as post-deduplication reporting, which we will touch on in a second, but we can provide that.
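The per-jurisdiction filtering might be sketched like this. The `NOTIFIABLE` table and `needs_notice` helper are toy stand-ins of mine – actual state breach-notification statutes are far more nuanced – and the Nebraska entry simply mirrors the date-of-birth example above.

```python
# Toy per-jurisdiction rules -- illustrative only, not legal guidance.
NOTIFIABLE = {
    "NE": {"ssn", "drivers_license", "financial_account"},
    "CA": {"ssn", "drivers_license", "financial_account", "medical"},
}

def needs_notice(state: str, elements: set) -> bool:
    """True if any exposed element is notifiable in the subject's state."""
    return bool(elements & NOTIFIABLE.get(state, set()))

subjects = [
    {"name": "A", "state": "NE", "elements": {"date_of_birth"}},
    {"name": "B", "state": "NE", "elements": {"ssn"}},
]
to_notify = [s["name"] for s in subjects if needs_notice(s["state"], s["elements"])]
```

Subject A, with only a date of birth in Nebraska, drops off the notification list; subject B, with a social security number, stays on it.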
After our review, and after we provide a notification list, we can also provide a defensibility report that outlines all the steps we took from the very beginning: how we analyzed the data, how we sampled it, how we arrived at the population of documents that we decided to review and extract from, how we extracted the information, and how we went from extracting potentially a million data subjects to a notification list of 80,000. And that’s one of the important things.
Companies keep records, and my name probably shows up in tons and tons of documents for the company I work for. And the last thing we want to do is over-notify and send many letters to the same person. So, one of the best things that we do with Protect Analytics is our deduplication process: we go through the list of people that we have extracted, and we make sure that we create the most concise record for the notification list.
So, we say if anybody has the same last name, first name, and social security number, for example, and they show up 50 times, we roll that up into one line. But we also take it further and say, “Well, let’s look at people who have the same last name, first name, date of birth, and address,” depending on the data we extracted. But we do get it to be as concise as possible.
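That roll-up might look something like this minimal sketch. The `dedupe` helper, key fields, and sample rows are hypothetical; real pipelines also normalize spellings, nicknames, and address formats before keying.

```python
# Hypothetical extracted data-subject rows, with one duplicate.
rows = [
    {"first": "John", "last": "Smith", "ssn": "123-45-6789", "dob": "1980-01-01"},
    {"first": "John", "last": "Smith", "ssn": "123-45-6789", "dob": None},
    {"first": "Jane", "last": "Doe", "ssn": "987-65-4321", "dob": "1975-05-05"},
]

def dedupe(rows, key_fields=("last", "first", "ssn")):
    """Roll duplicate subjects into one record, filling in missing fields."""
    merged = {}
    for row in rows:
        key = tuple(row[field] for field in key_fields)
        if key not in merged:
            merged[key] = dict(row)
        else:
            # Keep any value that a later duplicate record fills in.
            for field, value in row.items():
                if merged[key].get(field) is None and value is not None:
                    merged[key][field] = value
    return list(merged.values())

notification_list = dedupe(rows)
```

The two John Smith rows collapse into one line, so he receives a single letter rather than two.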
And then once the notification list goes out and people have a window where they can call back, and they can ask for the information that we found for them, we do have a process for the DSAR.
So, at that point, we go back to the actual source documents where we found this information. We look at the documents, and we confirm whether the people would like to see them. In many instances, the breach happened to a third-party vendor, so sometimes a document might contain information that doesn’t even belong to the same company, let alone the same individual. So, we redact everything.
And Mike did just have an example of a heatmap up on the screen. That is one of the heatmaps that we have created, and these dates are, unfortunately, real. I have seen documents go even farther back than that. I definitely wish people would have taken a look and said, “Do we need to keep something that’s this old? We probably have this data somewhere else already, many times over.” So, it is very helpful.
So, I think at this stage, we want to kick it back to Chris to wrap us up here with our practices.
I can wrap up here. So, I’ll start off by saying I’m a big fan of the five-P rule – Proper Preparation Prevents Poor Performance – or, as some of my veteran friends might say, the six-P rule. I’m also an Eagle Scout, and I live by that scout motto. Preparation is the key.
So, when we talk about breaches and we talk about DSAR responses, the best course of action is to be prepared. And it used to be that cybersecurity and information governance were those things that we knew that we needed to do. They were like those good eating habits, the fruits and vegetables we always knew we were supposed to be eating. And we knew 20 years ago that we needed to be mindful of where all of our data is, what it is, what we have, what controls we have in place to protect it.
But along comes our data doctor checkup, known as the CCPA or the GDPR. And during that checkup, the doctor tells us our privacy is out of whack, making it abundantly clear just how important it was for us to be doing those things for the last 20 years. So, privacy is the catalyst that makes us realize that data hygiene is incredibly important.
Data privacy and data compliance aren’t so much a separate set of compliance steps altogether; rather, it’s doing all of the other compliance steps that we should have already been doing, just through a privacy lens.
So, the idea is that the most efficient way to approach privacy, information governance, and information security or cybersecurity is to do it together.
Certainly, if you have to, you can tackle them separately, but because they all involve similar activities, it’s most efficient and most cost-effective if we handle them together. Let’s go to the next one, Mike.
The underlying theme here, I think, is that what we have to manage is risk. And addressing privacy concerns is a great illustration of how all of these advisory services, like those that HaystackID offers, work together in an interlocking fashion.
And you’re really not being efficient or cost-effective about the way that you’re doing it if you don’t consider all three, information governance, cybersecurity, and privacy all at the same time.
So, it’s a great opportunity for companies to review, to revise where necessary, or to update their internal policies that we talked about earlier, and to bring them into compliance with these state laws that I’ve mentioned a couple of times here – there are five in the U.S. in 2023 – and the GDPR. And there are some 33-odd privacy laws coming online, or expected to come online, around the world in 2023.
So, vendor privacy management is really important in this brave new privacy world, where companies are often unaware of the requirement for data processing agreements, or don’t fully understand the role of their contractors or vendors in the process. And that includes us in the legal industry.
And then, finally, these services all interlock. From a privacy standpoint or through the privacy lens as we’re creating a data map for InfoGov purposes, we can also map data flow.
And that’s something that our advisory services can do at HaystackID, of course, but we also have our protect services that allow you to apply AI and technology to help you do it more cost-effectively, more efficiently, and all together.
And the last one there. The most important thing to do, I think – last slide there, Mike – is to make sure that we have the right team in place. And we talked about – well, I’ve talked about it several times here at the end – bringing together InfoGov, InfoSec, and privacy. It’s a lot of the same work, on the same data and the same infrastructure, but viewed through three different lenses.
So, as you and your organization do this kind of work, don’t look at it through a telescope; look at it with binoculars – or trinoculars, if you will. We need to make sure that we’re looking at it through all three lenses so that we’re covering our bases and being efficient about it.
And with that, back to you, Mike.
We do have a question, and it’s a great one. “How unique are BYOD policies, and how much time and money should be budgeted to build one? Does HaystackID consult on building those policies for companies with international offices? And how do international privacy policies affect things like a BYOD policy?”
Chris, that’s quite a lot to unpack, but I think it’s right up your alley. Maybe you can give us the 35-second elevator response.
Yes, so there were multiple questions there. How is it different? Well, in terms of the data that’s on those devices: obviously, you want to create a policy to begin with, governing the use of those devices in the workplace and what data goes onto them.
Secondarily, there are technical controls you can put on those devices, in addition to the policies for using BYOD devices that allow you to technically manage both trade secrets and personal information that you might come into contact with within the organization.
I think the next question you had there was how much money should you spend in developing these kinds of policies?
Well, I’d love to tell you that you should spend millions and millions and that you should hire HaystackID to do it for you. But I don’t think it should cost you millions and millions. It’s hard to give you a blanket number; it really will depend on your organization and its sophistication. It does not have to be really expensive to develop that kind of policy.
BYOD probably occupies at least half of the marketplace today, the other half, obviously, being company-issued devices. And so, there are some pretty solid leading practices out there that you should be able to draw upon, and HaystackID can certainly assist in that respect. What part of that question did I miss there, Mike? I hope I covered them all.
I think it was just “Will we assist?” And we’re very happy to assist in this domain and we have in the past.
Yes, we certainly can.
I just want to let the audience know, as we close, that today’s webcast presentation was recorded for future on-demand viewing, and a copy of the presentation will be available for all attendees once the on-demand version is completed. We expect those items to be available on the HaystackID website soon after we complete today’s live presentation.
And I want to thank everybody on the team here for the information and insight. You guys are so great, as always. We also want to thank everyone who took time out of their schedules to attend today’s webcast. We know your time is valuable, and we appreciate you sharing it with us today.
We also hope you will have an opportunity to attend our April monthly webcast, currently scheduled for April 19th. This upcoming webcast will feature expert considerations and best practices for selecting, staffing, and supporting complex and high-velocity legal document reviews.
You can learn more about this webcast and you can register for any of HaystackID’s upcoming webcasts and review our extensive library of on-demand webcasts at haystackid.com.
HaystackID is a specialized eDiscovery services firm that supports law firms and corporate legal departments and has increased its offerings and expanded with five acquisitions since 2018. Its core offerings now include Global Advisory, Discovery Intelligence, HaystackID Core™, and artificial intelligence-enhanced Global Managed Review services powered by ReviewRight®. The company has achieved ISO 27001 compliance and completed a SOC 2 Type 2 audit for all five trust principles for the second year in a row. Repeatedly recognized as a trusted service provider by prestigious publishers such as Chambers, Gartner, IDC, and The National Law Journal, HaystackID implements innovative cyber discovery services, enterprise solutions, and legal discovery offerings to leading companies across North America and Europe, all while providing best-in-class customer service and prioritizing security, privacy, and integrity. For more information about its suite of services, including programs and solutions for unique legal enterprise needs, please visit HaystackID.com.