Editor’s Note: On February 16, 2022, HaystackID shared an educational webcast designed to present and explore the foundations of analytics and highlight workflows, methods, and applications that can be leveraged in compliance initiatives, investigations, and discovery efforts. The presentation, led by digital forensics, investigations, and legal discovery experts, spotlighted methods, technologies, and processes to quickly analyze massive amounts of communications and information, and included a specific focus on concept searching.
While the entire recorded presentation is available for on-demand viewing, provided for your convenience is a transcript of the presentation.
[Webcast Transcript] Data Forensic Investigations Exploitation by Concept Searching
+ Warren G. Kruse II
Warren is Vice President of Data Forensics and Consulting at HaystackID. In this role, Mr. Kruse provides electronic discovery, data forensics, cybersecurity/incident response, and consulting services to assist with litigation, compliance, and investigations. He is an author of the book “Computer Forensics: Incident Response Essentials,” published by Addison-Wesley, and served as technical editor of the book “Computer Forensic Jumpstart.”
+ Bobby Malhotra
Bobby Malhotra is eDiscovery counsel with Munger, Tolles & Olson. His practice focuses on navigating complex eDiscovery litigation issues using his unique combination of technical knowledge and legal experience.
+ Robert “Bobby” Kruse
Bobby Kruse is an eDiscovery and digital forensic practitioner who has worked both in-house and as a consultant. He is currently part of the Global forensics, investigation, and eDiscovery group at Intel Inc., where he acts as Intel’s subject matter expert on all things eDiscovery.
Hello, everyone, and welcome to today’s webinar. We’ve got a great presentation lined up for you today, but before we get started, there are just a few general admin points to cover. First and foremost, please use the online question tool to post any questions that you have, and we will share them with our speakers. Second, if you experience any technical difficulties today, please let us know using that same questions tool and a member of our admin team will be on hand to support you. And finally, just to note, this session is being recorded and we’ll be sharing a copy of the recording with you in the coming days.
So, without further ado, I’d like to hand it over to our speakers to get us started.
Hello, good morning, good afternoon, good evening to our worldwide audience today. I hope you’re having a great week. My name is Warren Kruse and on behalf of the entire team at HaystackID, I would like to thank you for attending today’s presentation and discussion titled “Data Forensic Investigations Exploitation by Concept Searching”.
Today’s webcast is part of HaystackID’s regular series of educational presentations to ensure listeners are proactively prepared to achieve their cybersecurity, information governance, and eDiscovery objectives. Our expert presenters for today’s webcast include individuals deeply involved in the discipline of data forensics, in areas ranging from corporate investigations to cyber and legal discovery. As subject matter experts, they all have extensive practical and current experience in applying cutting-edge analytic approaches to improve the efficiency and accuracy of data-intensive inquiries and investigations.
First, let me introduce myself as today’s moderator. Next slide, please. As I mentioned, my name is Warren Kruse. I’m the Vice President of Consulting for HaystackID. In this role, I provide electronic discovery, data forensics, incident response, and consulting services to assist clients with compliance and investigations. I’m also grateful to serve as the 2022 International President of the High Technology Crime Investigative Association. Next slide.
Next, let me introduce Bobby Malhotra. Bobby’s eDiscovery counsel with Munger, Tolles & Olson, where his practice focuses on navigating complex eDiscovery litigation issues using his unique combination of technical knowledge and legal experience. As a member of the firm’s litigation practice group, Bobby develops innovative, cost-effective, and defensible strategies for preserving, collecting, reviewing, and producing electronically stored information. He also regularly advises clients on eDiscovery preparedness, planning and execution, information privacy, and data security. Additionally, Bobby serves as a trusted eDiscovery advisor to some of the largest financial and technology sector clients. Welcome, Bobby.
Next slide. Last but certainly not least, I’m happy to introduce Bobby Kruse. Bobby is an eDiscovery and digital forensic practitioner who has worked in-house and as a consultant. He is currently part of the Global Forensic Investigation eDiscovery Group at Intel, where he acts as Intel’s subject matter expert on all things eDiscovery. Bobby’s primary focus in Intel is to protect intellectual property and trade secrets, while also providing technical support to human resources, compliance, internal investigations, legal, and other Intel groups seeking his and his group’s experience.
For this session, since we have the two Bobbys, we’re going to go with Bobby M. and Bobby K. so that if I just say Bobby, then either one of you could just respond.
All right, next slide. The disclaimer, today’s webcast presentation is being recorded for future on-demand viewing and a copy of the presentation will be available for all attendees once the on-demand version is completed. We expect those items to be available on the HaystackID website soon after we complete today’s live presentation, and the three of us just want to say the opinions expressed in this presentation and the following slides are solely those of the presenters, and not necessarily those of our companies. Our companies do not guarantee the accuracy or reliability of the information provided.
At this time, let’s get started. Next slide, please.
The agenda there is pretty encompassing. We’re going to have a lot of discussion. We have the question box, as Lucy mentioned before, feel free to type in any questions. If it’s something that we’re currently discussing, we’ll field it right then; if not, we’ve allotted some time at the end for other questions. We’re going to discuss data forensic investigations exploitation – this topic sounded really good when I typed it out, but I guess I should’ve tried saying it at the time. Data forensic investigation exploitation by concept searching, which will cover developing capabilities to identify individuals, improve the data recovery, identification, and analysis of captured material, and may enhance your capabilities to support various types of investigations involving data. I’ve been looking forward to this session for a long time since we have three perspectives; we have in-house, we have outside counsel, and myself as a consultant and someone who’s testified about some of these topics. Next slide.
Where does it begin? Who wants to go first? We’ll go in age order. Bobby, where does it begin? Bobby K?
So, it starts with the stakeholders. I think that what happens at the corporate level is we’ll usually have a complaint come into one of our various [PDLs], and from there, we figure out where to cast the web, how big of a web we’re going to cast, what we’re going to collect, what we’re going to process, what we’re going to send down the stack through the typical EDRM workflow. I think coming up with good date ranges is a great place to start. You don’t want to be collecting everything anymore, the data volumes are exploding, and even though the technology is running in parallel with how big the data is exploding, you still don’t want to over-collect, and that doesn’t even get started with a slew of issues that Microsoft is currently throwing out into the world with over-collecting and XYZ. So, for me, I think it’s starting with a reasonable date range, collecting that data, whether that be shared Teams data, SharePoint data, website data, computers, servers, what have you, getting that all into a central repository, understanding how much you have, and then creating a workflow based off the data volumes. That’s where we typically start.
And Bobby, you mentioned EDRM. We have a very large group from all over the world. I was really excited to see people from all over the world signing up and joining us. You mentioned EDRM, can you just briefly discuss that in case anyone’s not familiar with it?
Of course. So, the EDRM, or the Electronic Discovery Reference Model, is a model that’s been in the discovery workflow for a couple of decades now, and it basically starts with information governance and goes all the way through the legal cycle to productions, and actually going to court and testifying and XYZ. So, it’s a typical workflow of not only information governance, understanding what data you have, collecting it, preserving it, going through typical workflows for culling down that data, processing it, reviewing it, it’s the entire workflow for a traditional eDiscovery case.
Bobby M., as outside counsel to a lot of large corporations, do you get involved early on, later on when it becomes litigation? How do you get involved in investigations? Hypothetically, we’re not going to speak about any specific clients, but how do you get involved?
I think every matter is different. We get involved at different stages. Sometimes it’s extremely early on in the process where things are very sensitive and the company doesn’t even want employees to understand the events that are taking place that are being investigated. Other times we get involved later on in the process, once the initial investigation has been conducted internally. We’re talking about where does it begin, I think it’s important to flesh out a few concepts related to internal investigations, just to get everybody on the same page and to level set.
First, I think, what are they? They’re formal inquiries, usually conducted by the company itself to determine whether laws, regulations, or internal policies have been violated. Second, and this goes to the question, how do they start, so information giving rise to these internal investigations can come from a variety of sources, and I think this is very important to understand, as we get into the technology and the soup to nuts of the actual process of collecting data and analyzing data. But there’s a variety of sources. Customers, there could be catastrophic incidents that lead you into an investigation, employees could tip you off, whistleblowers, lawsuits, news, and even in today’s day and age, social media could be another source, as well as regulatory authorities. So, the information can come from a variety of sources.
And then the third concept that I think is important to flesh out is, what is the goal? The goal of any internal investigation is really to obtain an undistorted view of the facts, what happened, when did it happen, were laws or internal policies violated, who was responsible, who may have been injured or harmed? So, it’s essentially a fact-finding process to get to the bottom of things. So, I think that’s important context in terms of getting into the weeds and talking about how we’re collecting data and how we’re analyzing. The goals, where do we start, and where does this all initiate from?
Excellent, and on the slide there, something that I always find interesting is HR complaints, and we think of these investigations being in the litigation team, or if the company has an internal investigative team or something like that. I don’t know about you guys, but I get more cases, more investigations from HR than I do sometimes from legal, which is fine, but then the first thing that I do is ask if a lawyer is involved to potentially preserve privilege. A lot of times, there isn’t a lawyer involved and I’ll get a call and they’ll want assistance, but the first thing I say is, is there a lawyer involved, and Bobby M., can you explain why that might be important later on if it does go to court?
Yes, sure, and I think a lot of times, at least from my experience, there are business-level lawyers within the HR group, and they’re involved. But essentially, you want to ensure that you’re getting an attorney directly involved in the correspondence and dictating the process to be followed, so we can have the protection of attorney-client privilege, because later on down the line, you want to shield the facts, and the findings, and any legal advice that was provided throughout the process with the attorney-client privilege, so it’s important to have an attorney overseeing the process in a lot of circumstances.
Either Bobby, any other additional thoughts on initiation of investigations?
I think the last thing I’ll say on the initiation is data size and also the size of the case doesn’t really determine what tools you’re going to use sometimes. Sometimes people say, oh, this is too small of a case to run through an entire EDRM workflow. I don’t think that’s always the right answer. I think that sometimes you can use these tools for a very small case, and it’ll actually increase the speeds that you get the case complete. So, don’t think that you need to have terabytes of data to work through an entire discovery workflow, but using these tools will definitely increase your efficiencies, and we’ll get into the tools later down the slideshow, but yes, that’s what I’ll close with.
Yes, that’s a great point. The other thing that I say that’s similar, not just use the whole workflow, but sometimes we don’t get to use the cool concept tools until towards the end of the EDRM, and I’m always saying, I want to use them in the beginning, when we’re doing the investigation, when we’re trying to identify the data sources and the right people to use some of the cool tools to identify who we should be looking at. So, in addition to, you want to use the whole workflow, you want to use it early and not wait until the end after you’re almost all the way done. Next slide, please.
So, the traditional data forensic tools that I’ve grown up with, and if you haven’t noticed yet, Bobby K. and I share the same last name—
I said, I’ve literally grown up with the tools.
Yes, he literally appeared in my book when he was probably eight or 10 years old, so he grew up with this, and back then, we used to say we use tools that not necessarily were intended to do investigations, or data forensics, to do data forensic exercises, but these days, we have tons of very cool suites of tools that literally could do everything that you see on the board here, and then some. We used to have to use different tools for just looking at USB activity, just looking at file activity. Now, there’s tons of suites of tools that do a lot of this and certainly more, but I still recommend doing cross-validation, using more than one tool to cross-validate to make sure you’re getting similar results, manually review the data to make sure that it makes sense before you do a report, before you do a declaration or anything like that. Either Bobby, are there other things here that you see on a regular basis, investigation, forensics tools being used? Is there anything here that you go to all the time and you say you know what, when we’re triaging an investigation, these are the three or six things that we want to do on any triage?
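Editor’s illustration: the cross-validation step described above can be sketched in a few lines of Python. This is a minimal sketch, not any tool’s actual export format; the event layout (device serial plus ISO timestamp), the sample values, and the names `cross_validate`, `tool_a`, and `tool_b` are all assumptions made for the example.

```python
def cross_validate(tool_a_events, tool_b_events):
    """Compare USB-connection events reported by two forensic tools.

    Each event is a (device_serial, timestamp_iso) tuple. The tuple layout
    is hypothetical; real tools export richer, tool-specific records.
    """
    a, b = set(tool_a_events), set(tool_b_events)
    return {
        "agreed": sorted(a & b),        # both tools report the event
        "only_tool_a": sorted(a - b),   # needs manual review
        "only_tool_b": sorted(b - a),   # needs manual review
    }


# Made-up sample data for illustration only.
tool_a = [("SN123", "2022-01-10T09:15:00"), ("SN456", "2022-01-11T17:02:00")]
tool_b = [("SN123", "2022-01-10T09:15:00"), ("SN789", "2022-01-12T08:30:00")]

report = cross_validate(tool_a, tool_b)
for bucket, events in report.items():
    print(bucket, events)
```

Anything that lands in the two “only” buckets is a candidate for the manual review mentioned above before it goes into a report or declaration.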
Bobby, I can talk a little bit in terms of how I specifically use data forensics in practice, when does it come up as a lawyer, and I can say when I need to collect and piece together the data to tell the full story in a legal manner – for example, if I want to understand whether certain files were allegedly taken or misappropriated, such as the case where a client may suspect an employee who has been recently terminated of committing theft of IP, or I want to know when a certain file was deleted, or by whom, or where a custodian was on a given day, or whether an employee copied confidential client documents through a USB drive, those are all real life examples of where I use forensics in practice.
Now, in terms of the reporting, it could be different because we have so many data sources and trying to understand the data does become difficult, but it’s really, why are you using it? It’s often to piece together the data to tell the full story.
And I agree with what Bobby just said. The only thing I would add is we use digital forensics as kind of like the peanut butter and jelly effect, looking at both eDiscovery tools and forensics tools. We create this really nice synchronicity between the two, so we’re using all the capabilities that we have with the forensics tools to come up with the hard metadata facts, the artifact analysis; everything that you’re basically seeing on the slide right there is pretty encompassing of what we’re looking for, traditionally, if we’re not doing a very deep dive, and then coupling that with the capabilities that we have in our eDiscovery suite to then create a really nice marriage and then be able to produce everything that we’re interested in within that case.
Yes, and we keep saying the entire EDRM… a hypothetical case, recently someone thought that they were going to have to process terabytes and terabytes of data just to look at internet activity. So, sometimes there may be a specific need that a forensic tool could handle without having to go through the entire EDRM. So, I usually like to pick off the low-hanging fruit. You collect nine terabytes of data or some enormous data set, and then what can you do to maybe do it in triage, to phase out things, to say, OK, we’ll take a subset of data and process it so that we could use conceptual tools that we’re going to talk about later. But for file activity, for USB activity, there are some very good tools that you don’t have to process the whole thing. So, I just want to point that out, when we use the EDRM, we use the entire EDRM when we have to, but we could pick out certain things and do a very targeted approach as well.
Warren, I think what you just said really hits home when it comes to investigations, where you are dealing with a board down your neck, who wants findings right away, and time is of the essence, to really be able to prioritize and focus on specific pieces, particularly an investigation where you don’t have an adversary, you don’t have this standard of reasonableness or need to kind of justify your approach, you’re really trying to do what you can to get to the facts very quickly. Being able to prioritize I think really helps, and it helps the legal team, in terms of being able to constantly feed the beast, get data to them, get information to them, so they can start to understand what actually happened and transpired.
I think the quicker you can get at least something to your stakeholder or to your client, it definitely decreases the pressure around the entire situation. So, you can get them something to get them started, and then you can actually create your heat map right there. So, do a very small collection and get through the process very quickly; if you’re doing just an investigation, go through your computer forensic suite; if you actually are going through the entire EDRM, that’s fine. Processing small amounts of data and getting them something right away is definitely a better workflow than trying to take on everything all at the same time. So, start small, and then work your way up is something I would definitely add.
Awesome, and there was an interesting question in the posted [inaudible] that’s a little bit outside of the concept searching realm, but I’ll certainly follow up with you afterwards. And just a reminder, if you joined us late, if you have any questions, please type them in the question box. Next slide, please.
Challenges with reviewing forensic artifacts, we see this a lot. People want to get it into a review tool to have the entire team review. Gentlemen, have you run into any of these challenges or any other challenges that, again, we’re going to be talking about concept searching, that maybe concept searching tools were able to alleviate some of these things?
Yes, I mean, I think it’s inherent. One of the examples that you have there is chats, SMS/MMS. When you’re dealing with short-form messages, it is always difficult to be able to piece that together visually, particularly where chats are happening real-time over the course of many weeks or months. So, that is a difficulty, especially in the way that forensic reporting is provided today. How do you extract it? Do you analyze Excel reports? Do you do something in a review tool? So, that is a difficulty that I think we’re seeing, and a lot of providers are working for solutions to be able to better visually represent that data and also parse that data for analysis and investigations.
Yes, I think, from my perspective, having a good set of policies and procedures in place to make sure that the data isn’t tampered with. So, if you have a departing employee, making sure that their manager didn’t take their computer and say, oh, well, let me see if they’ve done anything within the last 30 days. We want to make sure that we can basically preserve the metadata to make sure that there isn’t any tampering with the artifacts moving downstream.
Yes, and as Bobby M. mentioned with the texting, and let me just go back briefly, if any of the attendees are not familiar with mobile forensics, the problem is, when we go to review the data, the different types of data are stored in different databases. So, you can have SMS messages, which are the simple ones, in a database; you can have the MMS, which someone attaches a picture or something, in a different database; and iMessage in a different database. So, just looking at a cellphone sometimes is difficult, piecing together a thread, or a conversation thread.
Now, take that a step further, when you have other mediums. Bobby M. earlier said he has challenges sometimes with investigations involving social media. So, say someone jumps on LinkedIn or some other networking site, and sends a message in that site and says, hey, here’s my email address, can you reach out to me about working together, or something like that, and then the person responds via email and says, here’s my cell phone number, I don’t want to talk about this on my work email, so then the person texts them, and now they have a text string going. We say we like to put together the email threads, but what happens when that communication thread becomes very frayed and very separated by different mediums? Some of the search tools do a really nice job pulling this all together, and we’ll discuss that later. Next slide, please.
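Editor’s illustration: the idea of pulling a frayed, cross-medium conversation back into one thread can be sketched as follows. This is a simplified sketch under stated assumptions, not how any particular review platform works: the `Message` record, the `aliases` lookup that maps each handle, email address, or phone number to one person, and all sample values are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Message:
    source: str        # e.g. "linkedin", "email", "sms" (hypothetical labels)
    sender: str        # identifier as it appears in that source
    recipient: str
    sent_at: datetime
    text: str


def build_timeline(messages, aliases, person_a, person_b):
    """Return every message between two people, oldest first, across sources.

    `aliases` maps source-specific identifiers (handle, email, phone number)
    to a single person name; unknown identifiers are left as they are.
    """
    pair = {person_a, person_b}

    def who(identifier):
        return aliases.get(identifier, identifier)

    relevant = [m for m in messages if {who(m.sender), who(m.recipient)} == pair]
    return sorted(relevant, key=lambda m: m.sent_at)


# Made-up sample data for illustration only.
aliases = {
    "in/alex": "Alex", "alex@example.com": "Alex", "+15550100": "Alex",
    "in/sam": "Sam", "sam@example.com": "Sam", "+15550101": "Sam",
}
messages = [
    Message("sms", "+15550100", "+15550101", datetime(2022, 1, 5, 18, 0), "Texting as promised"),
    Message("linkedin", "in/alex", "in/sam", datetime(2022, 1, 3, 9, 0), "Here is my email address"),
    Message("email", "sam@example.com", "alex@example.com", datetime(2022, 1, 4, 12, 0), "Here is my cell number"),
    Message("email", "sam@example.com", "pat@example.com", datetime(2022, 1, 4, 13, 0), "Unrelated"),
]

timeline = build_timeline(messages, aliases, "Alex", "Sam")
for m in timeline:
    print(m.sent_at.isoformat(), m.source, m.text)
```

The hard part in practice is the alias table: knowing that a handle, an email address, and a phone number all belong to the same person is exactly the kind of knowledge that has to flow between the forensics and eDiscovery sides.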
So, this is an analogy that’s been around for a while. I can’t find a better analogy, but the center picture there, this huge haystack, there’s just tons of data, and there’s different types of data. The data sizes are growing. I used to say, I did forensics when floppy disks were still floppy, and people don’t even know what a floppy disk is anymore, so they were very small amounts of data. Now, we have terabyte hard drives, multiple terabyte hard drives, we have multiple sources. So, the eDiscovery world is very good at taking that tremendous amount of data, and putting them in nice stacks and nice groups of data, and for an investigation, that can come in handy. Like you have a financial investigation, and you just want to focus on the spreadsheets at first, or you have something where you think people are sending out confidential information, you want to look at some forensic artifacts like USBs or cloud storage or something like that. So, eDiscovery, you put stuff in nice stacks; data forensics, you’re looking for that needle in the haystack, but what if you have so many needles in so many different types of data, gentlemen, where do you start with that, when you have so much data, you’re getting so many needles in so many different places? And I think it was Bobby M. before that said the board’s looking for answers, or Bobby K., your manager saying, where are you with this investigation, and you’re still trying to get your arms around it, where do you start?
I think you have to do a reasonable job of prioritizing the data sources, Warren. It all goes back to what are you trying to accomplish? What are the allegations? Who has complained? What are you trying to prove? And that can really help you (1) obviously identify what you need to collect and we’ll talk about collection down the line, but also help focus in on what artifacts you want to analyze and dig deeper into. If there are specific key custodians in play, or specific allegations or timeframes, or devices even. That can help you really understand, but you really need to go back to the well and think about, what are the goals, what are the complaints, what’s my objective to help better prioritize what you’re trying to do, otherwise you just have a mountain of data without actually having a plan or methodology in place to tackle it.
Obviously, that’s great advice and made me think of something. Often, when I get the call, it’s someone saying, “Hey, we’re going to send you their laptop, their work phone, their server stored email, we’re going to send you all of this data, we want you to image it and go from there”. And the thing that I say is, “What’s the end result? Are we just trying to preserve it, which is fine, we can preserve it in case it’s needed?” But I like to say, “What’s the end goal, and let’s build a work plan or a priority from the end result”. So, Bobby M., thank you for that, I appreciate that.
That brings up one good point, I think, for all legal practitioners, and that’s when you engage a consultant to help you and they’re falling under the privilege, don’t be shy to tell them what your objectives are. A lot of times, I’ve noticed in practice, you send the tech folks the data, but you don’t really provide them enough guidance, the facts, what are you trying to prove, because they can really help you.
So, make sure as a lawyer, or legal professional, you’re actually getting everybody involved as a team member and providing everybody – kind of level setting, providing them with the adequate facts and really the goals of the project, so they can all be valuable members of the team.
I think from my perspective, going back to where do we start, sometimes dividing and conquering is a great place to start. So, if you’re going to be going through and doing traditional computer forensic analysis, have somebody do your forensic analysis, go through the phones, go through the computers, dig up those artifacts for you. While simultaneously you have somebody going through and processing your email data, your SharePoint data, any kind of Teams data, so you can get that ready so you can work together to then create a nice, synchronized end product that you can present very easily.
I think, for me, it’s usually nice to go about it that way, because it doesn’t put as much pressure on getting everything done at the same time. You can kind of just get your artifacts set up and get those in a nice report, get your eDiscovery set up and have that in a nice report. And the great thing is that sometimes when you find something on the analysis side, either eDiscovery or forensics, that might be the end of the case right then and there. You might not need to go through the entire process if you find what you’re looking for, dividing and conquering.
That’s the way that I like to approach it and it does work really well.
Going back to, I think, Bobby was saying where to start, when you don’t know where to start, leaning on your SMEs and leaning on counsel is a great place to go. There’s not very many cases that we haven’t run into nowadays, so at least we can kind of scope it together. So, if you have somebody from HR, compliance, maybe legal coming to you and not knowing where to start with a specific case type, rely on your experts, because there’s a good chance that they’ve run into that case before and can help you scope out the project.
So, that’s something that I would recommend as well.
Yes, and something that dawned on me while you gentlemen were talking, divide and conquer, maybe have some forensics doing some analysis, some eDiscovery tools doing some other things. Something that is nowhere in the slide deck that probably should be is project management.
I came out of law enforcement and you didn’t have project managers in law enforcement, you would do your investigations. And then I came to be a consultant and had the privilege of working with some awesome project managers. I’ll be honest, in the beginning, I was like, “No, you don’t know forensics, I’ve got this”, and then you find out how much a good project manager could help you. As you’re doing your analysis, keeping the data flowing, keeping the board or the client or the manager, whoever, up to date and allowing you to keep your head focused on the analysis, something that I just wanted to mention is good project management on top of good analysis.
I think that’s such a key point, Warren, because a good project manager also stresses cross-functional collaboration. So, if you have these various silos, you have your forensics people working with EnCase and analyzing your laptop data, and then you’ve collected USB drives and laptops and you’re analyzing them with eDiscovery tools. If you make a finding in the forensics world, let’s say you determine that there was a particular artifact that was plugged in during a timeframe you’re concerned about, and you have that artifact, you have that USB drive, on the eDiscovery side that knowledge really needs to flow from forensics to eDiscovery so we can start taking a look at what’s on that drive. Is it the documents that we’re concerned about? Did this employee actually take confidential documents?
If you’re doing things in a silo, I think it’s very inefficient and just having the project manager and stressing the cross-functional collaboration, I think is key, particularly in large organizations. I’m sure Bobby K. would agree to that one.
Yes, I think project managers make my life a lot easier. If I didn’t have project managers, I wouldn’t know how to get my job done. Because the amount of meetings that the project managers have just on the cases is a full-time job. So, if I had to have every single one of those meetings, I would never have time to do the actual work. So, it allows me to focus on what I’m good at, which is doing the work and then articulating my work product to either the project manager or the customer.
So, for me, it’s night and day between working with a good project manager and not having a project manager.
Definitely when you’re looking for someone to work with you, not just interview and ask for the CV of the person that might be testifying or the person that might be doing the analysis, but also check out the project managers. Make sure that you could work with them, because they could be your day-to-day liaison. As Bobby M. said, when you’re working in silos, they have visibility hopefully into all of the silos, and they could bring all of that data together. So, nothing like a good project manager. Next slide, please.
So, this could be internal or external clients, why leverage analytics? Simply, we find more. I said earlier, traditionally, we see the concept searching and the analytics towards the end of the EDRM, but it’s a heck of an investigation tool, it’s a heck of an identification tool, so consider using it sooner, splitting up that data. Take the forensics data where you want to do keyword searching, a forensic keyword search, or where you want to look for USB activity, and get that data into the forensics tools, and get the other things like email into a concept search.
Either Bobby, are there certain types of data – I already mentioned email which is kind of obvious – is there any other data that lends itself well to going right into processing and hosting and concept searching?
I think any time you’re dealing with data that’s text-heavy, you’re dealing with emails, you’re dealing with correspondence, that kind of lends itself to analytics. Quite frankly, in my experience, analytics is more comprehensive than some of the traditional keyword and manual approaches. It’s really helped me time and time again identify previously unknown search terms and concepts that I didn’t know about entering an investigation. Code names for a particular project. Industry terms of art. I’m not an expert in every industry, I use the documents to inform my decision. Analytics helps me quickly identify those terms of art and lingo that I may not have previously known about. It also helps me hone in on specific custodians. There might be key players that are highlighted by the use of analytics that we haven’t collected.
So, this can help you, I think, expand the scope of your investigation to ensure that you’re not missing the key pieces of evidence that you really need.
There are some studies that show that it’s rare for a traditional keyword search approach to get above 75 to 80% recall. What that really means is without some level of reasonable expansion of your process, you could be missing a chunk of important data in your corpus, and this is where conceptual analytics and other AI tools can really help you plug in those gaps.
So, that’s kind of my two cents about analytics, Warren.
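Editor’s Note: The recall figure mentioned above can be made concrete with a little arithmetic. The following is a minimal Python sketch, not part of the presentation; the function names and the corpus size of 10,000 relevant documents are hypothetical and chosen only for illustration.

```python
def recall(relevant_found: int, relevant_total: int) -> float:
    """Fraction of all relevant documents that a search actually returned."""
    if relevant_total <= 0:
        raise ValueError("relevant_total must be positive")
    return relevant_found / relevant_total


def missed_documents(relevant_total: int, recall_rate: float) -> int:
    """Number of relevant documents left behind at a given recall rate."""
    return round(relevant_total * (1.0 - recall_rate))


# Hypothetical corpus containing 10,000 relevant documents.
print(recall(7_500, 10_000))             # share of relevant documents found
print(missed_documents(10_000, 0.75))    # relevant documents not returned at 75% recall
print(missed_documents(10_000, 0.80))    # relevant documents not returned at 80% recall
```

Under these assumed numbers, a keyword search sitting at 75 to 80% recall would leave roughly 2,000 to 2,500 relevant documents unreviewed, which is the gap the speaker suggests analytics can help close.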
I think from my perspective… I was actually watching 10 Things I Hate About You on Valentine’s Day on Monday and that kind of speaks to how my Valentine’s Day went. There’s a phrase that takes place in that movie: I know you can be overwhelmed and I know you can be underwhelmed, but can you ever just be whelmed? And I kind of relate that to overinclusive and underinclusive for keyword searching. There’s really no conclusive set of keyword terms that you’re going to be able to use to finish your case.
I’ve never experienced that. If someone has had the perfect set of keywords and they haven’t had to do a single piece of analysis outside of that, feel free to send me that list and I’ll apply it to my workloads. But for me, I think, conceptual searching allows for you to kind of expand from those keywords.
Take the keywords and create a conceptual phrase off of the keywords that you’re interested in finding, and a lot of times, you’re going to see some really cool results and even recommendations from these tools as to what to look for.
To Bobby’s point, there’s not always going to be a perfect set of keywords and there’s not always going to be a perfect conceptual phrase, but you can at least start creating really good phrases. If you have a specific policy violation, plug that policy violation into that conceptual search and see what it returns. You’re going to be pretty amazed by the results that you’re going to be looking at.
As we have on the screen here, the lower half, the lighter color, you can do a forensic keyword for “bribe”. You’re probably going to get millions of false positives. No one is going to put in an email or maybe a text message, “Let’s steal the intellectual property and leave and go start our own company”, that sort of thing. But we could use the definitions of “bribe”, we could use the definitions of “theft of IP and trade secrets” to go and look for conceptually similar information, and that’s why, I think, the concept tools are so important for investigations.
Going back to, I think it was Bobby M. that said it helped with creating keywords, I’ve had cases where while the outside counsel was developing a keyword list, I would throw the email into a concept search tool, and they would come back and say, “We want you to run “Project AB””, and I’m like, “Well, they also referred to it as Project ABC”, or something like that, because the concept tool clustered them and put them all together, and I knew that they were talking about the same thing.
So, the concept tools also help develop your keywords that you could then use to search and cull down that huge dataset. Next slide, please.
Bobby K, do you mind taking us through some of the values of concept searching in investigations?
Sure. I think, for me, not knowing where to start, conceptual searching is a great place to go. Just asking your customer what the outcome of the case is going to be for them, what are you looking for in those outcomes, and just taking what they’re looking for and creating a conceptual phrase off of that. And all the conceptual phrase is maybe a sentence, two sentences, maybe a paragraph, plug that into your tool, hit Go, and see what it returns. And a lot of times, you’re going to be pretty impressed by what it returns.
I think, for me, we can kind of go through these bullet points, what to avoid, what to focus on, but as far as the way that I approach conceptual searching, it’s really that next step after you apply those keywords. You can go through and identify a couple of documents that might be interesting to you through those keywords and then just start copying and pasting off those documents. Take a specific paragraph out of the document that you found that you might be interested in or might be relevant, and see if you can find anything else off of that.
The great thing about some of these tools is you don’t need to mark an entire document responsive, you can get specific to a specific sentence, paragraph, a fraction of a sentence and take that and create a conceptual search off of it. I think that’s what I rely on most is the nimble aspect of the conceptual searching tools that we have today.
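Editor’s Note: The idea of pasting a passage into a tool and getting back similar documents can be sketched in a few lines of Python. This is only a rough stand-in, not how any particular product works: it ranks documents by plain word overlap (cosine similarity over word counts), whereas commercial concept-search engines rely on richer models. The document IDs, texts, and function names below are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list:
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def rank_by_similarity(passage: str, documents: dict) -> list:
    """Return (score, doc_id) pairs, most similar to the passage first."""
    query = Counter(tokenize(passage))
    scored = [
        (cosine(query, Counter(tokenize(text))), doc_id)
        for doc_id, text in documents.items()
    ]
    return sorted(scored, reverse=True)


# Hypothetical documents; the passage is a sentence copied from a document of interest.
documents = {
    "doc1": "Please send the quarterly budget spreadsheet to finance",
    "doc2": "Copy the design files to a personal drive before you resign",
    "doc3": "Lunch on Friday?",
}
passage = "copy the design files to a personal drive"

for score, doc_id in rank_by_similarity(passage, documents):
    print(f"{doc_id}: {score:.2f}")
```

The point of the sketch is the workflow rather than the scoring: a sentence or paragraph, not a whole document, serves as the query, and the results are ranked rather than simply matched or not matched.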
We keep talking about the EDRM and how it’s this beautiful workflow. The EDRM is changing, because it’s creating more of an iterative process now. We’re not as focused on a TAR 1.0 workflow which is going from information governance all the way through to production and there’s no going back. That’s a typical workflow that we see, and it’s still applicable in certain places. But from an investigation kind of perspective, we want to be able to pivot very quickly.
So, we identify something in a specific custodian’s dataset, we want to be able to go back, collect more data potentially, and then send it through the process again.
I think that’s from my perspective how I apply conceptual searching. And if you guys have any questions based off of the mismatching issues and the other bullet points, feel free to post it in the Question box. But that’s pretty much what I have to say about conceptual searching.
Thank you for reminding the folks about the Question box. And considering we have a group from all over the world, the one thing that I run into, and I’m sure you gentlemen do as well with your worldwide clients, is the commingling of languages. And sometimes even in a subject line, especially in Asia sometimes you’ll get some English just in the subject line. Obviously, we have entire documents that are in foreign languages, you could segregate those out and have people that speak the language. But sometimes I’ve seen a Chinese character and then English just in the subject line.
Bobby M., do you run into that? Do the conceptual tools help you with that?
I do. I want to tackle this in a slightly different way. I do run into that. I think these conceptual tools are very powerful, because they let you know early on what languages you’re dealing with in your dataset, so you can apply the proper workflow to the extent that those documents are important to your investigation. So, if you want to do translation, if you want to do machine translation combined with some level of human review in the local language, you can kind of develop a flexible approach, and it tips you off early on in the process as to what you’re dealing with.
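Editor’s Note: As a rough illustration of flagging commingled languages such as a subject line with both Chinese characters and English, here is a minimal Python sketch using only the standard library. It detects writing scripts by Unicode character name, which is a simplification: it cannot tell apart languages that share a script, and the function names and sample subject lines are hypothetical.

```python
import unicodedata


def scripts_in(text: str) -> set:
    """Return the coarse script labels found among the letters in the text."""
    scripts = set()
    for ch in text:
        if not ch.isalpha():
            continue  # skip digits, punctuation, and whitespace
        name = unicodedata.name(ch, "")
        if name.startswith("CJK"):
            scripts.add("CJK")      # Chinese ideographs (also used in Japanese)
        elif name.startswith("LATIN"):
            scripts.add("LATIN")
        else:
            scripts.add("OTHER")    # e.g. Cyrillic, Arabic, Hiragana
    return scripts


def is_mixed_script(text: str) -> bool:
    """True when the text contains letters from more than one script."""
    return len(scripts_in(text)) > 1


# Hypothetical subject lines.
print(is_mixed_script("Re: 合同 draft v2"))
print(is_mixed_script("Quarterly report"))
```

A flag like this could be used to route mixed-script items to a reviewer or translation workflow early, in the spirit of the flexible approach described above.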
Since we’re dealing with such a diverse group here of attendees, I think it behooves us to talk about the elephant in the room. So, there are many jurisdictions, such as the EU, that have civil, and in some cases, criminal laws which restrict the transfer of data outside of their borders, or they provide very strict rules for the transfer of personal data.
So, if the data sources requiring collection emanate from non-US-based custodians, I do think as a starting point it’s important to do an analysis of the relevant cross-border and data privacy issues to really help guide next steps.
So, in some cases, where you’re dealing with strict data protection laws and you may not have the appropriate exceptions under the laws or those exceptions haven’t been defined, you may have to think about contingency plans, such as engaging maybe the eDiscovery provider that has a local data center, so you can have the investigation, or the investigation can take place without any data moving across borders.
But tied to your foreign language issue, yes I agree that it’s very important, it kind of tips you off earlier. But one of the things you’ll want to think about, especially in today’s global economy, is whether you are dealing with any major cross-border or data privacy issues that you need to take into account before even conducting the investigation.
That’s a great segue, if we could go to the next slide, because it sounds like this is where you would form a strategy, Bobby M.
That’s exactly right. And I’ll let Bobby K. chime in here too, but what I said earlier, the allegations, you want to evaluate the complaints, the allegations, the goals of the project, why are you doing this, assess the nature of the allegations. That can really hone you in and focus you on which custodians do you want, and which data sources do you want. So, are you dealing with, for this particular investigation, only a set of executives? Are you dealing with business line folks that are in the trenches of the discussions? Is it some combination? Are there special considerations about data sources? Are you collecting mobile data or local data, or is it just focused on email data?
So, I think all of these considerations should happen, but it’s never losing sight of what your goals are, what the allegations are, what you’re trying to understand.
And then one thing that can also impact early on what you’re going to collect and what you’re going to do in the scope of your analysis is who is the audience for your eventual findings? When you’re done with the investigation and you come out with your findings, because your investigation might be a lot different if you’re dealing with an audit committee versus the board versus the CEO versus a regulator. It kind of depends on who your target audience is.
So, think about all of those things in advance. In addition, you kind of alluded to this, the issue with EU and data protection laws, these are all things to think about in that early phase.
Yes, I think stepping through the process. So, we start with allegations, preservation, collection, review, really having good legal hold in place to make sure that you’re not getting rid of data that’s going to be potentially part of that investigation is really important. Making sure that you have good policies to then go ahead and preserve that data, and then collecting it.
I think for most of the cases that we see come through, there’s a lot of repeat offenders, I guess we could say, and having those people readily available and already sometimes preprocessed will create a lot of downstream efficiencies.
For me, I think that understanding who is going to be part of this case, if you already have them on email hold or legal hold or whatever kinds of structures, and policies, and procedures you have, that’s great. And then don’t reinvent the wheel every single time. If you have some typical cases that you go through and you’re seeing them pop up more and more frequently, create a typical workflow for those cases. There’s always an inherent time issue that pops up. And what I mean by that is, to Bobby’s point about the audience, if you’re going to be going and doing a DoJ production, that’s a lot more work than if you’re just presenting to an internal investigator. Printing off a couple of PDFs for them to review versus going through Bates stamping the entire production and a rolling production workflow, that’s a very large and very different time investment. Bobby Malhotra, he can speak to this until the cows come home, but it also requires a lot more people sometimes.
So, scoping the case and understanding the outcomes and then the audience, I definitely will echo what Bobby said there.
For the folks from outside the US, the DoJ is the US Department of Justice.
Just one more quick comment about the legal hold. So, I think your duty to preserve, or when your duty to preserve attaches, is really a fact-specific inquiry: it attaches when you have a reasonable anticipation of a litigation or regulatory matter on the horizon. So, I don’t want to say that every single investigation kind of ties to that duty to preserve and you should initiate a litigation hold. It’s a very fact-specific inquiry and there’s a lot of case law on that. So, I just wanted to make that point really quick, Warren.
The next slide, please, is something that I was looking forward to talking to these two gentlemen about. We’ve spoken a lot about who the stakeholders are, but it takes some buy-in for all the stakeholders. Is that not true, Bobby M.?
I get this question all the time. First, I feel that the legal community is still not using AI as much as it really should be, and people are always asking me at conferences and other events, how do you get stakeholders on board with AI?
My personal approach, I think it’s been a multistep approach to get key stakeholders on board, and it really varies. One thing just to set out there, it really varies and it depends on who I’m dealing with. So, my approach or my pitch might be different if I’m talking to a client representative versus if I’m talking to a senior partner at my own firm. But overall, what I like to do is work on budget and cost projections.
If you have two separate budgets side by side, one for a traditional keyword manual type process and another one where you have AI workflows in place, and the cost savings for that AI approach are substantial, and they usually are, quite honestly, that tends to resonate very well with certain stakeholders, Warren, particularly client representatives. And particularly in investigation matters where you don’t have an adversary poking holes into your process, or you don’t have certain standards that you have to adhere to like you do in litigation, and you’re just trying to get to the facts quickly. So, that whole budget projection works very well when dealing with client representatives.
Second, I like highlighting the defensibility of the approach, particularly the QC steps that we’re going to perform at each stage in the process to really validate the results of the AI workflows, and really show people we’re just not blindly relying on AI inputs. The AI is really just helping us and supplementing our approach, as opposed to just blindly relying on what those outputs are.
And finally, what I like doing, my third approach in this kind of stakeholder buy-in is definitely highlighting the flexibility that comes with an AI approach. And what I mean by that is you can select the right workflow and level of automation depending on the needs of the case, and the client’s risk appetite. It doesn’t have to be all or nothing, where you have to use every tool in your tool belt or automate every single workflow in your process. You have flexibility there to do what you want, depending on the needs of the case.
And one way to really see all this in action, in my opinion, is to run through a proof of concept using these AI workflows and reporting back to the key stakeholders the key findings, such as the speed that you were able to achieve, how quickly you were able to get that information to the legal team, the efficiency, the time savings, ease of use.
In the end, I think the proof, Warren, is always in the pudding. And there’s just no substitute for actually running a POC and kind of seeing how AI works in a live matter.
Bobby K, you’re a big fan of POCs, right?
I’ve done my fair share of POCs, and I think, for me, the easiest way to show the value of a tool is to take an investigation that you’ve completed in the past and run it through the new process. That’s kind of my favorite way to go through a POC to really show the efficiency.
So, just like Bobby said, show the side-by-side efficiencies, not only from a cost perspective, but also time invested and how quickly you can get through the process. So, if I’m just doing traditional keyword searching, forensic analysis, and then doing a report, do I really find the smoking hot documents as quickly as I can by doing clustering, conceptual searching, communication analysis, relying on analytics to help me do my job. I think that’s where you can really start seeing buy-in.
And also, the experts that are actually using these tools, do they enjoy them? Do they enjoy using the tools? Is it making their lives easier? Is it making it more complex? Is there a steep learning curve? Are these tools intuitive? Make sure that the people that are going to be running the tools, you also get their feedback as well, because that’s going to be pretty crucial, because they’re going to be in the trenches every single day going through the data, running the conceptual searches, helping their stakeholders finish the investigations. That’s where, I think, at least I’ve seen the most buy-in from stakeholders is, like Bobby said, the proof is in the pudding. So, prove to me that you can do this investigation more efficiently, quicker, and with less manpower than you could with the previous set of tools that we had, and hopefully no longer have.
I can’t believe how fast this hour has gone.
Gentlemen, I appreciate your time. I want to thank the entire team that put this together. We want to thank you all who took the time out of your schedule to attend today’s webcast. We know how valuable your time is and we appreciate you sharing it with us today.
We hope you have an opportunity to attend our next monthly webcast scheduled for March 16th, and thanks again for attending and have a great day.
This will conclude today’s webcast.
HaystackID is a specialized eDiscovery services firm that helps corporations and law firms securely find, understand, and learn from data when facing complex, data-intensive investigations and litigation. HaystackID mobilizes industry-leading cyber discovery services, enterprise solutions, and legal discovery offerings to serve more than 500 of the world’s leading corporations and law firms in North America and Europe. Serving nearly half of the Fortune 100, HaystackID is an alternative cyber and legal services provider that combines expertise and technical excellence with a culture of white-glove customer service. In addition to consistently being ranked by Chambers USA, the company was recently named a worldwide leader in eDiscovery Services by IDC MarketScape and a representative vendor in the 2021 Gartner Market Guide for E-Discovery Solutions. Further, HaystackID has achieved SOC 2 Type II attestation in the five trust service areas of security, availability, processing integrity, confidentiality, and privacy. For more information about its suite of services, including programs and solutions for unique legal enterprise needs, go to HaystackID.com.