Editor’s Note: On October 14, 2020, HaystackID shared an educational webcast designed to inform and update legal and data discovery professionals on the complexities and challenges of eDiscovery support in antitrust agency Second Request matters. While the full recorded presentation is available for on-demand viewing via the HaystackID website, provided below is a transcript of the presentation as well as a PDF version of the accompanying slides for your review and use.
The Dynamics of Antitrust Agency Investigations: A Second Request Update
HSR Act-driven Second Request responses require an uncommon balance of understanding, expertise, and experience to successfully deliver certified compliant responses. Recent FTC and DOJ updates, coupled with the increasing velocity of requests, make current expertise more crucial than ever in this unique discovery area.
In this presentation, expert investigation, eDiscovery, and M&A panelists will present updates and considerations for managing Second Request responses to include tactics, techniques, and lessons learned from fourteen recent responses.
+ Defining Second Requests: The Requirement and Task
+ Context for Consideration: The Prevalence of Requests Over Time
+ A Different Type of Discovery: Characteristics of Second Requests
+ Recent DOJ and FTC Updates: From the Practical to Tactical
+ Managing Second Requests: A Provider’s Perspective
+ Michael Sarlo, EnCE, CBE, CCLO, RCA, CCPA – Michael is a Partner and Senior EVP of eDiscovery and Digital Forensics for HaystackID.
+ Mike Quartararo – Mike currently serves as the President of ACEDS, which provides training and certification in eDiscovery and related disciplines to law firms, corporate legal, and the broader legal community.
+ John Wilson, ACE, AME, CBE – As CISO and President of Forensics at HaystackID, John is a certified forensic examiner, licensed private investigator, and IT veteran with more than two decades of experience.
+ Anya Korolyov, Esq. – As Director of Project Management with HaystackID, Anya has 12 years of experience in eDiscovery with extensive expertise with Second Requests as an attorney and consultant.
+ Seth Curt Schechtman, Esq. – As Senior Managing Director of Review Services for HaystackID, Seth has extensive review experience, including class actions, MDLs, and Second Requests.
+ Young Yu – As Director of Client Service with HaystackID, Young is the primary strategic and operational advisor to clients in eDiscovery matters.
Good day. I hope you’re doing well and having a great week. My name is Rob Robinson. On behalf of the entire team at HaystackID, I’d like to thank you for attending today’s webcast titled “The Dynamics of Antitrust Agency Investigations: A Second Request Update.” Today’s webcast, part of HaystackID’s monthly series of educational presentations, is kindly hosted by the Association of Certified eDiscovery Specialists, better known as ACEDS. As part of that hosting, we are fortunate to have ACEDS’ President, Mike Quartararo, participating as our host for today’s expert webcast. Mike is an eDiscovery expert, author, and speaker who, in his leadership of ACEDS, provides training and certification in eDiscovery and related disciplines to law firms, corporate legal departments, and the broader legal community. Prior to his leadership role at ACEDS, Mike served the industry in roles at Skadden, Arps, Slate, Meagher & Flom and at Stroock & Stroock & Lavan. A PMP-certified professional, Mike also authored the 2016 book Project Management in eDiscovery.
At this time, I’d like to offer a warm welcome to Mike and turn the webcast over to him for his comments and introduction of today’s expert presenters. Hello, Mike, and thank you for hosting today’s webcast.
Hi, Rob, and hi, everyone. Thank you so much, and thank you to HaystackID for sponsoring and presenting today. Welcome, everyone, to today’s webcast, which, as you’ll see shortly, is really going to dig in on eDiscovery and Second Requests. As many of you know, Hart-Scott-Rodino Act-driven Second Request responses require an uncommon balance of understanding the law, expertise in corporate M&A transactions, and experience in eDiscovery to successfully deliver certified compliant responses. Basically, the HSR Act and other laws require that parties to large corporate mergers or acquisitions prepare regulatory filings before a merger occurs. A Second Request is a discovery procedure in which the Federal Trade Commission and the Antitrust Division of the Department of Justice investigate mergers and acquisitions to ensure they’re compliant with, among other things, anti-competition laws. This often leads, quite frankly, to some of the most massive and fast-moving eDiscovery projects in the legal industry. And it’s funny that I’m participating today. This presentation brings my career full circle, Rob, because very early in my career, I worked on big antitrust matters, like the ones we’re going to talk about today.
Recent FTC and DoJ updates, coupled with the increasing velocity of requests, make current expertise more crucial than ever in this unique area of discovery. In today’s webcast, I’m excited to be part of a team that will share with you information, insights, and examples that may help you better understand and execute on this unique type of discovery project.
Our presenters today include five of the industry’s foremost experts and authorities in eDiscovery, all with extensive experience supporting antitrust agency Second Requests. First up, we’ve got Mike Sarlo. Mike is a partner and senior executive vice president of eDiscovery and digital forensics at HaystackID. In this role, Mike facilitates all operations related to electronic discovery, digital forensics, and litigation strategy both in the US and abroad.
Next, our second presenter is John Wilson. As Chief Information Security Officer and President of Forensics at HaystackID, John is a certified forensic examiner, licensed private investigator, and information technology veteran with more than two decades of experience working with the US government in both private and public companies.
Our third presenter is Anya Korolyov – sorry, Anya. I practiced that, but I still didn’t get it right. As a trained lawyer and Director of Project Management with HaystackID, Anya has 12 years of experience in eDiscovery with extensive expertise in Second Requests as an attorney and a consultant.
Also, I’d like to introduce Seth Schechtman, a senior Managing Director of Review Services for HaystackID. Seth has extensive review experience, including class actions, MDL, and Second Requests.
And finally, our last presenter and expert is Young Yu, a Director of Client Service with HaystackID. Young is the primary strategic and operational advisor to clients in eDiscovery matters.
Today’s presentation will be recorded for future viewing, and a copy of the presentation materials is available for all attendees. You can access these materials directly beneath the presentation window on your screen by selecting the attachments tab on the far left of the toolbar beneath the viewing window.
At this time, I’d like to turn it over to Mike and kick off today’s presentation. Mike, please take it away.
Thanks so much, Rob and Mike. Again, I want to reiterate how excited we are to be partnering with ACEDS for this educational webcast on Second Requests. Most of my team here on the phone live and breathe these matters, and we’re happy to answer your questions. We’re really thrilled to be able to go over our approach to these, and as far as an agenda is concerned today, I think we’re going to be a little bit fluid, but we’re going to try to quickly give you an overview of how a 2R arises.
We’re going to go through just some of the characteristics of what makes these different from typical civil litigation or investigation; some strategies for managing these matters; and then also how an integrated approach to managed review with eDiscovery is critical to get these projects to function fluidly and successfully.
So, back in time, in 1976, we had the Hart-Scott-Rodino Antitrust Improvements Act, and what this was designed to do was basically give the federal government a mechanism to investigate mergers and acquisitions, with an approval process and a framework for those deals. Really, the goal here was to prevent so-called midnight deals from happening, deals that nobody knew about and that there wasn’t a chance to investigate. So, basically, this act requires that if an organization is going to do a deal, a merger, somebody’s buying somebody, they’re going to have to make an HSR filing, and when this is filed, there’s a 30-day window during which the deal goes under review by the FTC and/or the DoJ. At the end of that, the deal either goes through, or the regulators might come back and say that they want a second batch of information, which is commonly called a Second Request. The big thing here is very accelerated timelines. These 2R matters can cost quite a bit of money, and not a lot of matters go from that initial review to a Second Request.
Typically speaking, it’s industries where there’s a spotlight on competition issues, or when two very large companies are coming together and regulators have concerns about their being able to snuff out competition, and other antitrust aspects that come up when you create a single market environment. There are not actually a ton of Second Requests that go on. In 2018 alone, we had about 2,100 or so HSR filings in the United States. In 2017, only 2.9% of filings actually went to a Second Request, and in 2018, it was 2.2%. So, not a ton there. You’re below 50 a year, and there are only a few providers and law firms who do these over and over again, for that reason. They require perfect execution, so to speak.
Planning is mission-critical for these. Typically speaking, I think there might be some misconception in the eDiscovery community that these are like a black swan event. They’re usually not, and if they are, that’s a problem, because typically you’re going to be working with a law firm that has an M&A group. When it comes time to do the HSR filing, they’ve gotten to the end of a deal, it’s negotiated, everybody knows what the terms are going to be. Now they want to seal the deal, and the HSR filing is really the first step in sealing the deal. This is usually going to go to the antitrust group at a law firm (oftentimes these two groups, M&A and antitrust, are working together closely), and they’re going to look at the HSR, they’re going to look at the deal, and they’re going to say, hey, we think you’re going to get a Second Request.
Now, this is the difference right here between good enterprise counsel and smaller shops that may not have a lot of experience with these. So, as an organization, it’s important to make sure you’re working with the right law firms here, and also the right vendors, because there’s a lot on the line if you do get a 2R and it does not go well. It’s very important that you have a plan, because without a plan, the number of tasks to be completed and goalposts to be reached to get to what’s called substantial compliance can become insurmountable. It’s not something you can handle learning for the first time, and these have different characteristics than typical civil litigation, or even a very caustic regulatory investigation where you have a regulator breathing down your neck. A 2R is usually always going to be a lot happening at once, and when I say that, it’s so important to really understand that you’re doing every phase of the EDRM at the same time, post-collection, and that’s really just your first initial collection. Once that first collection’s done, you are collecting and processing and reviewing and doing everything the same day, and it’s really important to get a solid understanding of how you’re going to use advanced technology, and what issues are going to be ahead of you dealing with an organization, their data types, their custodians, and how they do business, as you work to get this to substantial compliance. In today’s modern age, data types go way beyond email and network shares. We’re in the cloud everywhere: Slack, cell phones, you’re wearing data on your wrist. It’s a data world, and regulators are very interested in data outside of email if you’re using it in the standard course of business. These requests are so broad that the standard of responsiveness is much higher and much more elevated than in a typical matter.
Being able to handle multiple languages and understand multiple marketplaces matters, because sometimes these deals are happening on a global scale and being reviewed in multiple jurisdictions. Obviously, the US and EU are different, with different processes, but there are even situations where the deal might be reviewed and approved in the US, and then the organization goes to get approval in the EU and they turn it down, or vice versa. It’s definitely happened.
And I think what’s interesting about these, given there’s so much work to do, is the standard of substantial compliance, and what this really means is you’re never actually going to be able to say that you did everything perfectly. It’s an effort-and-rigor-against-the-clock type of self-certification, where, as an organization, you’re going to make a call to say, I’ve complied with this request. There are certain things within the nature of what’s asked for that would just never be reasonable to accomplish in 30 to 60 days, but that best effort is what’s critical, and so is being able to articulate what that best effort is, both from a technology standpoint (why something is possible or not possible) and from a legal standpoint (why it’s necessary or not necessary to comply).
I’ll kick this off to my colleague, John Wilson, who is going to talk a little bit about the first step here, which is getting a hold of all the disparate data and data types and locations.
Thank you, Mike. So, when you’re in these Second Requests and you’re starting to figure things out, you’ve got data that’s going to be across a wide spectrum in the organization, in lots of disparate systems, and you’ve really got to get going on that data map, getting everything identified as quickly as possible so that you can start planning how you’re going to get into those different data types. Some of them may be cloud-hosted, and you may have throttling and constraints around how fast you can download the content, or it may be something like Slack data that comes in JSON files and requires a lot of parsing in order to actually get it converted into a usable medium. So, it really becomes important to make sure that you’re getting into the data map as quickly as possible. A lot of times, when you’re talking about Second Requests, you have to do a good amount of overcollection, because you don’t have time to take that deep dive and say, oh, here are the three folders with the documents. You’ve got to move quick, you’ve got to be hitting it, and you’ve got to get into the data sets so that you can get them into processing and get things moving. So, a lot of times you have to say, OK, we’re just going to grab the parent folder; I’m not going to dig through all the folders underneath there to figure out which of the folders are specifically responsive at the moment. We can do that during processing, or by utilizing TAR and the various other elements that we’ll get into later in this presentation. But you’ve got to keep in mind, you’ve got to move quick, you’ve got to get into it, and you can’t delve too deeply into the details.
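John’s point about Slack data arriving as JSON that needs heavy parsing can be sketched in code. This is a minimal, illustrative example only: it assumes Slack’s standard export layout (a JSON array of message objects per channel per day, with `ts`, `user`, and `text` fields), and a real collection pipeline would also have to handle threads, edits, reactions, and attachments.

```python
import json
from datetime import datetime, timezone

def parse_slack_messages(raw_json: str):
    """Convert one Slack export day-file (a JSON array of message
    objects) into simple, reviewable records."""
    records = []
    for msg in json.loads(raw_json):
        if msg.get("type") != "message":
            continue  # skip channel_join and other non-message events
        ts = float(msg["ts"])  # Slack timestamps are epoch seconds as strings
        records.append({
            "sent": datetime.fromtimestamp(ts, tz=timezone.utc).isoformat(),
            "author": msg.get("user", "unknown"),
            "text": msg.get("text", ""),
        })
    return records

# Tiny inline sample mimicking one day-file from a Slack export
sample = json.dumps([
    {"type": "message", "ts": "1602633600.000200",
     "user": "U123", "text": "Deal call at 3pm"},
    {"type": "channel_join", "ts": "1602633700.000100", "user": "U456"},
])
rows = parse_slack_messages(sample)
print(rows[0]["author"], rows[0]["text"])  # U123 Deal call at 3pm
```

Even this toy version shows why Slack is more work than email: the export is event-structured, not document-structured, so it must be normalized before it can flow into processing and review.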
And then you also really have to be prepared, as you’re moving into it, for things to get missed, and for the moment in the third week of collections when, all of a sudden, it’s, hey, here are five more custodians that you need to get. We just realized that they were in the talks and related to the deal, or related to the subsidiary, or any of the various different connections that can occur. So you really have to be prepared for, and anticipating, the additional custodians and additional data sources that may come up at a later time as well.
So, I think we’re bringing in the next slide.
Historically, there’s a need for advanced technologies here, where you need to have a good understanding of what advanced tools you’re going to use. From a review and production standpoint, that’s typically going to be a review platform like Relativity, and any of the different types of analytics tools that are available natively in Relativity or as third-party integrations such as Brainspace. And really, there is a lot to consider from an eDiscovery processing standpoint, which Young Yu is going to talk about with you, along with Anya, as far as getting prepared with these tools and how you’re establishing workflows early on in a collection to get complex data types, and data types en masse, through the eDiscovery pipe.
Right, and thanks for handing that off. So, when you’re workflowing these Second Requests, you always want to select a technology solution that lends itself to an iterative workflow as well as a concurrent workflow. Typically, we do like to use Nuix and Brainspace, just for throughput, because speed is of the essence when you’re dealing with a Second Request. Also, you want to select applications that allow for early identification of problematic or issue-prone buckets or areas. Anya, do you want to speak a little about multiple languages, language detection, and what we do there?
Thank you. Yes, language identification is pretty crucial here due to the DoJ spec requirements, which ask that foreign language documents be translated and provided in a separate production volume when meeting substantial compliance. Because we’re working with such tight deadlines, we really want to start analyzing the data as soon as it hits the review platform. So, we want to be proactive. Before we even decide on search terms or linear review or TAR, we want to start analyzing the data as soon as we have it, and we want to identify foreign language documents and either get those translated as soon as possible, or see how many foreign language documents we’re actually dealing with so that we can make a decision on whether we need to bring in managed review to help us get through them.
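The triage step Anya describes, bucketing documents by probable language before deciding between machine translation and human review, can be illustrated with a deliberately naive sketch. Real eDiscovery platforms use trained language-identification models; the stopword-overlap heuristic below (with made-up mini stopword lists) exists only to show the bucketing idea.

```python
# Naive stopword-overlap language flagger -- purely illustrative; actual
# platforms rely on trained language-identification models.
STOPWORDS = {
    "english": {"the", "and", "of", "to", "in", "is", "that"},
    "spanish": {"el", "la", "de", "que", "los", "en", "una"},
    "german":  {"der", "die", "und", "das", "nicht", "ein", "ist"},
}

def probable_language(text: str) -> str:
    """Guess a document's language by counting stopword hits."""
    tokens = set(text.lower().split())
    scores = {lang: len(tokens & words) for lang, words in STOPWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unknown"

docs = [
    "the terms of the merger agreement are attached",
    "el acuerdo de la junta que aprueba la compra",
]
buckets = {}
for d in docs:
    buckets.setdefault(probable_language(d), []).append(d)
print(sorted(buckets))  # ['english', 'spanish']
```

Once documents are bucketed this way, the counts per language drive the staffing decision: a handful of hits goes to machine translation, while a large foreign-language population triggers the call to managed review.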
Now, if we’re dealing with a small number of documents, internally at HaystackID we use Authenticity.AI, and what the program does is translations on the fly, as well as bulk translations. It does a very good job and allows us to add additional documents and get to the point where we need to be at the time of production. Of course, if we’re dealing with a large number of foreign language documents, or if the company has any custodians who work primarily in a different language, we will communicate with the managed review team and bring them in, so that we can analyze how many documents we’re dealing with and how many reviewers we might need. So we usually reach out to Seth, and I’ll let him talk about staffing for foreign language review.
Thanks, Anya. Thanks, Young. Hi, everyone, my name is Seth, and I run the Review division at HaystackID. As Anya and Young said, with a lot of Second Requests involving multinational companies, you’re always going to see foreign language documents. How many there are will determine whether it’ll be machine translation or you’ll need human reviewers to go through them. We do review at HaystackID a little bit differently than the industry. Before reviewers come into our ecosystem, they’re given a test. We give them a protocol, we present them with sample documents, and then we present questions [break] those documents, from privilege to issue tagging, and we’ll see how they perform. We grade them on their speed and accuracy. We do that in English, and we also do that in multiple foreign languages, and the reason why we do it is to ensure not only that they have the language proficiency, but that they have the skills and acumen necessary to review documents in those foreign languages, thus ensuring that they are above standard even when it may be difficult for the higher-level managers on the case to evaluate those foreign languages.
I’m going to throw it back to Mike and John to talk through practical aspects of managing these requests, from preparation and planning through execution.
Thanks, Seth, really appreciate it. So, with managing these requests, again, you’ve got to have a good plan, and preparing is critical too, because I’m a huge advocate for more cross-functional teams. I’ve been with HaystackID since we were just three guys in a basement; we’re a pretty large company these days, as far as vendors are concerned, and I know the way that vendors try to organize. You organize around clients and matters, and as you get bigger, oftentimes things become segmented, and you have a lot of different business units supporting the eDiscovery process.
Second Requests require a really integrated approach, both from a preparation and an execution standpoint, and I say that because just putting a forensics person on a call to scope something with a client, someone who may not understand the end-to-end eDiscovery and review process, can be problematic for your overall project goals, where early intervention into data sources really matters, and we’re thinking alternative data types, Slack, things like that, which [break] all the time, in every one of these, and we do a lot of them. These are where you might run into issues. We don’t run into issues, but something that everybody should be aware of is that at the point of collection you need to be thinking about production. Oftentimes these matters start out with you anticipating a Second Request, and you’re already working to understand who the custodians are going to be. These typically go in tranches of five, which is what the DoJ and FTC will agree to, usually no less than 10 custodians, upwards of 25 or 50 (that would be a massive one), just based on the number of data sources you need to look at for each custodian. It’s important to develop your data map around the custodians you know are definitely going to be involved, and these are going to be executives. You’re going to have the CEO involved, the whole C-suite, the finance people, the really key players in an organization who have access to everything, which means all the confidential information, things you may not want to come out to a regulator that have nothing to do with the Second Request, and the inner workings of your organization.
These people also tend to have access to a lot more data than they probably use, and that’s a really important part of the data mapping process: to rank repositories and data sources at a custodial level based on their likelihood of being responsive to an incoming request, and to go down the list there, so that you’re able to effectively negotiate with a regulator as far as the scope of the request. Technical negotiation is so important for these. You cannot divorce the lawyering from the technical aspects of it; as far as looking at an organization, the technical challenge here is so extreme.
And, really, it’s important for us, if we have a 2R in-house, or several in-house, to have forensics resources on standby 24/7, and establishing an end-to-end workflow with the end client is critical. You need to prepare the organization as well, and that’s something that I always like to say: a lot of organizations could be very litigious, they could deal with eDiscovery in-house, they could do a lot, but if they haven’t had a Second Request, they have no idea what they’re in for. It’s important to really keep pushing the eDiscovery provider and outside counsel to make sure you have resources co-opted into the response at every level, because I can assure you, if the deal doesn’t go through because you weren’t able to comply, you will be hearing from the CEO – you will probably get fired. So, it’s really important to get everybody aligned inside of an organization to support these, because there’s a lot of red tape that oftentimes needs to be cut through. Imagine having an outside third party going into a tech company asking to pull out 30% of their network shares, and the sea of approvals and people who have questions. These are the types of conversations that need to be handled at the executive level in the organization, and you need a strong IT leader, legal, and everybody working towards the same goal to make things happen quickly.
Really, again, with the negotiation of more complex data types, there’s an expectation that these are burdensome. John, can you talk a little bit about why these are burdensome, especially with mobile phones and Slack and the throughput of downloads from Box?
So, you have all these disparate sources that you’ve got to bring together, and all of them have different levels of technical challenges or hurdles. Some of the newer ones are very difficult to even export the information from; it takes pretty heroic efforts to accomplish that. So, it’s about understanding all of that as you’re moving into it, and having a plan around how you’re going to get those done and what that timing needs to look like.
Just quickly, before I hand it off to Anya and Seth as well, keep in mind that this is a collaboration where it’s all about cooperation, usually forced cooperation, but everybody is trying to get the regulators exactly what they ask for, within reason. This also creates a dialogue where they’re going to ask what technology you’re using. They get that granular. They’re going to want to, sometimes, actually dial in and look at a random sample of documents post-technology-assisted review, maybe even multiple times. That’s why it’s important to pick a provider that’s really vetted with the DoJ and FTC, and also to pick a technology stack the DoJ has a long-standing history with – Nuix and Brainspace, the magic bullet combo for TAR 1.0, which they understand well – and that’s something that everybody needs to be aware of too. As data moves, and decisions and more work product come out of the hands of the technical people, it always goes back to lawyers, and some lawyers are more technical than others, so it’s very important to be able to give the lawyers what they need to frame the process.
From a provider standpoint, that’s going to be a very integrated approach to review and eDiscovery support and production, which Anya and Seth will talk about some of the workflows and why those are important.
Thanks, Mike. Again, as Mike said, and we keep pointing this out, one of the most important things in a Second Request, of course, is timing. Because timing is so short, it’s very, very important that we communicate with the legal team, that we communicate on the vendor side, and that all the departments involved in the workflow of the Second Request are in constant communication: that we have daily calls, that we all know exactly where we are on the collection and the processing, on getting the data into review, and, most importantly, on getting managed review set up so that they know what’s coming to them and they’re able to staff and be ready for the avalanche of documents that’s about to hit them.
So, to start with, of course, we will always have the low-hanging fruit: paper documents, and documents that have been identified by the company as definitely responsive, which don’t even need to be reviewed, which we can get through as soon as we have them, get into the production workflow, and get going on.
Of course, our next consideration is: what kind of company are we dealing with? What kind of data are we looking at? Is this a healthcare-related company, potentially, and are we looking at a ton of PII staring us in the face that we will have to take care of? That’s an important consideration for us.
Then there’s talking with the law firm and understanding what they have, where their thinking is, and what they potentially will get into an agreement with the regulators on. Are we going to be using keyword searching to narrow down the data? Are we going to be using technology-assisted review or email threading? What kind of deduplication will the regulators agree to? All of those things are very important to understand at the very beginning, and to keep in mind, revisit, and stay on top of, so that we can all work together to get this to substantial compliance.
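The deduplication question above is worth making concrete. The sketch below illustrates one common approach, global (cross-custodian) hash-based deduplication, where a single copy of each unique document is kept but every custodian who held it is tracked, since productions commonly carry a duplicate-custodians field. The document structure and field names here are hypothetical.

```python
import hashlib

def global_dedupe(docs):
    """Global (cross-custodian) deduplication: keep one copy of each
    unique document, but record every custodian who held it so that
    a duplicate-custodians field can be populated at production time."""
    seen = {}
    for doc in docs:
        digest = hashlib.sha256(doc["content"].encode()).hexdigest()
        if digest in seen:
            seen[digest]["custodians"].add(doc["custodian"])
        else:
            seen[digest] = {"content": doc["content"],
                            "custodians": {doc["custodian"]}}
    return list(seen.values())

docs = [
    {"custodian": "CEO", "content": "Q3 board deck"},
    {"custodian": "CFO", "content": "Q3 board deck"},   # exact duplicate
    {"custodian": "CFO", "content": "Synergy model v2"},
]
unique = global_dedupe(docs)
print(len(unique))  # 2
```

Custodian-level deduplication, by contrast, would key on the hash plus the custodian, so the CFO’s copy of the board deck would survive as its own record; which variant is acceptable is exactly what gets negotiated with the regulators.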
Here, really quickly, is a slide that we put together based on our experience of how we go from the data that is collected, culling it down through the stages of processing, deduplication, and analytics and assisted review (which, of course, we always would like to use in a Second Request, just because of the timing), to phase three, the managed review process, when Seth and his team take over and we set them up the best that we can.
I also want to move on and talk about a few special considerations, and have Young help me out here with some of these. A lot of them are dependent, again, on the type of Second Request we’re working on and the type of company, but there are also more [break] special considerations that we need to keep in mind.
Sure, Anya. In terms of getting ahead, among the special considerations you need to take into account during the course of a Second Request are the specifications. There have been some drastic changes over the last two years, 2019-2020, to both the DoJ and FTC specs, and the changes to the specifications will definitely impact not only processing but also the review and production.
A quick example is the treatment of embedded objects. In the most current versions from both the FTC and DoJ, embedded objects don’t need to be extracted, unless they are audio/video files, for documents that you are producing in native format. There are some caveats there. When you’re not extracting embedded objects, an embedded object could have text that hits a privileged term. If you’re not factoring them into your review, you could potentially miss a privileged document that was embedded within, say, a PowerPoint or a PDF or whatever that may be. So, these are all things that you need to consider as you’re working through the workflow, and it’s also one of the underlying factors in why the choice of software application for processing is so important, and also the analytics engine: how you factor these documents into the review impacts the [break] substantially.
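Young’s embedded-object caveat can be sketched as a simple screen. This illustrative example assumes, hypothetically, that text has been extracted out-of-band for each embedded object at processing time; a document is flagged for privilege review if the parent text or any embedded object’s text hits a term. The privilege term list is made up for the example.

```python
# Illustrative privilege term list -- real lists are matter-specific
PRIV_TERMS = {"attorney-client", "privileged", "legal advice"}

def needs_priv_review(parent_text: str, embedded_texts: list) -> bool:
    """Flag a document for privilege review if the parent text OR any
    embedded object's text hits a privilege term. Assumes embedded-object
    text has been extracted separately during processing."""
    for text in [parent_text, *embedded_texts]:
        lowered = text.lower()
        if any(term in lowered for term in PRIV_TERMS):
            return True
    return False

# The parent slide deck looks clean, but an embedded memo is privileged
flag = needs_priv_review(
    "Q3 integration roadmap",
    ["Memo reflecting legal advice of outside counsel re: HSR filing"],
)
print(flag)  # True
```

The point of the sketch is the failure mode Young describes: if only the parent’s text is searched, the embedded memo’s hit is never seen and a privileged document can slip into the production.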
Other changes to the specifications include the handling of password-protected documents: the spec now requires that they be decrypted and produced as unencrypted files, as opposed to just providing passwords. [Inaudible] files are another addition, where they need to be produced in native format as opposed to being broken out. And I think the largest change in the specs is probably going to be reporting. The reporting obligations have changed considerably, and that’s not only within the specifications but also in the actual request.
Anya, do you want to speak on considerations for the analytics model here?
Yes, thank you, Young. Again, I’m just going to keep hammering this point: we normally have 30 days to collect, process, get into review, review everything, produce, and deliver to substantially comply. So, linear review in this case just kind of goes out the window, unless we’re dealing with one or two custodians, and in a Second Request, that’s just never going to be the case.
So, we’re always going to turn to some sort of assistance, either structured analytics and threading, if the regulators will agree to that, or, more often than not, conceptual analytics. With conceptual analytics, we of course have the two options of TAR 1 (“Predictive Coding”) and TAR 2 (“Continuous Active Learning”).
With TAR 1, we have a subject matter expert, usually one or two people from the law firm, who will review a control set of documents and then documents for the training rounds. Each training round gets measured against the control set, and normally we go through somewhere between five and, sometimes, up to 15 training rounds. At the end of that, we have a set of documents that is responsive, and we have a set of documents that is considered not responsive. At that point, we can concentrate on reviewing for privilege rather than reviewing for responsiveness.
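The measure-the-round-against-the-control-set step can be sketched in a few lines. This is a deliberately simplified illustration, not the metric engine of any actual TAR platform; the document IDs and coding values are invented.

```python
# Hedged sketch: compare a training round's model predictions against the
# expert-coded control set and compute recall and precision.
def recall_precision(control, predictions):
    """control, predictions: dicts of doc_id -> True (responsive) / False."""
    tp = sum(1 for d, truth in control.items() if truth and predictions.get(d))
    fp = sum(1 for d, truth in control.items() if not truth and predictions.get(d))
    fn = sum(1 for d, truth in control.items() if truth and not predictions.get(d))
    recall = tp / (tp + fn) if tp + fn else 0.0        # responsive docs the model found
    precision = tp / (tp + fp) if tp + fp else 0.0     # predicted-responsive docs that were right
    return recall, precision

# Invented example: 3 truly responsive docs, the model finds 2 of them
# plus one false positive.
control = {"d1": True, "d2": True, "d3": False, "d4": True, "d5": False}
preds = {"d1": True, "d2": False, "d3": True, "d4": True}
r, p = recall_precision(control, preds)
```

In practice these metrics are what gets logged and presented to the regulators round over round, so reproducibility of the computation matters as much as the numbers themselves.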
With TAR 2, Continuous Active Learning, the process is just a little bit different: we would have a subject matter expert review seed documents, then we would feed those into the model and, at that point, bring in the review team. The model would then push what it considers the responsive documents to the review team, and as they’re reviewing, the model would continue to learn.
Now, that sounds favorable when we have rolling collections and potential new custodians. However, because we only have 30 days, and because the time within those 30 days that’s allotted to the review is usually closer to two weeks, the TAR 1 model is still going to be the preferred method, if we can get the regulators to agree, and they have been agreeing to TAR 1 pretty openly. We’re better off, because the training will take one or two days at most, and we will have a responsive set we can stop worrying about. We don’t need to lay our eyes on every single responsive document, because what we’re trying to achieve is substantial compliance. So, once we get through that process, what usually happens is we grab a random pool from the not-responsive population, and we provide five different samples to the DOJ so that they can go in and look at those documents. They usually don’t look at all five; they randomly pick one or two. They will review those documents so that they’re satisfied that our model was correct, that we trained it correctly, and that we identified the responsive documents. They will sign off before we close out the model.
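The sample-the-not-responsive-pool step described above could be sketched as below. The sample counts, sizes, and seeding are illustrative assumptions, not DOJ or FTC requirements.

```python
# Hedged sketch: draw several independent random samples from the
# not-responsive population for the regulators to spot-check.
import random

def validation_samples(not_responsive_ids, n_samples=5, sample_size=3, seed=42):
    # A fixed seed makes the draw reproducible, which helps when the
    # sampling process itself has to be documented for the regulators.
    rng = random.Random(seed)
    return [rng.sample(not_responsive_ids, sample_size) for _ in range(n_samples)]

pool = [f"DOC-{i:04d}" for i in range(100)]   # invented document IDs
samples = validation_samples(pool)
```

Real sample sizes would be negotiated and statistically justified; the point here is only the shape of the workflow, several independent draws rather than one.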
If we have new custodians, if we have rolling collections, if we find ourselves in a refresh situation, we can always reopen the model. At that point, the decision really rests with the regulators and the law firm to agree on whether we’re opening the model just for the new data that came in, or, if anything shifts and becomes responsive, whether we produce that as well.
That’s the TAR 1/TAR 2 consideration. We, of course, have the ability to use either Relativity or Brainspace. Our preference is for Brainspace because, we feel, it provides more options to us, especially when we’re dealing with really large datasets; in particular, it gives us more options for the types of training rounds, so that we’re able to grab and train as many documents as we can from our whole dataset.
So, once we’ve made a decision, once the regulators agree to use either TAR 1 or TAR 2 and we move on, we’re left with the documents that did not make it into TAR. We have our Excels, we have our JPEGs, we have our audio/video files, and we have non-standard data types: mobile data, Slack data. What do we do there? How do we treat that? Of course, the Excels, the JPEGs, and the audio/video files will go into the linear review, and there are several ways that we work together with managed review to get through that data as fast as possible. We also, of course, have to make sure we unencrypt everything that is password protected, so that the review team can get through those as well. And then we come down to mobile and Slack data, if it’s within our collection, and we have to decide what we’re going to do there and how we’re going to meet the challenges with that data, due to its nature. The chat messages are usually one-liners. Slack is usually one-liners, or we have problems where we only have usernames. How do we deal with those challenges? How do we overcome them? And is there a way that we can put any of this data into TAR?
I’m going to throw it back to John and Mike to kind of talk about those datasets.
So, before we even talk about TAR, you need to think about how you’re handling this type of data, and have a conversation about its use and function. As we start getting alternative data types, short message communications in this case, chats, text messages, that’s really what we’re talking about here, what it often comes down to is the frequency at which they will be combined. To see what that means, let’s think about Slack.
In Slack, we have private channels, we have public channels, we have direct messages, and a lot of other channel types that are more administrative or have other functions that people really aren’t aware of unless you’re an administrator. They don’t really affect the end-user experience here, but just know there are other channel types out there. And being able to articulate both… we do some data analysis here: how heavily is the platform being used? Is this almost an email substitute, which is the case in some enterprises, or in some business units within enterprises, where they may be using Slack or Teams much more than email now, or have integrations with other platforms like Jira?
Really, what you need to do is think about a couple of things. We’re usually dealing with 20-25 or so custodians. They might have access to these data sources, but that doesn’t necessarily mean they use them. eDiscovery vendors, I think, and also law firms, sometimes have a problem with casting too wide a net around responsiveness. What we try to do is analysis on these platforms, to understand where, about what, and when our actual custodians were interacting with content. I’ve had cases where we started out with 80,000 Slack channels, maybe 12 million messages, and it was a bunch of C-level guys who probably weren’t using this. We created an add-on for Slack Enterprise Grid where we’re able to do in-place analysis of individual users, to basically draw a heat map, on a monthly basis, of the count of communications from those users and against which channels, be they private or public. We were able to clearly demonstrate that out of those many, many thousands of channels, these individuals, as a group, only really interacted with 700 or so channels, half of which were about stuff like riding their bicycles or internal fun stuff. That’s what a lot of people sometimes use these platforms for.
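The heat-map idea described here, counting each custodian's messages per channel per month, can be sketched simply. This is an invented illustration; the field names (`user`, `channel`, `ts`) are assumptions, not the actual Slack export schema or the HaystackID add-on.

```python
# Hedged sketch: tally messages per (custodian, channel, month) so counsel
# can show which channels custodians actually interacted with, and when.
from collections import Counter
from datetime import datetime, timezone

def channel_heatmap(messages, custodians):
    """messages: iterable of dicts with 'user', 'channel', 'ts' (epoch seconds)."""
    counts = Counter()
    for m in messages:
        if m["user"] not in custodians:
            continue  # only the custodians under the request matter here
        month = datetime.fromtimestamp(m["ts"], tz=timezone.utc).strftime("%Y-%m")
        counts[(m["user"], m["channel"], month)] += 1
    return counts

msgs = [
    {"user": "ceo", "channel": "deal-team", "ts": 1577880000},  # Jan 2020
    {"user": "ceo", "channel": "deal-team", "ts": 1577966400},  # Jan 2020
    {"user": "ceo", "channel": "cycling", "ts": 1580558400},    # Feb 2020
    {"user": "intern", "channel": "random", "ts": 1577880000},  # not a custodian
]
hm = channel_heatmap(msgs, custodians={"ceo"})
```

At real scale (tens of millions of messages), as noted below, this kind of tally moves out of a simple in-memory loop and into a big-data or graph store, but the aggregation itself is the same.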
And what really happened was we created some very nice work product here, in presentation format, explaining our process, and then we created this heat map. That was an easy piece of data for the lawyers to take back to the regulators, in this case, and clearly demonstrate both the burden of maybe having to go through everything else, and also exactly where the responsive information was. Then, at that point, the negotiations became, OK, well, how do you get this data into TAR? And this is where it’s a little bit… you really want to try to get it into TAR for a Second Request, because as soon as you start allowing collaboration on search terms, it can open up a Pandora’s box where you’re literally sometimes having to construct a term, depending on who you’re working with, the DOJ or the FTC, and then having to randomly sample the term hits to get those to a certain level of precision and recall, very similar to the metrics you would deal with in TAR. You really want to avoid a search term workflow for that reason, because you can imagine how crazy that could become if you had hundreds of terms.
We have a method here, post-heat mapping, for any type of short message format, to measure the amount of actionable text that occurs within a single channel and get it to a point of reason, as far as what’s going to go through an analytics engine so that it can derive some meaning and cluster it. For Slack, typically in a heavy-use organization, that window is as low as 24 hours, maybe 48 hours. In an organization where they’re not using it a ton, not a ton of people posting, you may have to go out to a week or two weeks. What this means is basically creating singular documents that encompass all of those communications, those individual chats, over that time period. Alternatively, and much more advanced and just more difficult to track, would be creating secondary rolled-up text files, and a secondary attachment range to those text files, that then correlate back to a series of documents.
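The time-window rollup described above can be sketched as follows. This is a minimal illustration of the grouping idea only; the tuple message shape is an assumption, and a production workflow would also carry metadata, participants, and attachment ranges.

```python
# Hedged sketch: group a channel's messages into fixed time windows
# (e.g. 24 hours) and concatenate each window into one "document" with
# enough text for an analytics engine to cluster meaningfully.
def rollup(messages, window_hours=24):
    """messages: list of (epoch_seconds, author, text) tuples."""
    window = window_hours * 3600
    docs = {}
    for ts, author, text in sorted(messages):
        bucket = ts - (ts % window)  # start of the window this message falls in
        docs.setdefault(bucket, []).append(f"{author}: {text}")
    return {start: "\n".join(lines) for start, lines in docs.items()}

msgs = [
    (0, "alice", "draft is ready"),
    (3600, "bob", "reviewing now"),
    (90000, "alice", "sent to counsel"),  # falls into the next 24h window
]
docs = rollup(msgs)
```

The window size is the tunable knob the speaker describes: shorter for heavy-use organizations, out to a week or two where the channel is quiet.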
This is the way to do it. We’ve been doing it for a while, actually; it’s part of our secret sauce for Second Requests, and just in general, when you start thinking about getting short message data into analytics platforms. This is really the next wave, at least in litigation and legal technology: the need for much better technology out of the box to handle short message formats. We’ve created a lot of middleware which does it well, but to get it right, the data needs to be somewhat massaged first.
This is also where we’ve run into issues: now we get an add-on custodian, and you need to go back through this analysis. Again, it might sound easy making a heat map, but making a heat map of 20 million individual small messages quickly is a computational challenge, and that’s not something that just happens in SQL. It’s usually happening in more of a big data database or a graph database. A highly specialized experience and skill set is required to handle all the unknowns. Then, where I think nobody really hits this stuff either, is all the add-ons: all these platforms talk to each other. Slack talks to Jira and to Trello, and you might have to go out into those and link them up. We also run into situations with organizations that are using G Suite or OneDrive heavily and embedding links. We’ve been asked to basically download all that linked content and re-attach it to email streams, which can be a major amount of additional data and, again, a customized approach to handling it. It’s important that you work with a provider who has the skillsets to react to something that’s new to everybody, but who also has repeatable workflows for the most common alternative data types that we encounter.
Hey Mike, can I jump in here real quick before we move to the next section? I did see a question out there that speaks to TAR or any sort of conceptual analytics and I wanted to get that answered before we move on here.
The question is, “The analytics section helpfully distinguishes strengths and weaknesses on traditional predictive coding and continuous active learning, but can you clarify if the FTC and the DoJ do or do not accept the CAL process?”
Both regulators will accept the CAL process. The reason Anya was saying that we favor the TAR 1 approach here is that it gets you to an end result, identifying the responsive population, pretty quickly. With active learning, you need to define what the cutoff point is; reaching it is not really a problem, but I think the larger challenge is defining it. For all the processes we’ve talked about and gone through in this presentation, the metrics have to be logged, the decisions have to be documented, and they have to be presented to the regulators. The regulators need to agree with the process, even down to something as granular as the deduplication process that you’re using. So, the choices and selections that you’re making here, for any of these software solutions or applications or custom solutions being implemented, all need to be documented and then approved by the regulator.
And Young has the usual job where we throw him on the phone calls and make him sign those affidavits. Thank you very much, Jan. He is a very good expert; he has been doing this for a long time.
We don’t want to run out of time, so I want to kick it off to Seth Schechtman, who is going to go over some of the managed review items with Anya, and we will try to tie it up pretty quickly here and open it up to any further questions. We would love to get more questions, so if anybody has questions or comments, please send them in; we want to interact with you. I know it’s tough with webinars, but we would love to hear from you. So, go ahead, Seth.
Thanks, Mike. Thanks, everyone else. I certainly don’t want to give short shrift to the integration and coordination that’s needed between the managed review team and the data side of the house. You obviously want to use service providers that have done this stuff before, but also end-to-end providers where there’s no daylight between any sides of the house.
In terms of managed review, there are multiple components that are important. One is the quality of [break] in Second Requests, not to even speak of the refresh portion. We really need to plan for getting, I will say, the “first pass” done within the first couple of weeks, and then the latter part of the month taken to do the privilege log.
So, you need reviewers who can come in very quickly. We just worked on a Second Request a few months ago where we had to add 300 reviewers over the last weekend of the project. Now, if you’re adding any old reviewers, without knowing their qualifications, tracking their metrics, or screening them before they come in, in terms of quality, you don’t know what you’re going to get for output. That’s why, I like to say, we do things differently at HaystackID, ensuring that everyone who comes in is tested, and we’re also tracking their metrics across a variety of projects.
Now, in terms of scalability, especially these days (COVID days) when most, if not all, on-prem review centers around the country are closed, you want to make sure that you have providers with remote reviewer access and secure virtual environments. Of course, HaystackID is the originator there, as I like to say, and so you’re able to scale up very, very quickly. Even in the days pre-COVID, we would use our remote review division for scaling purposes. So, if [inaudible] requests are emanating out of DC, the DC market can be more expensive, overtime is required there, and you can tap out a market very, very quickly. Are you satisfied with getting the last reviewer who was available, as opposed to choosing reviewers across the country based on their qualifications, their speed, and their accuracy levels, to ensure you’re getting the best quality work product? Especially on these short timelines, you may not discover that someone doesn’t understand privilege if they’re just new to the review world.
So, we’re now going to walk through privilege logging. Obviously, the regulators are looking at the produced documents, but something they’re keeping a very close eye on is the privilege log. I’m going to throw it over to Anya to talk about special tools, techniques, and technologies that we’ve built at our company to help streamline log creation, and then she will throw it back to me to talk about the specifics of logging, and areas where you may be able to negotiate with the [inaudible]. Take it away, Anya.
Thank you, Seth. So, in our experience at HaystackID with Second Requests, what we’ve noticed is that in all the excitement of collection, processing, deciding what workflow is going to work, and all the meetings with the regulators to get everything approved, what kind of gets overlooked until the very last minute is the privilege log. But the privilege log is a giant part of substantial compliance, and we need to start planning for it pretty much from the beginning. We need to run our privilege terms, identify the potential population, and work with Seth and managed review to get ready for it, to plan ahead, to make sure that we have it covered. And also to decide, based on the specs and based on the request, what it is that is being asked and what we need to provide. What is it that’s getting logged? How will the privilege log sentence actually read on the log? Are we logging the parent emails only? Are we logging the attachments? All of those questions need to be asked at the very beginning so that we’re ready when that time actually comes.
Another part of the privilege log that also needs to get reviewed for the [stack] is the privilege log legend, which is a listing of all the names that appear in the privilege log, normalized, with attorneys identified very clearly, and everybody’s position in the companies and law firms that appear on the log. This actually takes a very long time to accomplish, so we need to get ready for it as soon as possible. We have created a workflow for name normalization that we use internally at HaystackID, and it allows us to create a dictionary that we can continue to dip back into. For any refresh obligation, for any new custodian, we can go back to that pool and pull the names, so we don’t have to keep redoing this, keep rerunning it; we can just quickly update the new documents. That adds a huge benefit at the end, when we’re scrambling to get everything out the door, and especially if we’re under a refresh obligation, that’s when it helps the most. So, again, we work very closely with managed review to identify all these steps and to get ourselves ready.
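The reusable name-normalization dictionary described here could be sketched as below. The variants and the canonical form are invented examples; a real workflow would also track titles, firms, and attorney status alongside the canonical name.

```python
# Hedged sketch: map raw email-address and display-name variants to one
# canonical legend entry, so refresh productions can reuse earlier work
# instead of re-normalizing from scratch.
class NameDictionary:
    def __init__(self):
        self._map = {}

    def add(self, canonical, *variants):
        """Register every known variant of a person under one canonical entry."""
        for v in variants:
            self._map[v.strip().lower()] = canonical

    def normalize(self, raw):
        # Unknown names fall through unchanged so they surface for human review.
        return self._map.get(raw.strip().lower(), raw)

d = NameDictionary()
d.add("Smith, Jane (Acme Corp. - General Counsel)",   # invented canonical entry
      "jane.smith@acme.com", "Smith, Jane", "JSmith")
```

The dictionary is the asset: on a refresh or an add-on custodian, new documents are run through it first, and only the names it has never seen need manual work.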
Seth, back to you.
Thanks, Anya. So, I’m going to walk through some special considerations now. As we said before, there are certainly areas where you can push with the regulators, and certain ones will save more time; specifically, any dispensation you can get on logging will save time and, of course, time equals money. We won’t get all of these things, and you may want to trade some off against others, meaning you may want to offer something on refresh, reducing custodians, or reducing timeframes, in exchange for relief from some of these logging requests. They’re all worth consideration.
The first one: do all privileged documents have to be logged? Meaning, can you log an entire family that’s privileged in a single entry noting that they are privileged? [Audio Break] have to be logged. Certainly, if not, you want to make sure the review teams are instructed to leave information (subject line, headers, etc.) unredacted, unless of course it contains privileged information, to provide context for those privileged materials.
Questions regarding producing non-inclusives. We haven’t talked about that yet. Certainly, regulators will want those non-inclusives, but the question remains: do they have to be logged? If you have non-inclusives and their inclusives all coded privileged, do you actually have to produce or log those non-inclusive emails? HaystackID has developed tools that allow us to identify the to/from/CC/BCC information from those lesser-included emails, so that you can include it on the log entries for the inclusive emails, giving you some ammunition to not have to log those non-inclusives if the inclusives cover all the lesser-included to/from/CC information.
Do exact duplicate attachments have to be logged? We’ve seen that before, where if it’s a duplicate attachment, it doesn’t have to be logged.
Can the to/from/CC/BCC fields be collapsed into a single entry called “recipients”? This is important, because sometimes the regulators will push back on verb choices; we’ve seen them question log entries where the log says requests were made for legal advice, but the attorneys appear in the CC field. If you can collapse to/CC/BCC into a single field called Recipients, it pre-emptively prevents the Government from going there. Again, that may be a tall ask, but it’s something to consider. All the things we’re talking about here also live outside of the Second Request lens; they’re all things to consider when you’re negotiating ESI agreements with other sides.
Do families have to be logged if they are to/from/CC outside counsel? We’ve seen this before: the Government might not require logging of documents that are to/from outside counsel but could require it for documents that merely CC counsel, so that’s something to look out for. We’ve also seen discrepancies between the primary review and the refresh in terms of getting that dispensation.
Anya talked about name normalization; try to get ahead of third parties appearing on the log, given that unique lists with their company affiliations and titles need to be generated. Once you get the review set, try to parse through those domains, doing the domain analysis and going back to counsel, going back to the client, and asking what those parties’ relationships to the client are. Are they privilege breakers, per se? Are they agents? Are they functional equivalents? Get board members, their names and email addresses, so you’re ensuring that the review team doesn’t treat them as privilege breakers. You can create different highlighting sets for them in Relativity or other review platforms to help the review team understand those relationships.
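The domain-analysis step described here, extracting the unique domains so counsel can classify each one, can be sketched simply. The document shape and addresses are invented for illustration.

```python
# Hedged sketch: pull the unique email domains from recipient fields so
# counsel can go back to the client and classify each domain (privilege
# breaker, agent, functional equivalent, common interest holder, ...).
import re

def unique_domains(documents):
    """documents: iterable of dicts with 'to' and 'cc' lists of email addresses."""
    domains = set()
    for doc in documents:
        for addr in doc.get("to", []) + doc.get("cc", []):
            m = re.search(r"@([\w.-]+)$", addr.strip().lower())
            if m:
                domains.add(m.group(1))
    return sorted(domains)

docs = [
    {"to": ["ceo@client.com"], "cc": ["partner@lawfirm.com"]},
    {"to": ["advisor@bank.com", "CEO@Client.com"], "cc": []},
]
doms = unique_domains(docs)
```

The output is deliberately a short, sorted list: the deliverable of this step is a question list for counsel and the client, not an automated privilege call.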
One note here that’s essential for all reviews is to always instruct review team members never to assume a party, entity, or person breaks privilege. You just never know what the relationship is, and certainly the relationship can change over time. In one instance they can be at arm’s length, in another a common interest holder; or an outside firm can be representing the client in one instance and, a couple of years later, be representing an opposing party. Getting ahead of all of this is certainly key. You do not want to leave it for the last minute, or even close to the last minute; otherwise, you will have tons of open questions for the review, and just trying to hunt these relationships down becomes difficult.
Another thing to do is get ahead of distribution lists. The specs call for handing over the membership of distribution lists, so find out what those distribution lists are and go back to the client to see if they have information on which individuals are on them.
Other things to look out for: common interest holders and joint defense agreements in place. On the log, you’re typically going to have to list who those parties are. Certainly, watch out for merger dates; if your client claims any common interest with the merging entities, it generally shouldn’t predate the merger announcement, though sometimes we’ve seen it before the finalization.
In terms of QC checks, and I will talk in a couple of minutes about special QC that we’ve developed, you want to run sweeps for privileged documents that are to/from/CC .gov or .state; we also run .edu and .org. Even though those aren’t [inaudible] privilege breakers, you want to make sure you’re taking a special look at them. You want to check privilege-withheld documents that have .gov or .state on them. At best, they should be partially privileged, if they take [inaudible] to counsel or somebody else internal to the company, and certainly partial privilege can be taken on those most recent emails, but you do not want to withhold emails with those .gov and .state domains in their entirety.
Other information that you’re probably going to have to put on the log: do all entries need to have an attorney, a legal department, or a law firm in the description or in the to/from/CC metadata fields? It is certainly easy for the Government to attack entries without those mentions of specific attorneys, law firms, or legal departments. Certainly, not every privileged document has those on it; you just want to make sure you’re paying special attention to them at the QC stage, whether it’s second-line or outside counsel looking at them, because those can be a point of contention.
Verb choices are essential. Is it a request for legal advice? Is it a rendering of legal advice? If it’s a request for legal advice, you want to make sure it’s going to attorneys. We talked a little earlier about “to” versus “CC” and the technicalities there, about whether those can be requests; but if it’s a rendering of legal advice, you want to make sure it’s coming from the actual attorney. The Government will check to make sure attorneys appear in the appropriate fields. That’s typically why they require the Esq. designations and the native Excel versions of the logs, so they can filter and sort and do all sorts of things.
Now, we’re going to jump to a tool that we’ve developed that helps a ton with QC, called Stem Detection and Thread Consistency. I will spend a couple of minutes talking about what defines consistency or inconsistency across threads, and then we will jump into a more complicated sample that I think will illustrate what we’re talking about.
So, here NP means not privileged, PP means privileged-redact or partially privileged, and PW means privilege withhold. In the first three examples, running left to right in Columns A, B, and C, the coding is consistent, so there are no inconsistencies or inaccuracies: coding in one straight line, not priv throughout, then priv withhold throughout, and in example C, partial priv throughout. Nothing to check in QC there; everything clears checks automatically. When we look at Columns D and E, you can see in D the non-inclusive is coded not priv, and then the two subsequent emails are coded as partial priv. Again, nothing unusual there; while the coding on its face may be inconsistent, in terms of privilege it’s not inaccurate. You can go from not priv to partial priv, meaning that the more recent emails are redacted, and that’s certainly fine. When you look at the fifth example, Column E, the document goes from privilege withhold to partial priv, and again that’s fine: the non-inclusive, where the privileged content makes up the entirety of the email, is coded as privilege withhold, and in the subsequent emails that first-in-time content is redacted. So, privilege withhold to partial priv is fine, and not priv to partial priv is also fine.
Now, turn to the next slide, which talks about inconsistencies. Again, if all threads, all conversations, were just a single line, with no branches and no forwards on to other people, it would make review and QC much easier, but here are examples of where you have inconsistencies and inaccuracies in coding.
In Column A, the first-in-time email is coded as not priv. It then jumps to partial priv, which is OK, as we saw in the previous example; the more recent email is redacted. But then the coding, on its face, switches to privilege withhold. If you’re withholding the most recent email, meaning last in time, that means all of the information in the first email and all of the information in the second email is withheld. However, you’ve already coded the first email in time as not priv, and the second as partial priv. So, going from not priv to priv withhold, or from partial priv to priv withhold, is certainly a flag. It needs to be investigated and the coding needs to be changed.
In the second example, the first-in-time email is priv withhold, it then goes to partial priv, and then not priv, so the most recent one is coded as not priv. Why is the most recent one not priv and the earlier-in-time ones partially privileged? If you’re producing in full that last email that you’re coding as not priv, the two earlier ones should be coded as not privileged as well. Now, it doesn’t mean the two earlier ones are wrong and the most recent one is correct; it’s just an inconsistency that you want to check. Why is it essential to check? Because the regulators are checking into these things. They’re checking for inconsistencies across redactions, not only MD5s or near-dupes, but across [inaudible]. If you’re redacting something in one instance and not in another, it can call into question all of the redactions you’re taking.
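The straight-line rules described above reduce to a small transition table: walking a thread from first-in-time to most recent, NP can move to PP and PW can move to PP, but any move into PW from a lesser coding, or any move back to NP, is a flag. A minimal sketch, with invented document IDs (this illustrates only the straight-line check, not the branching stem analysis):

```python
# Hedged sketch: flag privilege-coding transitions along a thread that
# are not in the allowed set described in the examples above.
ALLOWED = {("NP", "NP"), ("NP", "PP"), ("PP", "PP"),
           ("PW", "PW"), ("PW", "PP")}

def flag_inconsistencies(thread):
    """thread: list of (doc_id, code) ordered first-in-time to most recent."""
    flags = []
    for (_, earlier), (doc_id, later) in zip(thread, thread[1:]):
        if (earlier, later) not in ALLOWED:
            flags.append(doc_id)  # this transition needs human QC
    return flags

ok = flag_inconsistencies([("d1", "NP"), ("d2", "PP"), ("d3", "PP")])   # Column D/E pattern
bad = flag_inconsistencies([("d1", "NP"), ("d2", "PP"), ("d3", "PW")])  # Column A pattern
```

A flag does not mean the later coding is wrong; as the speaker notes, it only means a human needs to decide which end of the transition to fix.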
So, these are pretty simple examples that you can check with basic QC searches, but if only all thread groups were like that. You can see in the next example, and I know we’re coming up on time here, how conversations can branch off.
In this example, the green portions are consistent: you have not priv on the right-side branch, coded as not priv throughout, so nothing inconsistent. And on the left side, you switch from not priv to partial priv and, again, to partial priv. As we talked about earlier, while coded inconsistently, it is coded correctly; it is a natural inconsistency that does not need to be [inaudible]. However, if you look at the yellow portion, you have a portion that’s coded as not priv and it switches to priv withhold, so that serves as the flag. Without our stemming tool, which allows you to track each of the stems, you would have to QC and review the entirety of the thread. The powerful part about the stemming tool we have is that it allows you to narrow down and focus on just this yellow portion, which is inconsistent and inaccurate, as opposed to the other portions, which are naturally inconsistent but coded accurately based on the context and nature of the thread.
Thank you, Seth. Anybody who has ever dealt with large-scale privilege reviews should immediately see the value of this workflow; it massively reduces risk and reduces cost, and that’s what you want to do when you have a priv review. We’re basically at the end here. Why HaystackID? Our plug: in 2019 alone, we did 12 Second Requests, and out of the 40 or so, that’s a pretty big chunk. We’ve done six in a four-month period, so we really know how these work and everything to look out for. Obviously, it takes really good law firm partners too; when you’re working with the right outside counsel, it makes things so much easier. But we are also happy to educate and advise anybody who is just getting into these, be it litigation support or attorneys, and more corporations are doing a lot of M&A and want to prepare for this in advance.
Thank you all very much. I’ll kick it off to Rob Robinson to close us out and feel free to email if you have any questions.
Excellent, Mike. Thank you so much. Mike too and ACEDS, thank you for hosting today’s webcast. We greatly appreciate it. Also, we want to thank all the presenters for excellent information and insight. Most importantly, we want to thank each of you who took the time out of your schedule to attend today’s webcast. We know how valuable your time is and we greatly appreciate you sharing it with us.
Additionally, we hope you have an opportunity to attend our next webcast and that’s scheduled for 3 December on eDiscovery Day, and it will be on the topic of mitigating risk associated with departing employees. Please check out the detailed description on the eDiscovery Day website and on the HaystackID website and we hope you can attend. Thank you again for attending today’s webcast. Have a great rest of the day and this concludes today’s webcast.