[Webcast Transcript] How Academic-Centric AI Projects are Driving Legal Tech R&D

Editor’s Note: What happens when AI theory meets the tight deadlines and high stakes of legal workflows? This webcast pulls back the curtain. It gives professionals in eDiscovery, cybersecurity, and information governance a grounded view into how academic research is shaping — and being shaped by — the real-world demands of legal tech.

Rather than chasing hype, the discussion zeroes in on practical development: how AI tools are benchmarked, where data limitations show up, and what standards are actually being applied. The panelists don’t just explain what’s happening; they show how legal, technical, and ethical decisions intersect in the trenches of tool creation.

For practitioners navigating the AI wave, this is operational insight — and a rare opportunity to hear what “responsible AI” looks like when implemented with legal precision.


Expert Panelists 

+ Aron Ahmadia, PhD
Senior Director, Applied Science at Relativity

+ Roee Shraga, PhD
Assistant Professor, Computer Science, WPI

+ John Brewer (Moderator)
Chief Artificial Intelligence Officer and Chief Data Scientist, HaystackID


[Webcast Overview] How Academic-Centric AI Projects are Driving Legal Tech R&D

By HaystackID Staff

As legal teams grapple with increasingly complex data environments, the drive for intelligent, scalable, and defensible AI tools has never been more urgent. In the webcast “How Academic-Centric AI Projects Are Driving Legal Tech R&D,” thought leaders from industry and academia come together to explore how collaborative research is shaping the next generation of legal technology — not in theory, but in real-world application.

Moderated by John Brewer, Chief Artificial Intelligence Officer at HaystackID, the session features insights from Aron Ahmadia, Senior Director of Applied Science at Relativity, and Dr. Roee Shraga, Assistant Professor of Computer Science at Worcester Polytechnic Institute (WPI). Each panelist brings a unique perspective to the table: Aron from his experience building enterprise-grade AI tools like AIR for Review, and Roee from the academic frontlines of data integration, model evaluation, and machine learning theory.

The conversation goes beyond typical tech evangelism to dissect core challenges facing legal AI today: What distinguishes a true “proof of concept” from a viable product in legal tech? How do we source, annotate, and validate real-world legal datasets when privacy concerns make data scarce or sensitive? What does “model confidence” look like when applied to AI tools tasked with document review, topic detection, or privilege recognition? And how do we prepare future legal and technical professionals to work responsibly with such tools?

From humorous reflections on email redactions that fail to hide identities, to technical benchmarking of legal document handling by LLMs such as Claude and Qwen2.5, the discussion weaves together practical insight and research-backed nuance. Roee’s academic team, in partnership with HaystackID and Relativity, explored how document length, vocabulary density, and self-assessed confidence influence LLM performance in legal contexts. Meanwhile, Aron emphasized the importance of rigorous privacy and security protocols, even at the early stages of AI development, particularly when customer data is involved.

This transcript is not just a replay of a panel — it’s a snapshot of where legal tech R&D stands today: cautious yet curious, innovative yet defensible. For professionals working in eDiscovery, cybersecurity, or information governance, the discussion serves as a reminder that building trust in AI systems isn’t just about accuracy — it’s about collaboration, context, and constant evaluation.

Whether you’re guiding your organization through AI adoption or evaluating vendor claims of “research-backed” solutions, this conversation provides a foundation for informed engagement with the technologies shaping tomorrow’s legal landscape.

Watch the recording or read the transcript below to learn more.


Transcript

Webcast Support

Hello everyone and welcome to today’s webinar. We have a great session lined up for you today. Before we get started, there are just a few general housekeeping points to cover. First and foremost, please use the online question tool to post any questions that you have and we will share this with our speakers. Second, if you experience any technical difficulties today, please use the same question tool and a member of our admin team will be on hand to support you. And finally, just to note, this session is being recorded and we’ll be sharing a recording with you via email in the coming days. So without further ado, I’d like to hand it over to our speakers, Aron, Roee and John to get us started.

John Brewer

Thank you very much, Tammy. So hi everyone, and welcome to another HaystackID webcast. I’m John Brewer, and I’m serving as the expert panelist lead and moderator for today’s webcast, titled How Academic-Centric AI Projects Are Driving Legal Tech R&D. This program is part of HaystackID’s ongoing educational series supporting cybersecurity, information governance, and eDiscovery objectives. We are recording today’s webcast for future on-demand viewing, and we’ll make the recording, along with a complete transcript, available on the HaystackID website at haystackid.com. Today’s webcast offers a practical look at the academic-industry partnerships that are producing validated, defensible, and scalable AI capabilities for the legal ecosystem. But before turning to the agenda, I’d like to introduce our speakers. So would you like to introduce yourselves, or would you like me to introduce you?

Roee Shraga

Happy to introduce myself. So hello everyone. My name is Roee. I’m a computer science, data science, and AI professor at WPI, here in Worcester just outside of Boston. Prior to that, I did a postdoc fellowship at the Khoury College of Computer Sciences at Northeastern. My research mainly revolves around data discovery, data integration, and everything that has to do with the data itself before it is plugged into your AI. I’ve published my research in top-tier conferences in the domain of databases and information retrieval, and I’ve been working on legal tech projects with John and HaystackID for the past two years, and I’m excited to share some of those insights today in this webcast. So happy to answer any questions later on or offline. And John, back to you.

John Brewer

Yeah, thank you very much, Roee. Aron, would you like to introduce yourself?

Aron Ahmadia

Absolutely. So I am the head of Applied Science at Relativity. I also get to claim a nice fancy PhD title. Mine is in Applied Mathematics with a specialization in computational sciences. I like to remind everybody that I’ve been in legal tech for about 30 years now, ever since I helped wire my mom’s legal office with Windows for Workgroups as a migration from DOS. I’ve also spent a significant amount of time on the bench, where I was a judge for the International Federation of Competitive Eating for two years, running the turkey eating contest, which is sadly retired. More serious credentials: I run a team of about 20 applied scientists around the world. We build products that you know and love, like AIR for Review and some other tools. And we at Relativity also really do believe very strongly in investing in people and their educational journey throughout their entire careers. And I’m very happy to talk to you guys today and be part of this panel. So thank you for inviting me, John.

John Brewer

Oh, thank you so much, Aron. And so I’m John Brewer. I’m the Chief Artificial Intelligence Officer over at HaystackID, and I do not actually have a PhD, which kind of makes me the village idiot on this panel. But I do have a team of very, very smart data scientists and engineers that I work with every day. And we are more on the technical and delivery end of the integration of generative AI and the various other technologies that we’re going to be discussing today. So I am kind of coming at this from the consumer point of view, and how our businesses interact with not only academic institutions like the one Roee is representing here, but with our very talented vendor pool, such as Aron’s organization, Relativity, of course. So let’s go into the first subject area that we wanted to cover while we’re here: POCs versus Products.

Now this is something that we’ve had a bunch of conversations on over the past couple of years, and I know that this is a big difference in how we approach AI, both from a research perspective and from an actual product perspective. Obviously both of those have different objectives and different types, and I feel that the term proof of concept has gotten an almost derogatory bent to it in the past year or so, where it’s used to describe a product that isn’t quite there yet. Whereas with a lot of the research that we’re doing, and a lot of the interactions that we have with our academic partners, we’re absolutely looking for just that: the concept, the proof to show that something can be done before we invest in actually building out a product to the exacting standards that the legal community demands every day. So Aron, why don’t we start with you on this one, since you obviously not only have to wear the research hat, but also the product management hat. What would you call the differences between a POC and a product at the fundamental level here?

Aron Ahmadia

Yeah, I think that when we think about our development lifecycle at Relativity, and I think at many other companies in this space, a proof of concept is something that precedes a product. And if you think of the idea space of things that we might work on, or that a customer requests of us, or something that sounds like the right idea, to get it to a POC it has to be more than, say, something on a whiteboard. Typically there is some code involved, there’s some data involved. We think less about some of the concepts of reliability, and it may not be accurate, so those are some of the things that we may give up. Hopefully it’s quick to make and perhaps not too expensive for us to try and build. But I will say that there are things that we don’t compromise on when we build POCs in the legal data environment. And one of those comes down to the safety and the privacy of the data that’s being used. And so if we are doing a POC and that POC involves real customer data, we follow all of the same rules with that data that we would follow otherwise within the product. So that’s what’s different, and that’s what’s the same. And that’s my, let’s say, 60-second take on POCs versus products.

John Brewer

Very good. Roee, obviously proof of concept probably has a different connotation in your world than it does in an industrial environment per se, or a commercial environment in any event. So what’s your take on the difference there between proving a concept versus making something that you think can be carried forward to a product in the commercial world?

Roee Shraga

Yeah, of course. So first of all, I do not do products. It sounds funny, but it is what it is. I typically end my research after a successful proof of concept, and this is where, again, within the scope of this project with HaystackID, this is where John comes in, snatching my students, taking them into his company and starting to do products. So I don’t know if you want to elaborate more on that later on, but typically my work is done once I have this proof of concept. But I’ll opt to talk more about what happens before the proof of concept. And I think Aron touched on that a bit, where students come in, it can be a group of students working with data relevant to those domains or any other projects, and they say, “Oh, the results look good,” or, “The algorithm looks good.”

Then the first question that I ask is, “What is good? Do you look at it, eyeball it and say, ‘Oh, it looks good,’ or do you anecdotally find one or two outputs, look at those and say, ‘Oh, it seems meaningful?'” And I think this is a gap that I’m trying to bridge, especially with younger or early-stage students, where they need to understand the concept of a proof of concept. I tell them, listen, you need to bring me numbers. The fact that you tell me, oh, I looked at one, two, three outputs and they seem to make sense, is not enough, especially with LLMs being around. And yes, I know I’m the first to throw out this word, but especially with these GenAI models being around, it’s very hard to understand what is behind that. So how I see a proof of concept is: give me some data, potentially with some annotations, look at the results, accuracy, whatnot, and then you can tell me, oh, this is how it looks in practice, it’s 60% accurate, or it’s 70% reliable, if you will, given actual data and actual concepts. Later on, Aron and John can tell me whether 60% is enough for the product, but I need the terms to be used to say this is 60%, and this is how I see proof of concept in my domain. Yeah, thank you John.

John Brewer

No, I think you actually hit on a pretty interesting point that we’re dealing with in the AI sphere more so than in most other product environments that I’ve worked in before, which is that the work we are doing is kind of inherently stochastic. I think that especially in the legal community, for anybody who hasn’t used some of the machine learning technologies like TAR and whatnot, there’s definitely a belief that the computer either gets it a hundred percent right or gets it really, really wrong. And we’ve definitely had to work with our customers, and with our various audiences really, in terms of helping them to understand that we are working in those basically percentage-correctness areas. In document review, we’re worried about recall and precision, but I think that we’re seeing this even with attorneys who are using ChatGPT to draft briefings and praying to God that they don’t lose their license because they missed some citation there, and whether or not they’re going to have a challenge on that. So let me go back to Aron here, because I think you probably have the most experience in direct product development of anybody here. Do you find that that’s a new and unique challenge in AI, or is that just another feature that you need to explain to your stakeholders as you’re going along?
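[Editor’s note: The recall and precision metrics John mentions here are standard review metrics. A minimal sketch of how they are computed, using hypothetical document counts rather than figures from any project discussed in this webcast:]

```python
def precision_recall(true_pos, false_pos, false_neg):
    """Precision: share of documents flagged responsive that actually are.
    Recall: share of truly responsive documents that were flagged."""
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return precision, recall

# Hypothetical review: 80 responsive documents found, 20 non-responsive
# documents flagged by mistake, 40 responsive documents missed.
p, r = precision_recall(true_pos=80, false_pos=20, false_neg=40)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.80 recall=0.67
```

In document review, recall is usually the metric under the most scrutiny, since a missed responsive document is typically costlier than an extra one sent to human review.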

Aron Ahmadia

The problem, John, is that I’ve been doing AI product development for the last decade.

John Brewer

Fair point.

Aron Ahmadia

So I’m immune to the inconsistencies. One of the first products that I built in the AI space was a cloud-based automated transcription tool. A company I was working at had hundreds of thousands of phone calls coming in a day, and they wanted to transcribe all of those and turn them into transcripts. At the time, I remember when I was interviewing for the position, they’re like, “We’re going to transcribe all these phone calls.” I’m like, “No, we’re not. That sounds impossible.” And of course we were doing it within a year, but I mean, our error rate started at 30% or 40%. And so concepts of reliability are as much tied in with the accuracy of the system as they are with its, let’s use this fancy word, stochasticity. And what I think is really funny is that people will perceive that high variance, or high stochasticity, is associated with a higher error rate.

And that’s not always true. These AI systems can have very high error rates even though they appear to be creating this sort of beautiful, accurate text. And the other side can be true too. The system can look like it’s wildly all over the place, but if you look at all the different answers it’s generating, they can all be correct. And so when something is communicating and it’s a different output every single time, our instincts are, oh, it must be wildly wrong, but that isn’t necessarily true. And so, as you said, if people are dealing with this and they’re demanding more sort of consistency from their AI systems, it is really kind of about building trust, and it’s about building trust that the system is going to be accurate. But also, if I’m going to go into a demo, I’m going to show the system, and it does things one way and all of a sudden it changes its behavior completely. That could be fairly embarrassing and you can lose a lot of credibility. And I think it changes not just the way that you use and document the software; it changes how you sell it, it changes how you enable it. There are all sorts of pieces that get affected when the system is sort of a little bit inconsistent.

John Brewer

No, I think that makes sense. And I know that I’ve definitely been around long enough to fear doing live tech demos, and that has only gotten worse as LLMs have gotten folded into things. That’s always an opportunity for the technology to remind you of your humility, I would say. And I think that that’s interesting. I wanted to bounce over to Roee on that. I guess, simply put, what’s your take on the way that we’re seeing people interacting with these products and these systems? I know that it’s not necessarily a direct thing that we work on and study directly, but it is something that we get to see, as people, not only customers and users, but also people who are entering the development pipeline and who are going to be using and building these tools, are getting to know what the strengths and limitations are there. How do you see people interacting, just from a trust perspective, with these underlying technologies, the LLMs and other models in place?

Roee Shraga

So surprisingly, over the last few years, people blindly trust those models in the wild world. For any question out there, the common person will seek to use GPT nowadays, where they used to seek websites on Google. So that’s kind of like a common practice nowadays, and this is why it’s exciting to work in the legal domain, or, if you want to make the equivalence, in the healthcare domain: people are more cautious using those tools while still assuming that there is a merit to what that product brings to life. And in general, you mentioned trust; I like the term confidence, and we will probably cycle back to it when we briefly talk about some of the insights that we’ve obtained from last year’s project, which is that those models are not always a hundred percent confident either. And I think one way to view that is benchmarking the idea that even if an LLM gives you an answer, whether it’s in the general case or, in our case, for legal tech and eDiscovery, whenever it gives you an answer, similar to how a human will give you an answer, there will be an attached confidence that comes with it.

This confidence can be embedded in the technology. In other words, whenever you use an API to run those calls, by the way, whether it’s within the Relativity framework or you just use the API that ChatGPT provides, sometimes you can just extract the confidence from the model itself, where the other option is simply to ask the model. And you’ll see in a lot of cases, not even going into confidence, but just asking the model to, what we now like to call, self-refine or self-judge itself, you’ll find that the model by itself is not a hundred percent confident in the results that it has obtained. And I feel like that has become more of a common thing people do when using those technologies: both to think more on whether or not to accept those answers as a hundred percent true, and also to have the model self-refine or self-judge its own judgment to get a sense of it.

And I think this will become more and more of a thing, because there’s a lot of ambiguity and a lot of difference between what is factual and what is not. In other words, you can ask the model to provide facts, and those can be right or wrong, but there can also be subjective notions of it, where you ask the model to provide some subjective idea of something that is rooted somewhere in what we like to call its internal knowledge base, or knowledge graph, but is not necessarily a fact. So I think all of those come into both how people are using those models and will shape the way people use the models in the upcoming years. Again, I see that from my own observations.
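[Editor’s note: One concrete form of “extracting the confidence from the model itself,” as Roee describes, is reading the per-token log-probabilities that some LLM APIs expose for generated text. A minimal sketch, with illustrative logprob values standing in for what an API would return; the helper name and the averaging heuristic are this sketch’s assumptions, not a method from the panel’s project:]

```python
import math

def answer_confidence(token_logprobs):
    """Rough confidence score for a generated answer: the mean
    per-token probability, computed from the log-probabilities
    an LLM API reported for each generated token."""
    probs = [math.exp(lp) for lp in token_logprobs]
    return sum(probs) / len(probs)

# Illustrative logprobs for a short answer; one low-probability
# token (the -1.1) drags the overall confidence down noticeably.
conf = answer_confidence([-0.05, -0.2, -1.1])
print(round(conf, 2))
```

Averaging probabilities is only one heuristic; the other route Roee mentions, prompting the model to judge its own answer, needs no logprob access at all but inherits the model’s own biases about itself.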

John Brewer

No, that makes sense. And so to refocus here on the actual work and research that we’ve been doing, I wanted to talk about one of the big questions that I get when I talk about doing research, whether we’re doing it internally or externally, and this is of course in the legal world: the problem of data sourcing, which is one of the biggest challenges that I think we encounter on a regular basis. And this is kind of where we are transitioning from the conversation about proofs of concept into the actual lab research that we’re doing here. So just to draw the distinction for our audience, there are two major ways, at least in my experience, that products get enhanced. There’s certainly the lab work, where we go in, in a state of grace as it were, in an ivory tower if you will, build applications and test them.

This is where all the hard statistics come in, and we’re working on figuring out, at a theoretical level, exactly how things work and how things fit together. And then we have the proofs of concept, where we take a solution, and especially if we’re dealing with an enterprise customer, we might be building to a particular target, or a small or medium business organization might have a couple of months of proof-of-concept time on a solution that we’re building that’s not necessarily specific to them.

When we’re working with a commercial customer, there’s usually a question of what data we are going to be working with. And of course there are data processing agreements, and in the case of healthcare, BAAs, and certainly an entire legal framework around how that data gets utilized and sourced so that it can be used in our tools. But if you are not in the legal community, if you’re not providing services, then you have a real challenge creating, or obtaining I should say, real live data. Now, historically, if you were to ask Google or even ChatGPT where to get legal test data, they’re going to give you exactly two sources, both of which I suspect everybody who’s on this call has already heard of: the Enron data set, released in 2007, I want to say, by the Federal Energy Regulatory Commission, and then of course the Jeb Bush emails that were released in, I think, 2008 or 2009. And you’ll see that data loaded into almost every product demo, not just in AI, but for almost any legal tech demo over time.

I know that there have been a number of other sources out there, and I know that both of you, Aron and Roee, have dealt with sourcing data for various projects from other sources, and you’ve both kind of come to your own solutions there. Do either of you want to talk a little bit about how we get data for the pure research side of operations, where we’re not just doing proofs of concept with customers, but where we actually have to get data sourced from somewhere?

Aron Ahmadia

I mean, I’ll go first I think.

John Brewer

Fair enough.

Aron Ahmadia

I think that there is this phrase that became popular a decade ago. It’s like, “Data is oil, it’s the new oil. Everybody needs to have their data.” And so all of these companies were sort of just piling up all their data. I remember, again, sitting in a large, large company all-hands, this is a company prior to Relativity, and a very prominent executive said, “We’re going to save all the data, every bit of it, every log, every click.” I was like, “That just seems maybe a little too much.” And then there’s another phrase that comes from the R community and a sort of more statistical approach to the world, and the observation is: data is residue. Data is frequently a log file or something that gets left behind, and maybe it’s useful, maybe you’re scraping it off the floor with a shovel or something and piling it into your model.

When we think about our world, and we think broadly about discovery and how what we’re trying to do works, these data sets like Enron and Jeb Bush are very dated. And one of the reasons why we haven’t seen more of these data sets is because, for both Enron and Jeb Bush, these emails contain a lot of personal information. They contain names; they contain sensitive information that people had been sharing with each other. And so Enron has actually been cleaned up a couple of times since it was released, where people sort of wrote in and said, “Hey, I’m very sorry that I was using my work email to have an affair or to exchange stock tips with my friend, but I would really prefer it if my emails did not get used in the next 30 years of training data for GPT-7.”

But coming back to how I think about data and just the value of it, it is a piece of a very large puzzle, because typically when data arrives at a place like Relativity, there are a series of actions that are taken at a very high level, like, “Hey, help me find out which of these might be relevant to this matter,” or, “Help me ensure that anything that looks like a privileged communication gets found.”

But then a lot of the value later on is very contextual. “This complaint was served, or this investigation is happening, and I’m concerned about what this person was saying and doing, but I don’t care at all about what happened in this timeframe or about these emails, and I certainly don’t care about this.” But then if you shine a different flashlight on the same data, you might see a totally different set of things that you care about. So I’m very much more in the data-is-residue camp, but the data is almost like a very interesting, three-dimensional Rorschach test, where the data that is important depends on the flashlight you’re shining on it. But not only that, you still need that structure of data to represent the types of data that people run into, whether that’s a PowerPoint file or an Excel spreadsheet or any of these other things, Slack data, any of these other things that are actually quite messy and not just what we, in some senses, imagine, especially if all we’ve been exposed to is Enron in terms of data.

I think the worst offenders are actually not Enron or Jeb Bush, though they’re quite bad. It’s actually transcripts, because they are structured so cleanly that if you’re demoing certain types of tools, they just perform super well on transcripts, and then people just assume that they’re going to be able to point the tool at their giant pile of data and get some really great results, often to their chagrin. So I’ll pause there, but I’ll say this: the raw data is one of the challenges, but it’s not the only thing that we care about when we’re building, testing, and proving our products.

John Brewer

No, that’s interesting. And I think that the data-is-residue lens is definitely applicable to the legal data world, because generally speaking, everything that we are ingesting was not created with the intention of being in a legal process. Very few people write their emails with the intention that they might someday be collected in litigation, although it is hilarious to meet people who do write emails that way.

Aron Ahmadia

Some of us write emails every day, assuming that some system may discover them at some point. It changes the way you communicate when you think about discovery as your profession.

John Brewer

No, it’s true. And certainly I think that people have become a lot more aware of that, especially as Google has been more open about the uses it makes of services like Gmail. And certainly Meta has been very forward with how it has been using user content in order to supply its data needs for its AI program. I suspect that we’re probably going to see more and more of that happening going forward. But okay.

Oh, by the way, I probably should have said this earlier, but there is a question and answer block that participants and attendees have access to, if you want to follow up with a question on any of the material that we’re discussing here. So let me ask Roee: when we are talking about this, obviously you have a challenge in being a relative outsider to the legal tech community, but you also have advantages in having access to people and resources that we really have to go out of our way to get access to. So now that you’ve been really intensely working in the legal space for, what, a year and a half now, what have you learned? What are your big takeaways about finding data that is usable and practical for this kind of work?

Roee Shraga

Wonderful. So first of all, I’m so happy that I let Aron talk first. You guys are the domain experts, and this is why you care about the whole picture more than the data. My role, both in this and in general, is as a data expert. My students come in to get data to work with, so for me, the most valuable, definitely not residue, piece of information is the data that I’m obtaining. The way that I work with my students, or the way students are typically instructed to work when they’re in my lab, or in general in data science, maybe nowadays, maybe it used to be different, maybe if we had domain experts in teams it would be different, is through the data. Which makes this far more challenging when you don’t have data that is both valuable and task-oriented or application-aware. All those datasets, for example, and this is another point that I wanted to make, do not come with annotations for a given task.

Just to scope it in a way that makes sense, without talking too much about one of the projects that we did before: if you do topic detection on those datasets, Jeb Bush, what have you, those datasets typically do not come with those annotations, which makes your lives interesting, because you yourselves are the domain experts, plus you have people who know what’s going on there to work with those data points. But we don’t have that. And I think it’s both challenging and fascinating at the same time, because, one, you need to draw on data that is similar yet not the same. So one of the things that we ended up doing is to find datasets that are similar yet different, but do come with annotations. For example, one thing that we have used in past projects is an IT dataset with IT tickets, with topics pre-annotated and so on, which is of course not your common or basic legal tech eDiscovery dataset, but it is similar enough that you can draw insights over these kinds of datasets, have your students work on a task that is fairly similar to the task at hand, and yet not rely on those, let’s call them scarce, data resources that we typically have.

Two, you just have to beg for data. John knows that; I don’t know if he hinted at that before or not, but we just need to work and work and work until we get some valuable data. We don’t know the whole story. So squeezing the whole story, squeezing the other ingredients or the other factors in the whole story that we’re interested in, out of those domain experts, having conversations with domain experts such as John, such as Aron, such as other people in the domain, and understanding more of the bigger picture. Because when people in the eDiscovery domain, the legal tech domain, say entity extraction, say eDiscovery, they have a very specific definition and a very specific story that they have in mind, very specific data that they can envision even if it doesn’t exist, which we don’t know yet. So I think a big educational challenge here, or a big experience that the students have to go through, is working through those challenges side by side with those domain experts, having access to some data, even if it’s not perfect, and getting familiarized with what is going on, what is the bigger picture, what are the tasks, how can we solve them, where can we get those annotations from, if we can get those annotations from somewhere, and so on.

So bottom line, it’s been a great journey navigating this. I still think there is a big need for educational datasets and benchmarks to be created. It’s a lot of work, and the companies don’t have a lot of time to create them, so one of the things I’ve been doing with my students is trying to figure out how to get this in the works. But bottom line, the data is, you can call it residue, and sometimes that is all you have. The fact that it is residue brings challenges and makes my life as a researcher in this domain very interesting. So yeah, I think that sums up my points.

John Brewer

So if I can just ask you for 30 seconds for some of our audience members, when you say annotations, that has a different meaning to you than it does to many of our friends in the legal community. You want to just clarify what that means in this case?

Roee Shraga

Yes, of course, of course. And that’s exactly the point, right? Going into this domain, you need to learn how people talk in it; there is a whole new language you have to learn. When I say annotation, it mainly depends on the task at hand. Let’s take topic detection as an example. Given a document, I want to know which topics are discussed in that document. A form of annotation can be multiclass; in other words, given a document, what types of topics are discussed in it? And it can also be binary: is this document talking about Aron? Is this document talking about John? So when I talk about annotation, first, it’s task-dependent, and two, it can come in many flavors that differ in definition, but the pure idea is that a human expert tells me what I’m looking for in a given document in this domain.
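Roee’s distinction between binary and multiclass annotations can be sketched in a few lines of Python. This is a minimal illustration, not anything from the panelists’ actual projects; the document text, field names, and labels below are all hypothetical.

```python
# A hypothetical document and two task-dependent flavors of annotation.
doc = "Email from the CFO discussing the Q3 forecast and an upcoming deposition."

# Binary annotation: one human-supplied yes/no answer to one question per document.
binary_annotation = {"mentions_deposition": True}

# Multiclass (here multi-label) annotation: which of several topics appear.
multilabel_annotation = {"topics": ["finance", "litigation"]}

def is_relevant(ann):
    """A binary label directly answers a single question about the document."""
    return ann["mentions_deposition"]

def topic_count(ann):
    """A multi-label annotation can assign several topics to the same document."""
    return len(ann["topics"])

print(is_relevant(binary_annotation))      # True
print(topic_count(multilabel_annotation))  # 2
```

Either way, the annotation is the expert-provided ground truth a model is trained or evaluated against; only its shape changes with the task.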

John Brewer

Okay, great. No, I think that’s-

Roee Shraga

Hopefully that made some sense. Yes.

John Brewer

It did to me. And if anybody feels we need to cover that at greater depth, by all means drop a question in the Q&A slot there. One final thing I wanted to touch on while we’re on the topic of data sourcing, because this is a conversation that has come up a number of times, and I think it speaks certainly to AI methodology and AI-based products, but also to the legal industry’s sensitivities around anonymization and redaction. In other areas where we collect personal data, we get redacted or depersonalized information for training all sorts of systems; this goes all the way back to the first days of neural networks and how we trained systems up. The interesting case that keeps coming up, especially as big data becomes the standard in our society and each of us has so much information, even just what’s held by our companies or what can be held in a single production, is that anonymization has become so practically difficult, and de-anonymizing data has become so easy, that individuals, organizations, and sources of data are difficult to meaningfully redact or anonymize.

And as such, we’re seeing a lot of people, especially organizations with mature data programs, be very, very resistant to contributing to any of these programs. Have you had any experience with that? I’ll start with Roee now, because Aron is smiling in a way that makes me nervous.

Aron Ahmadia

I was just chuckling because there was a recently published deposition where somebody’s name was redacted, but they said something like, “I was the head of applied science at a very large company,” and I was like, “We know who you are.”

John Brewer

Well, that’s exactly the point.

Aron Ahmadia

I was like, I don’t even know why the name is blacked out. So you’re talking about a much more subtle problem than that, but you sparked a memory.

John Brewer

I mean, to a certain extent we’re talking about a more subtle problem, but in a very real sense, we are talking about any time we’re dealing with complex data threads of individuals, whether it’s email data, depositions, or location tracking data, which doesn’t often come up on the civil side of litigation but certainly comes up on the criminal side quite often. And those are critical pieces of information for us to have. But the fact of the matter is that it is very, very easy to take that data and reconstruct redacted or anonymized sections. I have certainly gotten pushback when we’ve put in data requests, with data owners saying, “We don’t think this data can be truly anonymized, and for that reason we’re not moving forward with it.” Again, I’ll start with Roee here, because this is an almost abstract question: how can you, or is it even possible to, truly redact and anonymize data, or is that just a challenge we’re going to need to deal with some other way going forward?

Roee Shraga

That’s a great point, and I’ll build on the anecdote Aron just shared: a lot of times you can go through the whole process and still be able to identify the person. There are protocols in place to try to make this practically impossible, but at the same time, in peer review, for example, if someone writes a paper, sometimes you can figure out who it is. And similarly with emails: just from the way people phrase their words and sentences, you can recognize them. So first of all, there are protocols out there, but I’ll opt to answer from a different perspective, one that maybe ties a bit more to what we were doing. One thing we can think about is working more with open-source models rather than hosted models.

In other words, paying attention to what happens to your data when you plug it into your machine learning model. A lot of people nowadays opt to use hosted models, ChatGPT, Gemini, whatever, models that typically run on a third party’s machines, and that makes the sensitivity of the data even more of an issue, just because the data goes out to another party as a side effect. So again, there isn’t a magic formula, and the companies that share this data need to feel comfortable with whatever is there, whether or not someone is able to reconstruct it. But as people who work with AI tools, we can also pay attention to the way we use this data within our models. Again, open-source models; and sometimes, I don’t know what happens at Relativity or at HaystackID, but companies that work with Microsoft tools have Copilot with GPT in the background, which may not give the same performance as the external GPT, but it is more confidential. So that’s how I address those kinds of things in my projects. And I think Aron has a lot to add, so I’ll be more than happy to hear it.

Aron Ahmadia

Yeah, I think this is where we start drawing slightly firmer lines between academia and industry. I can’t drag my general counsel or our privacy team or our security team onto this call; the screen would just fill up with all of those people. But some of the things we might say, and that we do say: for example, we do use third-party hosted models at Relativity, and we have for the last three years, and in some sense longer than that. Why do we feel so comfortable and safe doing that? It comes down to many, many guarantees and protections, some of them contractual, that run between us and our customers, and then our provider, in this case Microsoft Azure. Some of these things have names like zero data retention, which means that when data gets sent over to a hosted service in the Azure cloud, it is deleted within a very, very short time period after it’s been sent and the response comes back, along with anything that was an intermediate result.

Even to the point where, for example, when one of the providers started offering caching, we asked them, “How long does that cache exist?” It is just a cache to make the GPU more efficient when you send something in, but we want to know, because to us that matters. That’s the level of guarantee we provide our customers. And we don’t only provide things like zero data retention; we offer things like data sovereignty, so if your data originates in a certain region, you know where it is allowed to go and where it isn’t. That really matters in some countries and jurisdictions, where you’re not allowed to send data across borders, or even to the city down the street, they’re the enemies. So those are two very strong guarantees.

And then those come with a whole set of guarantees around how securely everything is handled, from the origin of that data to its storage. I don’t want to make fun of academics, because that would be very unfair, because I was one. When I was at Columbia University, the IT department called me up because my cool little workstation had been rooted, and a quarter of Columbia’s traffic was now going through my system, because some people had discovered it was an open proxy and could use it to do whatever they wanted. The rules are very different in industry. I would never be able to do something like that at Relativity; I would never just have an open research machine. We’re in two different worlds, and that’s a good thing. But because of how secure and how private our data is, because of all the rules we have with customers, I could never imagine a world where I would go to a customer and say, “Hey, do you think we could use some of your data for an experiment or a dataset?”

We could do it in the sense that if they have something they’re trying to find, we can help them find it and frame it as an experiment. What we cannot do is then turn around, take the dataset we just found all those results with, and make it available to the community, as extremely useful as that would be. This is the game and the balance we’re trying to play, because we do want to encourage trust in our products and our models as well. When data enters discovery, there are a couple of reasons why, but a lot of times somebody has done something wrong or is being accused of doing something wrong, and the emails often belong to innocent parties who are involved. Nobody whose stuff is being read would really appreciate being turned into, well, it’s like when your body gets donated to science and becomes a cadaver for the medical students to operate on.

Most people aren’t signing up for that when they join a company. There’s a balance here. I will say that the challenge of data is important, but the challenge of annotations is ten times greater. A lot of people say, “But Enron,” and all this stuff. I look at the decades of research we have done, and NIST and Ellen Voorhees at the TREC conference really set so much of how we at Relativity think about information retrieval, and agreement between annotators is about 50%, right? I can have two people run through and label a dataset for me.

And on some of the datasets that are commonly used, for example Enron, some of those issue labels are very, very all over the place. So yeah, the data isn’t changing; the Enron emails themselves are not changing, but the annotations themselves are very, very slippery. You could take two attorneys, give them the exact same review protocol, and I would be happy if they were about 80% in agreement. Which is part of the reason why, when you look at the standards for a successful review, it’s not 99% recall or 99% accuracy. If we can get to 80%, then we’re probably in general agreement, and that’s about as good as we can do. A lot of that has to do with the subjectivity of these annotations and labels. It’s very easy to say, “This is personal information.” It’s another thing completely to say, “This is responsive to the complaint,” or “This person used unfair practices according to these six laws.” It can get a little wild.
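The agreement figures Aron cites can be made concrete with a short sketch. The two annotators, their labels, and the documents below are entirely hypothetical; the code simply shows how raw percent agreement and a chance-corrected measure (Cohen’s kappa) would be computed for two reviewers labeling the same ten documents.

```python
from collections import Counter

def percent_agreement(a, b):
    """Fraction of documents where the two annotators gave the same label."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def cohens_kappa(a, b):
    """Agreement corrected for the chance that annotators agree randomly."""
    po = percent_agreement(a, b)                       # observed agreement
    n = len(a)
    ca, cb = Counter(a), Counter(b)
    pe = sum((ca[l] / n) * (cb[l] / n)                 # expected agreement
             for l in set(a) | set(b))
    return (po - pe) / (1 - pe)

# Hypothetical responsiveness calls ("R" = responsive, "N" = not) on ten documents
ann1 = ["R", "R", "N", "N", "R", "N", "R", "R", "N", "R"]
ann2 = ["R", "N", "N", "N", "R", "N", "R", "R", "R", "R"]

print(percent_agreement(ann1, ann2))        # 0.8
print(round(cohens_kappa(ann1, ann2), 3))   # 0.583
```

Note how 80% raw agreement drops to a kappa of roughly 0.58 once chance agreement is accounted for, which is one reason subjective labels like responsiveness are so slippery as training data.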

John Brewer

That’s fantastic. No, I think those are some excellent points you bring up there, Aron. I do want to move on, with about 10 minutes left on the clock, because I know we want to touch on the practical pieces each of you has been working on over the past couple of years at this point. So Aron, I know you were kind enough to send over a couple of slides on Relativity’s academic program. Do you want to speak to that a little? I’ll give you your first slide here and you can explain how Relativity is engaging with academia.

Aron Ahmadia

Yeah, it comes down to, I think, more of a philosophy than a specific series of actions. John, you’ve been to Fest, and Roee, you’ve been to Fest as well. When you come to Fest in Chicago or London or Australia, you’ll find that we’re building more than software; we’re building a community. And one of the things we’re most proud of is that we’re also building careers. People come and start using Relativity, and we start in the various places where people study for a legal education, and not just attorneys; if you’re studying to be a paralegal or somebody else who participates in the justice system, we think it’s really essential. It used to be important; now it’s essential that you understand the technologies involved in discovery, which is one of the most important pieces of the modern justice system. We started in the classroom: we partner with law schools, paralegal programs, and community colleges, and we think of this as investing in the next generation of legal minds and, hopefully, the next generation of our user community. It doesn’t stop there. We believe your education does not stop with your diploma; we’re all lifelong learners, so we invest in our users. You’ll see things like all the certifications, probably our gold-standard one being the Relativity Certified Administrator, and then we have a lot of other specialty certifications as well.

I think that we have one more slide.

John Brewer

Yeah, there you go.

Aron Ahmadia

And I just want to say, we know that some people have very busy careers, and first of all, I appreciate everybody making the time to join us on this webinar. Sometimes people just want to take five minutes to get a little bit more, so we offer these really small, bite-sized learning labs. They just give you a feel for the experience of being in Relativity looking at a matter. But we also have the more structured AIR specialization program; each course is about an hour and a half, so six hours of total content. That’s the extent to which my PhD in applied math still allows me to do arithmetic on the fly. These allow our customers to use these tools more comfortably and reliably, with the confidence that they understand not just how to use them, but how they’re built and how they work.

And I will say we offer workshops as well, and these are some of our most popular sessions; we offer two-hour sessions on how to use AIR for Review or case strategy. I constantly encourage people, if they have the opportunity to take one of these and they haven’t really been in the weeds with one of these tools, or even if they have, that it is really the best way to experience and understand a tool that looks nothing like what a lot of them are working with in a tool like ChatGPT. Tools like ChatGPT, and name any other such tool, are designed to interact with you, help you answer a question, and give you feedback on something you’ve written. A tool like AIR for Review is designed to help you go review a million documents.

So it is a very different game when you can design and interact with a system that is going to do hundreds of thousands of hours of work on your behalf. We operate at a scale you really don’t see anywhere else in the industry right now, so it’s about getting a taste of that, and of how you build, operate, and use a system in a way that earns your trust. One thing I always find is that people get a little confused by this. I tell them: you’re building the model, right? You enter the prompt, then you run the prompt. John, Roee, I know you guys get it, but this is for the whole audience. You enter your prompt, and that prompt is the thing that’s going to classify documents, and you want to measure that thing. You want to validate it, you want to understand how it works, you want to see how it works on your documents.

You want to do some statistics on it. So Roee’s out there looking for data; you get to be the data scientist every time you sit down in front of a tool like AIR for Review. And that’s its own kind of special joy. And of course there are industry partners like HaystackID that can help you do this and get more comfortable and confident. That’s part of the fun too. So I’ll pause there. Education is an extremely important component, not just for Relativity, but for me personally.

John Brewer

No, I think that’s an excellent summary. Thank you very much, Aron. And Roee, I know we’re getting a little thin on time here, but I wanted to give you an opportunity to go over the slide you gave us that more or less summarizes a lot of the work you’ve been doing over the past year and a half. Just talk to that a little bit.

Roee Shraga

Sure, of course. First of all, since we don’t have a lot of time, I’ll offer myself: my email is available, and you can reach out to HaystackID to get access to me. I’m more than happy to share more offline. Just as a one- or two-sentence summary: what we are doing in our projects aims to explore the things that these very busy industry people don’t have the time to do. So if Aron and John just use the best LLM out there, because that is what’s typically thought of as best for the customers, we go back and try out the ten existing LLMs out there, try to gather data, and try to explore what makes an LLM good at this. Here are two examples of datasets: one is based on the Jeb Bush dataset, and the second, as I mentioned before, on the IT tickets.

So here you see a quick summary of the accuracy from playing around with different prompts. We ran a few experiments looking at document length: will it be easier for an LLM to handle shorter documents? Of course it is. Or longer documents? What happens if you have a lot of unique words? What happens if you have different types of documents: emails, transcripts, things like that? What happens if you use the model on just one document versus batch-processing a set of documents? Those are the types of questions we ask ourselves, with the idea that people in industry are trying to ship those products, while we have the time to sit back and figure out what happens behind the scenes, what makes this LLM beneficial for this data, and what else. We are of course also looking into ways to boost performance; one of the things we played around with was an HGNN, which showed nice improvements over the results. So again, respecting our time here at this webcast, I’ll turn it back to John, and I’m more than happy to talk more about the findings in our current work.
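The kind of prompt-variation experiment Roee describes can be sketched as a small evaluation harness. Everything here is hypothetical: the `classify` function is a keyword-rule stand-in for a real LLM call, and the documents, labels, and prompt templates are invented for illustration; only the bookkeeping pattern (run each prompt variant over a labeled set, compare accuracies) reflects the experiments discussed.

```python
def classify(prompt_template, document):
    """Stub standing in for an LLM call; a keyword rule plays the model's role."""
    text = prompt_template.format(doc=document).lower()
    return "IT" if "ticket" in text or "server" in text else "other"

def accuracy(prompt_template, dataset):
    """Fraction of documents the (stubbed) model labels correctly."""
    hits = sum(classify(prompt_template, doc) == gold for doc, gold in dataset)
    return hits / len(dataset)

# Tiny labeled set in the spirit of the pre-annotated IT-tickets corpus mentioned earlier
dataset = [
    ("Ticket: the mail server is down again", "IT"),
    ("Minutes from the quarterly budget meeting", "other"),
    ("Server reboot scheduled for Friday", "IT"),
]

# Two hypothetical prompt variants to compare on the same labeled data
prompts = {
    "terse": "Label the topic: {doc}",
    "verbose": "You are a topic classifier. Read carefully and label: {doc}",
}

for name, template in prompts.items():
    print(name, accuracy(template, dataset))
```

With a real model behind `classify`, the same loop lets you isolate one variable at a time, prompt wording, document length, or batching, exactly the style of controlled comparison the slide summarizes.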

John Brewer

Thank you very, very much, Roee. I’ll say that, as with most places where academia and industry intersect, it is very helpful to have somebody who is not only doing this research but publishing it in a place that’s accessible. A lot of the questions you’re answering here are questions that all of our teams on the commercial side have had to get answered one place or another, and we either answered them in a narrower context or under commercial restrictions particular to the environment we were working in. It is really nice to have somebody going back and, one, checking our work, and two, saying, “Okay, here are places you might not have explored, for a variety of practical reasons, but might want to consider.”

Now, we’re just about at time, but we might have time for one question if there is one in the queue. And actually, I think I’m seeing that there are no questions in the queue. So I think we have solidly covered all our bases. We have explained this completely; there are no uncovered areas, and I feel really good about that. I didn’t think we would cover the entire interaction here, but we clearly did. So once again, thank you so much, Roee and Aron, not only for taking the time to sit here with us, but also for all the preparatory time I know both of you spent to get ready for this. And thanks for joining today’s webcast, everybody. The time and attention given to this educational series is truly valued. For those interested in continuing education, don’t miss the upcoming eDiscovery Day webcast on December 4th, hosted by EDRM and featuring John Wilson, the CISO at HaystackID, on hybrid work, hybrid risks, and governing eDiscovery in an AI world. That webcast will provide practical guidance for navigating new data realities, managing collaboration platforms, chat data, and cloud documents, and leveraging AI-enabled tools to address emerging challenges like the ones we were just talking about here. Also visit haystackid.com to learn more, register for the December 4th webcast, and explore the extensive library of on-demand webcasts.

Once again, thank you very much for joining today’s webcast on how academic-centric AI projects are driving legal tech R&D, and have a great day. Bye-bye.

Webcast Support

Yes, that wraps up the webcast, everybody. Thank you for joining. Special thanks to our speakers, Roee, Aron, and John. And remember, this was recorded and will be sent out to you shortly. Thank you. Bye.

Roee Shraga

Bye.


Expert Panelist Bios 

+ Aron Ahmadia, PhD
Senior Director, Applied Science at Relativity

Aron is the Senior Director for Applied Science at Relativity, where he leads the development of AI capabilities for Relativity One, including AI-Assisted Review, Analytics, and Search. He is an internationally recognized expert on machine learning and artificial intelligence. His specializations include the use of AI in Legal Discovery, Model Risk Management, and Responsible AI. Aron holds a Ph.D. in Applied Math from Columbia University and a B.S. in Computer Engineering from the Illinois Institute of Technology.


+ Roee Shraga, PhD
Assistant Professor, Computer Science, WPI

Before joining WPI, Roee Shraga was a Postdoctoral fellow at the Khoury College of Computer Science at Northeastern University in Boston. His research mainly revolves around data discovery and integration, and combines techniques from data management, machine learning, information retrieval, and human-in-the-loop. His research has been published in top-tier conferences such as SIGMOD, VLDB, SIGIR, WWW, and ICDE. He is a recipient of the Council for Higher Education [VATAT] scholarship for outstanding data science postdocs. He is also a recipient of several PhD fellowships, including the Leonard and Diane Sherman Interdisciplinary Fellowship (2017), the Daniel Excellence Scholarship (2019), and the Miriam and Aaron Gutwirth Memorial Fellowship (2020).


+ John Brewer (Moderator)
Chief Artificial Intelligence Officer and Chief Data Scientist, HaystackID

As HaystackID’s Chief Artificial Intelligence Officer and Chief Data Scientist, John focuses on bringing the latest advancements in artificial intelligence and data science technologies to the eDiscovery and incident response markets. Having worked with HaystackID since 2015, Brewer has been a software engineer and information technologist for over two decades, with experience spanning dozens of Fortune 500 companies in technology leadership roles, including eDiscovery, data migration, and information stewardship. Brewer has also been at the forefront of new data technology throughout his career. He spearheaded HaystackID’s adoption of large-scale technology-assisted review, positioning the company as one of the first to embrace this transformative approach. Recognizing the need for efficient processing in the face of the aggressive timelines associated with breach work, Brewer played a pivotal role in developing the suite of AI-based machine learning tools that power HaystackID’s review process, including his recent work on Protect Analytics AI™, an AI-enabled platform that can precisely identify and classify a wide array of internationally sensitive data types and entities.


About HaystackID®  

HaystackID® solves complex data challenges related to legal, compliance, regulatory, and cyber requirements. Core offerings include Global Advisory, Cybersecurity, Core Intelligence AI™, and ReviewRight® Global Managed Review, supported by its unified CoreFlex™ service interface. Recognized globally by industry leaders, including Chambers, Gartner, IDC, and Legaltech News, HaystackID helps corporations and legal practices manage data gravity, where information demands action, and workflow gravity, where critical requirements demand coordinated expertise, delivering innovative solutions with a continual focus on security, privacy, and integrity. Learn more at HaystackID.com.

Assisted by GAI and LLM technologies.   

Source: HaystackID