Expert Details How Businesses Are Considering AI for Document Review

Editor’s Note: This article delves into an emerging technology trend that could significantly impact eDiscovery practitioners over the next few years. As artificial intelligence advances, new “generative AI” systems show real promise in automating document review processes that still require extensive human effort today. Understanding capabilities and limitations around applying AI for assessing document relevance, privilege, and responsiveness is crucial for eDiscovery leaders shaping future service strategies and offerings. This overview provides an insightful reference point for leaders seeking to track this fast-moving space while weighing considerations around utilization now versus later. Though some hype precedes practitioner-ready reality in some areas, concrete use cases already exist, and competitive forces may compel faster adoption. We recommend readers analyze key takeaways against organizational needs and priorities to determine if and where experimentation now could yield an advantage.


By HaystackID Staff*

Rapid advances in artificial intelligence during 2023 have opened new opportunities for businesses to use AI systems to efficiently review legal documents, contracts, emails, and other text-based files, according to AI expert John Brewer, chief artificial intelligence officer at eDiscovery provider HaystackID.

In a wide-ranging message, Brewer said a new class of AI called “generative AI” had a breakout year in 2023. Unlike previous AI technologies, generative systems can understand plain English questions entered via text and then respond with thoughtful, coherent answers in everyday written English.

“For the first time, average users could ask a question in plain English to a computer by typing it in, and the computer would respond in plain English,” Brewer said. “The experience for the user truly felt like Artificial Intelligence in a way I don’t think any previous technology has.”

This technology presents new opportunities for businesses to automate time-intensive tasks like reviewing documents for legal matters, internal investigations, and contract management. This automation includes determining whether documents are relevant to a particular case, flagging those that need legal privilege review, and applying tags or other metadata to aid future searches.

Brewer laid out three main approaches companies are taking to apply generative AI to document review and analysis:


Retrieval Augmented Generation

This method involves carefully instructing the AI system on evaluation criteria and feeding it the documents requiring analysis. For example, to support a document relevance review for a legal matter, the business would start by explaining the specifics of the case, the key parties involved, the dates, and other details to provide context.

Next, detailed instructions would be given on how to determine relevance, such as particular topics, inclusion of names, timeframes, or geographies. Finally, the full text of each document passes into the system for analysis against the predefined criteria.

“Essentially, when a product [is] using an LLM (large language model), they interact with it the same way you might as a starting ChatGPT user; they give it a bunch of text, and then they get a response back,” Brewer said, referring to AI chatbot ChatGPT, which processes text inputs and formulates natural language responses.

Using this approach, the AI reviews each document individually and returns straightforward decisions, such as marking a document “responsive” or “non-responsive” for legal proceedings, along with a short explanatory answer for its judgment.
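The per-document workflow described above can be sketched in a few lines of Python. This is a minimal illustration, not HaystackID's implementation: the prompt wording, the `ReviewDecision` type, and the `fake_llm` stand-in are all assumptions made for this example, and a real deployment would replace `fake_llm` with a call to whichever LLM service is in use.

```python
from dataclasses import dataclass
from typing import Callable, List

LABELS = {"responsive", "non-responsive"}


@dataclass
class ReviewDecision:
    label: str        # "responsive", "non-responsive", or "needs-human-review"
    explanation: str  # short justification returned by the model


def build_prompt(case_context: str, criteria: List[str], document_text: str) -> str:
    """Combine case background, relevance criteria, and the full document."""
    criteria_block = "\n".join(f"- {c}" for c in criteria)
    return (
        f"Case background:\n{case_context}\n\n"
        f"Relevance criteria:\n{criteria_block}\n\n"
        f"Document:\n{document_text}\n\n"
        "Answer on the first line with exactly 'responsive' or 'non-responsive', "
        "then give a one-sentence explanation on the second line."
    )


def parse_response(raw: str) -> ReviewDecision:
    """Turn the model's text reply into a structured decision."""
    lines = [ln.strip() for ln in raw.splitlines() if ln.strip()]
    label = lines[0].lower() if lines else ""
    if label not in LABELS:
        # Unexpected output is routed to a person rather than guessed at.
        return ReviewDecision("needs-human-review", raw.strip())
    return ReviewDecision(label, " ".join(lines[1:]))


def review_document(call_llm: Callable[[str], str], case_context: str,
                    criteria: List[str], document_text: str) -> ReviewDecision:
    """Review one document: build the prompt, call the model, parse the reply."""
    return parse_response(call_llm(build_prompt(case_context, criteria, document_text)))


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real LLM call so the sketch runs offline."""
    document = prompt.split("Document:\n", 1)[1].lower()
    if "supply contract" in document:
        return "responsive\nThe document discusses the supply contract."
    return "non-responsive\nThe document does not mention the supply contract."


CASE = "Dispute between Acme Corp and Widget Co over a 2021 supply contract."
CRITERIA = ["Discusses the supply contract", "Dated in 2021"]

decision = review_document(fake_llm, CASE, CRITERIA,
                           "Attached is the signed supply contract.")
print(decision.label, "-", decision.explanation)
```

Because every document is sent with the full case background and criteria, the prompt grows with both the instructions and the document length, which is why context window size and per-document cost matter for this approach.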

Brewer said this approach, retrieval augmented generation, works well thanks to new generative AI systems with sizable “context windows” that let them hold the equivalent of around 200 pages of information at once. This allows the AI to keep intricate evaluation instructions, document context, and full document texts in active memory while reviewing each file.

But costs scale quickly when applying the technology across hundreds of thousands or millions of documents. According to Brewer’s estimates, using an advanced AI system to review 517,000 emails could incur $40,000 or more in processing costs. Limiting use to shorter documents that fit within the context window of less sophisticated AI services can potentially reduce this to $4,000.
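To put those figures in per-document terms, the arithmetic can be worked through directly. The sketch below uses only the numbers quoted above; the function names and the assumption that cost scales linearly with document count are illustrative simplifications, since real pricing depends on document length and the service used.

```python
def per_document_cost(total_cost_usd: float, num_documents: int) -> float:
    """Average cost of reviewing a single document."""
    return total_cost_usd / num_documents


def project_cost(per_doc_usd: float, num_documents: int) -> float:
    """Project total cost, assuming cost scales linearly with volume."""
    return per_doc_usd * num_documents


EMAILS = 517_000  # collection size from Brewer's estimate

advanced = per_document_cost(40_000, EMAILS)  # advanced AI system
simpler = per_document_cost(4_000, EMAILS)    # less sophisticated service

print(f"Advanced system: about {advanced * 100:.1f} cents per email")
print(f"Simpler service: about {simpler * 100:.2f} cents per email")
print(f"One million emails at the advanced rate: "
      f"about ${project_cost(advanced, 1_000_000):,.0f}")
```

At just under 8 cents per email for the advanced system, a single document looks cheap; the expense comes from multiplying that figure across the entire collection.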


Fine-Tuning

An alternative approach is to customize or “fine-tune” an AI system by training it on large volumes of sample documents so the system learns the specifics of a topic in depth. Benefits include being able to ask questions that span many documents rather than submitting each file individually.

“With Fine Tuning, we don’t necessarily have to pass in the document we’re asking about, and we don’t even need to ask about a specific document. We can ask about a custodian and get answers back that span across a number of documents. That sounds amazing!” Brewer said.

But the downside is that the computing resources required drive up costs. While using a retrieval augmented generation approach on 367 million words of text might cost between $500 and $3,000, fine-tuning a custom AI system could require hundreds of times more processing capability.

There’s also a risk that an AI system that is hyper-trained on documents like legal complaints may have trouble separating factual information from disputed claims made in those documents. More broadly, even advanced AI systems have difficulty tracing their answers back to original sources, which can lead to fabricated answers when they lack the information to respond accurately.


Hybrid Approaches

Hybrid approaches aim to balance the strengths and limitations of both methods by using generative AI to analyze samples of documents. The results of that analysis are then used as training data to build machine learning algorithms customized to the document collection. Those more basic algorithms can then rapidly process millions of documents in bulk while costing far less per file.

“The idea here is that if the Generative AI is expensive to run, let’s have it train the older, faster models and get the best of both worlds,” Brewer explained.

In an ideal scenario, generative AI kickstarts effective algorithms that are then let loose to ingest huge volumes of files autonomously. But because information and subtleties get lost when understanding is translated between systems, the resulting algorithms will never fully match the depth of the AI systems they’re modeled on.
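A toy version of this hybrid workflow shows both the appeal and the loss of fidelity. In the sketch below, `teacher_label` is a hypothetical stand-in for the expensive generative AI call (here just a keyword rule so the example runs offline), and the "older, faster model" is a small naive Bayes classifier written from scratch. The sample documents and all names are assumptions chosen for illustration, not a description of any vendor's pipeline.

```python
import math
from collections import Counter


def tokenize(text):
    return text.lower().split()


def train_naive_bayes(labeled_docs):
    """Train a word-count naive Bayes model from (text, label) pairs."""
    word_counts, doc_counts, vocab = {}, Counter(), set()
    for text, label in labeled_docs:
        doc_counts[label] += 1
        counts = word_counts.setdefault(label, Counter())
        for tok in tokenize(text):
            counts[tok] += 1
            vocab.add(tok)
    return {"word_counts": word_counts, "doc_counts": doc_counts, "vocab": vocab}


def predict(model, text):
    """Return the most probable label, using Laplace (add-one) smoothing."""
    total_docs = sum(model["doc_counts"].values())
    vocab_size = len(model["vocab"])
    best_label, best_score = None, -math.inf
    for label, n_docs in model["doc_counts"].items():
        counts = model["word_counts"][label]
        total_words = sum(counts.values())
        score = math.log(n_docs / total_docs)
        for tok in tokenize(text):
            score += math.log((counts[tok] + 1) / (total_words + vocab_size))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def teacher_label(text):
    """Hypothetical stand-in for the generative AI reviewer."""
    return "responsive" if "contract" in text.lower() else "non-responsive"


# Step 1: the expensive "teacher" labels a small sample.
sample = [
    "the supply contract renewal is attached",
    "please sign the contract amendment today",
    "contract pricing terms need review",
    "lunch on friday at noon",
    "the office party is next week",
    "parking garage closed on friday",
]
model = train_naive_bayes([(doc, teacher_label(doc)) for doc in sample])

# Step 2: the cheap "student" processes the bulk collection.
bulk = [
    "revised contract terms attached",
    "friday lunch menu",
    "see you at the party",
    "agreement renewal pricing",
]
student = [predict(model, doc) for doc in bulk]

# Step 3: spot-check how often the student matches the teacher.
teacher = [teacher_label(doc) for doc in bulk]
agreement = sum(s == t for s, t in zip(student, teacher)) / len(bulk)
print(f"Student matches teacher on {agreement:.0%} of documents")
```

The student disagrees with the teacher on the last document because it learned word associations from the sample rather than the teacher's actual rule. That gap is a small-scale picture of the subtleties that get lost between systems, and it is why a spot-check like step 3 belongs in any such workflow.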

For document review tasks that require interpretation and human-level discernment, such as tagging or determining privilege assertions, this loss of fidelity could noticeably reduce accuracy. Nonetheless, Brewer sees promise in using generative AI to help bootstrap wider adoption of document-processing AI technologies.


Thinking ahead to 2024, Brewer expects generative AI capabilities to rapidly improve alongside dropping costs. He anticipates many software tools focused on eDiscovery and document analytics will integrate generative AI to enhance existing capabilities. Some newer entrants may even build fresh document processing solutions fully around generative AI technologies.

However, accurately applying AI document review requires carefully considering tradeoffs around factors ranging from overall spending to legal defensibility to end-result quality. For example, what level of accuracy is truly needed to justify genuine productivity gains and cost savings? How will resultant data get used in sensitive contexts like legal proceedings or regulatory investigations?

Issues around properly tuning AI systems and retaining the explainability of AI decisions also continue to pose barriers to adoption for many organizations. As Brewer noted, tracing back sources and explanations for generative AI responses remains tricky compared to traditional search and analytics tools. Blindly trusting outputs absent context risks propagating errors or introducing bias.

For businesses intrigued by productivity promises but cautious about risks, Brewer suggests: “Choosing not to use it because it doesn’t do what you want or introduces intolerable risks may well be the best choice. Ignoring it entirely because of a vague sense it has no use is a mistake, though.”

Establishing thoughtful governance while closely tracking evolving capabilities can help organizations make informed choices about whether and how to pilot the technology. Moving too fast risks learning hard lessons through organizational disruption. But delaying too long cedes competitive advantages to fast followers willing to accept some imperfection in exchange for the other benefits AI augmentation enables.

The key is framing decisions around genuine organizational needs, not reacting to vendor hype cycles. As Brewer summarized: “You owe it to your client and your organization to review the capabilities out there and make an active choice about how you’re going to handle the Generative AI Generation.”

*Published with permission. Assisted by GAI and LLM technologies.

Source: ComplexDiscovery

About HaystackID

HaystackID is a specialized eDiscovery services firm that supports law firms and corporate legal departments and has increased its offerings and expanded with five acquisitions since 2018. Its core offerings now include Global Advisory, Discovery Intelligence, HaystackID Core®, and artificial intelligence-enhanced Global Managed Review services powered by ReviewRight®. The company has achieved ISO 27001 compliance and completed a SOC 2 Type 2 audit for all five trust services criteria for the third year in a row. Repeatedly recognized as a trusted service provider by prestigious publishers such as Chambers, Gartner, IDC, and The National Law Journal, HaystackID implements innovative cyber discovery services, enterprise solutions, and legal discovery offerings to leading companies across North America and Europe, all while providing best-in-class customer service and prioritizing security, privacy, and integrity. For more information about its suite of services, including programs and solutions for unique legal enterprise needs, please visit