When the Data Knows More Than You Do: The New Realities of Cross-Border Discovery

Editor’s Note: Cross-border discovery is often discussed in terms of regulations, transfer restrictions, and privacy laws, but the practical challenges are frequently much more nuanced. As organizations face growing pressure to respond to cyber incidents, regulatory inquiries, and litigation across multiple jurisdictions, understanding where data resides, how it is governed, and what obligations attach to it has become increasingly important. This article draws on insights from the EDRM workshop, “Discovery at a Crossroads: Global Perspectives on Emerging Challenges,” which featured practitioners working at the intersection of privacy, cybersecurity, investigations, and eDiscovery. Their discussion highlights how cultural differences, regulatory expectations, operational preparedness, and emerging technologies such as AI are reshaping cross-border discovery strategies. The panelists emphasized the growing importance of local expertise when handling jurisdiction-specific requirements that can materially affect investigations and response efforts. Ultimately, the conversation reinforces a simple reality: organizations that understand their data before a crisis occurs are better positioned to respond when time-sensitive demands arise.

By HaystackID Staff

Cross-border discovery has always been complicated. But the complications used to be legible: blocking statutes, transfer restrictions, language barriers. Today, the gaps are subtler, and for organizations managing global data, the cost of missing them is climbing fast.

During a cyber incident investigation spanning multiple jurisdictions, FTI Consulting’s Inés Rubio watched an AI tool flag a single field in an Excel spreadsheet, labeled with an obscure local acronym, and identify it as a national identifier tied to a Latin American country. Nobody on the team knew that country was even part of the breach. Traditional keyword searches, the kind built around “social security number” and other familiar US data types, would have passed right over it.

“I actually Googled it afterward during my validation of this output, and the AI was absolutely correct,” she recounted.

Rubio shared that moment during the recent EDRM workshop “Discovery at a Crossroads: Global Perspectives on Emerging Challenges” as a vivid illustration of how AI, deployed thoughtfully in the early, lower-risk phases of data categorization, can surface what human reviewers simply don’t know to look for. That distinction matters enormously when breach notification deadlines are already counting down. Without that finding, the affected data subjects in that country would have gone unnotified, and a regulator would have been left in the dark.

The Philosophy Clash Nobody Warned You About

Before you can appreciate why cross-border discovery is so operationally difficult, you have to understand that the US and Europe aren’t just operating under different rules; they’re operating from different premises.

Jeff Shapiro, managing director for Europe at HaystackID® and a licensed US attorney based in London since 2013, learned this firsthand early in his career.

“The US philosophy of discovery fundamentally clashes with European views on privacy,” he said. “In the US, the default operational posture is generally to collect broadly and cull later. And there’s an assumption that if a judge signs an order, the data moves.”

Shapiro’s first real encounter with the French Blocking Statute, during an IP matter in Virginia, upended that assumption. A US subpoena, it turns out, does not override the fundamental right to privacy protected under another country’s law. That realization permanently changed how Shapiro approaches cross-border matters.

“I no longer think about exporting data to review it. Instead, I think about how we deploy technology and review to where the data is. You want a process in place, utilize local data centers, strip out personal data before it ever crosses an ocean or a border,” he said.

The shift in thinking is more than procedural. It requires accepting that the data’s location and the rights attached to it take precedence over litigation convenience.

Martin Nikel, a Brussels-based data practitioner with more than 20 years of cross-border eDiscovery experience, pushed back slightly on the reputation European data laws have acquired as a barrier to discovery.

“My personal view, many may not share this, is that a lot more is made of it than there needs to be,” he said. “When you look at the French Blocking Statute, it has obviously been invoked in some cases. Many of those go unreported. It’s various warrants and requests that you won’t necessarily see all of them in the news, but it’s much less common than people might think.”

What matters, Nikel argues, is proportionality: demonstrating that the data requested is relevant, that its transfer is legally justified, and that it has been minimized appropriately.

“Nobody in Europe is anti the practice of business and anti the practice of seeing a litigation performed fairly or responding to things as needed, but it’s always done with this proportionality in mind,” he said.

Getting those jurisdictional differences wrong can be costly. Mary Mack, EDRM’s CEO and Chief Legal Technologist, recalled early cross-border matters where US attorneys confidently briefed their European counterparts on privilege, only to discover that in-house privilege carries virtually no weight in many European jurisdictions.

“We had some attorneys school the EU attorneys in, ‘Oh, it’s just fine. Don’t worry. We’re going to review for privilege and things like that.’ And privilege is something that is jurisdiction by jurisdiction, and they had it so, so wrong,” Mack said, adding that new personnel had to fly in to rebuild relationships that the misstep had damaged.

72 Hours and Counting

Nothing clarifies a cross-border data problem faster than a 72-hour clock. Across Europe, the GDPR’s standard window for notifying the supervisory authority is 72 hours after becoming aware of a qualifying personal data breach. In some jurisdictions, it’s shorter. That window doesn’t care how complex the data set is, how many countries are involved, or how confident you are in the data you’ve found so far.

Rubio described a phased approach that she applies in these situations.

  • Begin with an initial assessment of what you actually have.
  • Identify which regulatory bodies need notification first.
  • Separate that regulatory notification from the eventual individual notification, because those are two very different processes on very different timelines.

“The notification to the regulator must be transparent. As much as you can provide at that time, you keep them informed as the investigation carries on,” she said, “but ultimately, there is a separation between those two types of notifications.”

Individual notifications take longer because they require greater precision than the regulatory filing does. A data subject notification must tell a specific person exactly which information about them was exposed, which means untangling identity across documents, languages, and naming conventions.

Rubio pointed to naming conventions as one example of the challenge: identity matching becomes significantly more difficult when the same person’s name appears differently across records because of local naming practices.

“In the Philippines, they have a lot of last names as well, and they may not always appear in the same order in different documents and things like that. That creates an extra layer of complexity, which is, again, a cultural one, but it trickles down into how you actually create that final notification,” she explained.

Similar challenges arise in many jurisdictions where naming conventions and record structures do not align neatly with Western document review workflows. Identifying that a document, a form, and a breach record all refer to the same individual, when their name appears differently in each, is not a trivial problem at scale.

Nikel echoed the point with a more common scenario.

“I’ll often see lawyers send me a spreadsheet of keywords that they want to search for, and it will just say, ‘Social security number.’ And of course, that doesn’t really apply in a country such as France or Belgium,” he said.

Those keyword lists rest on an assumption that the US vocabulary for sensitive data applies everywhere. It goes unchallenged precisely because no one thinks to question it.

AI at the Right Altitude

There’s a meaningful distinction between using AI to identify data categories and using it to draft breach notifications. Rubio is emphatic about this difference.

“I usually suggest to clients or anyone I’m speaking with using AI at a higher level in a low-risk environment, and then maybe at a higher, more scrutinized level if you’re going to use it for a higher-risk use,” she said.

In the early phases of a breach investigation, AI can be well suited to data categorization, because the stakes of imprecision remain manageable.

“It doesn’t have to be perfect in terms of categorization. It may be slightly off or over-inclusive, but as long as you caveat that, that is a great starting point to get speed,” Rubio said.

But her endorsement comes with explicit conditions.

“I’ve seen GenAI work really well in particular data sets and not very well at all in other data sets. So that’s a big thing, too. It’s making sure that you’re running your sampling and you’re happy with the output, and a human has actually confirmed, yeah, this is actually working properly as we expect it to,” she said.

The emphasis on validation and human oversight reflects a broader reality emerging across Europe. As discussed during the Dublin Tech Summit, organizations adopting AI are increasingly finding that governance obligations are being driven not only by AI-specific regulations but also by existing privacy frameworks.

As Dr. Barry Scannell observed, “The law most likely to govern day-to-day AI use … is not the EU AI Act but the General Data Protection Regulation.” That perspective reinforces why data governance and defensible workflows remain central to AI adoption in legal, regulatory, and investigative settings.

Shapiro extended the argument to the EU E-Evidence framework, which takes effect in August 2026 and introduces two distinct response windows: 10 days for standard requests and as little as eight hours in urgent cases.

“Whether it’s the 72-hour cyber incident response timeline in Europe or the potential eight-hour or 10-day timeline that’s coming in August 2026 under the EU E-Evidence framework for criminal prosecutions, those are extremely tight timelines,” he said. “How are you going to be able to respond to that request within those timelines without the use of technology to be able to rapidly sift through your data and figure out the data subjects, what happened when, and how do you respond?”

The panelists suggested that increasingly compressed regulatory timelines may strengthen the case for localized AI capabilities as part of broader data readiness and response strategies.

Boots, Preparation, and the Works Council

Cross-border discovery creates operational risk that no amount of remote legal expertise can fully mitigate. Nikel recalled a forensic collection that went sideways not because of a legal barrier, but because no one had adequately prepared for the local environment. A colleague arrived at a subsidiary office, began copying data, and by midday had local police at the door. Employees had told the works council, the local employee representative body, that an unidentified outsider was copying their data, and the council called the police.

“It was all resolved very quickly, but the situation was particularly fraught for the forensic individual who’d gone on site to collect the data, really because nobody had done their preparation in advance of understanding how that situation might be perceived,” Nikel said.

That story isn’t a cautionary tale about European obstructionism. It’s a cautionary tale about preparation, and specifically about the assumption that discovery workflows from one jurisdiction transfer cleanly to another. In jurisdictions such as Germany and France, works councils and similar employee representative bodies can affect how and when employee data is collected. Employee email may be treated as personal under local works agreements. And, as Mack’s experience showed, in-house privilege that is well established in the US cannot be assumed to carry the same weight in Europe.

The solution isn’t to treat local law as an obstacle to working around. It’s to build local knowledge before you need it.

Reframing the Calculus

There’s a temptation to treat cross-border data compliance as a cost of doing global business, a friction layer that slows things down and forces additional spend without producing any direct benefit. Shapiro pushed back on that framing directly.

Shapiro argues that the infrastructure required to respond to an eight-hour E-Evidence request—knowing where your data lives, what it contains, and whether it can be transferred—is the same infrastructure that makes data useful as a business asset. An organization that can locate and classify its data fast enough to satisfy a regulator can also mine that data for competitive intelligence, build market analysis, and move faster than competitors who are still figuring out what they have.

The distinction matters because it changes how organizations prioritize the work. Compliance obligations that feel like overhead tend to be underfunded and neglected until a crisis arrives. The same investment, reframed as a competitive capability, tends to attract the budget and attention it needs.

Cross-border discovery is challenging. But as the workshop panelists made clear, the difficulty is almost always concentrated in preparation rather than execution: in the work of knowing your data’s geography before a regulator or opposing counsel forces the question. Organizations that put that work in early, with the right local expertise and the right technology deployed at the right risk tolerance, are the ones that don’t get surprised by what’s already in their data.


HaystackID® solves complex data challenges related to legal, compliance, regulatory, and cyber requirements. Core offerings include Global Advisory, Cybersecurity, Core Intelligence AI™, and ReviewRight® Global Managed Review, supported by its unified CoreFlex™ service interface and eDiscovery AI™ technology. Recognized globally by industry leaders, including Chambers, Gartner, IDC, and Legaltech News, HaystackID helps corporations and legal practices manage data gravity, where information demands action, and workflow gravity, where critical requirements demand coordinated expertise, delivering innovative solutions with a continual focus on security, privacy, and integrity. Learn more at HaystackID.com.

Assisted by GAI and LLM technologies.

SOURCE: HaystackID