Tangled Tongues and Translated Text: How do I make sure my document review platform and my reviewers speak the same language?

“You’re about to receive a production from the opposing side, and they have offices in non-English-speaking locations.  Or maybe you’re about to review the internal correspondence of your own company or your client’s company.  What can be done about it?  Who do you turn to?

The first thing you may ask yourself is whether you know any vendors that can handle international collections and international data.  If your vendor has kCura’s Relativity, the answer is yes, for any language with the possible exception of Sanskrit.  No, I take that back: Sanskrit is covered, and so is, um, Hittite.  Yes, Hittite.  So if part of your collection is on cuneiform clay tablets, we will scan it and OCR it (probably with the flatbed, not the duplex feeder).  Finding the Hittite-speaking attorney is up to you (the same goes for Ancient Egyptian, which is, shockingly, also supported).  As for the vendors who claim that they are the only ones who can handle non-Western languages, they have discounted the majority of their competitors.

The reason for the comprehensive coverage is that Relativity supports Unicode data and Unicode characters through much of the user interface.  Unicode doesn’t encode languages so much as scripts, the writing systems whose characters are used to write most of the myriad languages on Earth.  Amazingly, Unicode even includes scripts used for long-dead languages, such as cuneiform.
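
To make the point about scripts concrete, here is a minimal sketch in Python using only the standard library's unicodedata module.  It has nothing to do with Relativity itself; the sample characters and the describe helper are my own choices for illustration.  It simply shows that one character from each of several scripts, living and dead, has its own code point and official name in Unicode.

```python
import unicodedata

# One sample character per script (illustrative picks, not from any real collection).
samples = {
    "Sanskrit (Devanagari)": "\u0905",
    "Hittite (cuneiform)": "\U00012000",
    "Ancient Egyptian (hieroglyphs)": "\U00013000",
    "Korean (Hangul)": "\uD55C",
}

def describe(ch: str) -> str:
    """Return the code point and official Unicode name of a single character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"

for label, ch in samples.items():
    print(f"{label}: {describe(ch)}")
```

Note that the character names identify the script (DEVANAGARI, CUNEIFORM, HANGUL), not the language, which is exactly the distinction drawn above.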

You may also want to ask just how deeply we can use these non-Western character sets in our review.  Can I make tags or Choices in these languages?  Can I comment, or fill redactions, in these languages?  Can I capture foreign-language information in fields with foreign-language names?  Can I name the issue categories and production status indicators in foreign languages?  What about searching?  Can I search in a foreign language?  (I’d better be able to!)  What about predictive coding, analytics, near-duplicate detection, and email threading?  Will these work with foreign languages?

It depends upon the technology and the review platform, but kCura’s Relativity certainly can do all of this, and it’s important to have a vendor who makes sure that it is set up properly to handle these languages.  There are settings for the keyword index, the fields, the tags, the views, the coding layouts and other aspects of the review platform that need to be “Unicode compliant.”  It also helps if your vendor has some foreign language background, so that they have some intuition as to what they are doing beyond the instruction manual.
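
Why does the setup matter so much?  The sketch below is a generic Python illustration, not Relativity's actual configuration; the survives_round_trip helper and the sample tag name are hypothetical.  It shows what happens when a piece of the pipeline is limited to a Western character encoding: the foreign-language tag name cannot be stored at all, while the English one goes through without complaint, which is why the problem is easy to miss in testing.

```python
def survives_round_trip(text: str, encoding: str) -> bool:
    """True if text can be stored in the given encoding and read back unchanged."""
    try:
        return text.encode(encoding).decode(encoding) == text
    except UnicodeEncodeError:
        # The encoding has no way to represent at least one character.
        return False

tag_names = ["Confidential", "機密"]  # the second is Japanese for "confidential"

for name in tag_names:
    for encoding in ("latin-1", "utf-8"):
        status = "ok" if survives_round_trip(name, encoding) else "LOST"
        print(f"{name!r} stored as {encoding}: {status}")
```

A single non-Unicode field or index in the chain is enough to lose the text, so each of the settings listed above needs checking, not just one.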

What about predictive coding and analytics?  Relativity Assisted Review and the Content Analyst engine upon which it is based are totally “language agnostic”: they do not care what the documents say on their own, just what they say in relation to each other.  This has some interesting advantages, such as the fact that documents in similar languages will naturally be grouped together as similar documents (more similar to each other than to the rest of the database).  Relativity can also name the primary and secondary languages found in every document that you index, allowing you to divide the database up and distribute just the foreign-language documents to the attorneys with the requisite reading comprehension.  Document translation services work very well with Relativity, because multiple translations can be loaded per document.
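
Once every document carries a primary-language value, dividing up the work is simple bookkeeping.  Here is a minimal Python sketch of that step, assuming the language values have already been exported from the platform.  The document IDs, the reviewer roster, and both function names are hypothetical, and this is not how Relativity itself builds batches.

```python
from collections import defaultdict

# Hypothetical export: (document ID, primary language) pairs.
documents = [
    ("DOC-001", "English"),
    ("DOC-002", "Japanese"),
    ("DOC-003", "Korean"),
    ("DOC-004", "Japanese"),
    ("DOC-005", "Hittite"),
]

# Hypothetical roster of reviewers by the language they can read.
reviewers = {
    "English": ["Reviewer C", "Reviewer D"],
    "Japanese": ["Reviewer A"],
    "Korean": ["Reviewer B"],
}

def batch_by_language(docs):
    """Group document IDs by primary language, keeping the original order."""
    batches = defaultdict(list)
    for doc_id, language in docs:
        batches[language].append(doc_id)
    return dict(batches)

def unstaffed_languages(batches, roster):
    """List the languages that have documents but no reviewer who can read them."""
    return sorted(lang for lang in batches if not roster.get(lang))

batches = batch_by_language(documents)
print(batches)
print("Still need reviewers for:", unstaffed_languages(batches, reviewers))
```

The second function is the useful one in practice: it tells you, before the review starts, which languages you still need to staff.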

Be cautious, however, about the quality of translation and legal review.  Machine translation, or fully automated computer translation, is still not particularly accurate, especially where the so-called “CJK” languages (Chinese, Japanese and Korean) are involved.  To make matters worse, attorneys have to learn some very specific and sometimes bizarre words and meanings: for instance, “tortfeasor” (the name of our law school band; it was a short moment of fame) and “fee tail” (which has absolutely nothing to do with fees or tails as we know them).  This means that whatever inaccuracies there are in the translation will be magnified by the attorney’s comprehension issues.  Amplify those review choices with predictive coding, and you are likely to have a very poor review.  Also, proper human-edited or full human translation is expensive enough that it is quite often less expensive to find attorneys with language skills than it would be to translate the documents accurately and then submit them to an attorney.

Probably the best plan is to use a review platform that fully supports Unicode and, after running language identification to determine their assignments, find some reviewers who fully support their respective languages.”