Unknown Unknowns


In 2002, Donald Rumsfeld, then US Secretary of Defense, spoke at a Department of Defense news briefing and made the following statement:

“[T]here are known knowns; there are things we know we know. We also know there are known unknowns; that is to say we know there are some things we do not know. But there are also unknown unknowns – the ones we don’t know we don’t know…it is the latter category that tend to be the difficult ones” (2002).

Politics and competency levels aside, Donald Rumsfeld’s “unknown unknowns” soundbite rings true within the ever-expanding world of analytics.

In legal eDiscovery specifically, litigation spans all business verticals, so there is a need to capture and analyze digital information from every emerging source, such as big data, mobile devices, social media, and the internet of things.

Furthermore, the essence of discovery is to unfold the facts and gather information relevant to any party’s claim or defense per Federal Rules of Civil Procedure (FRCP) Rule 26(b)(1). However, in today’s arms race among eDiscovery software vendors, objective understanding, analysis, and research take a back seat to the large-scale reduction and grouping of datasets for export.

Single Point of Focus

The RAND Corporation recently published a study, Where the Money Goes, exploring the rising costs of managing ESI within the Request for Production portion of discovery. The study breaks eDiscovery down into segments (collection, processing, and review) to expose where there may be room for improvement, but notes that technical and human limitations hinder real cost reductions. According to the study, $0.73 of every dollar spent on eDiscovery goes towards the document review process.

In today’s legal eDiscovery world, there are many companies attempting to capture their slice of the billion-dollar industry. By 2017, the market is expected to reach $9.9 billion USD per Transparency Market Research. To no one’s surprise, the majority of these companies are focused on capturing revenue within the document review space, and all tend to provide similar methodologies. There seems to be no technological advancement in true data analysis to assist lawyers in preparing for settlement negotiations or trial.

Oddly, to me, this majority is focused solely on providing robust tools for viewing a document and reducing the volume of data to be produced. Production is the legal shorthand for delivering to the opposing party the specific internal documents and information (discovered material) deemed relevant to the matter. FRCP Rule 26(b)(1) states, “Parties may obtain discovery regarding any non-privileged matter that is relevant to any party’s claim or defense.”

Why do I find this odd? Because the entire space is focused on using these technologies to “shake out” data that firms collect from their clients.  Yet, there is no emphasis on the power of finding the unknown unknowns hiding within data received from the client or other party!

Yes, finding relevant documents for production while holding back any data that may be privileged/work product is definitely part of the law firm’s job to protect and represent its client.  And, as RAND noted, the document review process needs further emphasis on cost reduction, yet software rarely provides more savings than what the industry has commoditized. No doubt, there is a need to utilize technology for the sole purpose of review to production.

Nonetheless, knowing what information is buried within the client’s data, and more importantly, what is hidden within data being produced from the other side, is a huge part of properly representing a client.  This concept is almost completely overlooked by the entire discovery industry serving law firms.

Forest for the Trees: The Review to Produce Mentality

Many questions come to mind when thinking of reviewing data solely for the purpose of production, contrasted with reviewing data for the purpose of knowing your case.

  • What insights are being missed when machines analyze massive data sets and reduce them to smaller sets of information for humans to review?
  • What knowledge is lost by not examining and analyzing the data?
  • What opportunities were lost for negotiations or settlement talks prior to expending large amounts of time and budget focused on the review to production?
  • Are you able to gain any useful insights from the data post-review without re-reviewing?

Shifting Focus To Unknown Unknowns 


There is more to the legal discovery process than producing records. Paragraph 9 of the Preamble to the ABA’s Model Rules of Professional Conduct refers to the “lawyer’s obligation zealously to protect and pursue a client’s legitimate interests”. How is this truly being performed if the majority of datasets are never analyzed for facts before a non-collaborating team of single-focus reviewers and an algorithm flag records for production?

Maybe we should rename Early Case Assessment to Early Case Analysis. Let’s place the focus on the entire dataset, whether it is your client’s data or an incoming set of data from the other side. I’m not suggesting that one should ignore the power of de-NISTing, filtering for your date range, applying threading and deduplication to cluster the end points, or using keywords to find pockets of information. My point is simple – dive into the data and zealously research the facts. Assess the risks associated with moving towards negotiation, settlement, or trial.
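For readers who have not seen these culling steps up close, here is a minimal sketch in Python of what de-NISTing, date filtering, deduplication, and keyword search amount to. Everything in it is an assumption for illustration: the document layout, the cull and keyword_hits helpers, and the one-entry hash list standing in for a real known-file reference set. It is not how any particular review platform works, and it leaves out email threading entirely.

```python
import hashlib
from datetime import date

def digest(content: bytes) -> str:
    """SHA-256 fingerprint used for both de-NISTing and exact deduplication."""
    return hashlib.sha256(content).hexdigest()

# Hypothetical stand-in for a known-file hash list (system and application files).
KNOWN_SYSTEM_HASHES = {digest(b"generic system file")}

def cull(documents, start, end, known_hashes=KNOWN_SYSTEM_HASHES):
    """Drop known system files, out-of-range dates, and exact duplicates."""
    seen = set()
    kept = []
    for doc in documents:
        fingerprint = digest(doc["content"])
        if fingerprint in known_hashes:          # de-NISTing
            continue
        if not (start <= doc["date"] <= end):    # date range filter
            continue
        if fingerprint in seen:                  # exact-duplicate removal
            continue
        seen.add(fingerprint)
        kept.append(doc)
    return kept

def keyword_hits(documents, keywords):
    """Return documents whose text contains any keyword (case-insensitive)."""
    lowered = [k.lower() for k in keywords]
    hits = []
    for doc in documents:
        text = doc["content"].decode("utf-8", "ignore").lower()
        if any(k in text for k in lowered):
            hits.append(doc)
    return hits

# Made-up sample data, purely for illustration.
SAMPLE_DOCS = [
    {"id": 1, "date": date(2014, 3, 1), "content": b"generic system file"},
    {"id": 2, "date": date(2014, 3, 2), "content": b"Memo: pricing agreement with vendor"},
    {"id": 3, "date": date(2014, 3, 5), "content": b"Memo: pricing agreement with vendor"},
    {"id": 4, "date": date(2009, 1, 1), "content": b"Old newsletter"},
    {"id": 5, "date": date(2014, 4, 1), "content": b"Lunch on Friday?"},
]

remaining = cull(SAMPLE_DOCS, date(2014, 1, 1), date(2014, 12, 31))
hits = keyword_hits(remaining, ["pricing"])
```

Note what the sketch does not do: it shrinks the pile, but it says nothing about what the surviving documents mean. That part is still the analysis.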

Prior to the days of discovery with an “e,” we would not enter a conference room, a.k.a. war room, full of boxes only to flag records to endorse, copy, and then produce. The process was much more geared toward understanding what was literally dumped at our feet, weeding out the fluff, and beginning to sort. Sorting took on many meanings, from the “this looks interesting” pile to a “we need to keep all these memos over here,” and the entire process was collaborative.

In many cases, the entire team was informed as a particular file was identified, and a discussion took place as to the meaning.  Sometimes, finding one single document changed the entire sorting process, and many of us would scramble back to existing stacks looking for the relationships.  We were mapping the facts…exploring the unknowns…building the story.

Can this be done with today’s volume of ESI? Of course it can and, to be fair, some firms still follow the same collaborative and analytical methodology as the war room paper review. However, this does not appear to be the norm across the board, and when sorting algorithms are coupled with siloed, single-purpose reviewers, the art of learning the material collected is lost.

Getting Back to the Basics


I recently had a refreshing conversation with a progressive-thinking litigation support manager. He not only agreed there needs to be a shift in industry thinking, but also pointed out tools that are currently being utilized within the security and intrusion world. These concept connectivity tools aren’t used to “review” but, rather, are used to search for the unknowns to piece together the story.

Forensic investigators utilize the concept connectivity tools to search for similarities in data where, at face value, none exist.  The process starts by tying together pieces of information from one source to another.  As the connections continue to build, so does the story that was not available via simple keyword searches.

By starting off with something as basic as a first and last name, these diligent investigators begin to connect to online sources, attaching a name to a social media profile such as Twitter.  The Twitter handle’s profile photo may then be used to scour online messaging boards connecting to a user posting comments with the same avatar/account name.  Further concept connectivity leads to the unraveling of the account holder’s IP address which happens to be the same as the “anonymous post” made by the suspect detailing the crime.  Dots connected!
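To make the dot-connecting concrete, here is a minimal sketch of the idea in Python, assuming each observation can be recorded as a simple link between two identifiers. The names, handles, and addresses are invented, and build_graph and find_chain are hypothetical helpers; real investigative tools are far richer. The sketch only shows that once links are stored as a graph, a breadth-first search can recover the chain from a name to an anonymous post.

```python
from collections import defaultdict, deque

def build_graph(links):
    """Store each observed link in both directions."""
    graph = defaultdict(set)
    for a, b in links:
        graph[a].add(b)
        graph[b].add(a)
    return graph

def find_chain(graph, start, goal):
    """Breadth-first search; returns the shortest chain of linked items, or None."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbor in sorted(graph.get(node, ())):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None

# Invented observations, one per investigative step described above.
LINKS = [
    ("name:Jane Doe", "twitter:@jdoe_example"),
    ("twitter:@jdoe_example", "avatar:photo-abc"),
    ("avatar:photo-abc", "forum:jdoe77"),
    ("forum:jdoe77", "ip:203.0.113.7"),
    ("ip:203.0.113.7", "post:anonymous-42"),
    ("name:John Roe", "twitter:@jroe_example"),  # unrelated person
]

graph = build_graph(LINKS)
chain = find_chain(graph, "name:Jane Doe", "post:anonymous-42")
no_chain = find_chain(graph, "name:John Roe", "post:anonymous-42")
```

No single link here proves anything on its own; the story only appears when the links are followed end to end, which is exactly what a keyword search cannot do.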

The process employed by forensic investigators is not reviewing data to produce. It is analyzing data to create a story that was otherwise an unknown unknown.

The story exists, no matter how big the “Discovery” becomes. Keep an open mind when approaching incoming data, regardless of source. Use analytics to explore the unknowns, gain valuable insights for negotiation purposes, and tell the story!

According to The Law Dictionary, “about 95 percent of pending lawsuits end in a pre-trial settlement.” The most powerful weapon in the pre-trial arsenal is one’s ability to craft the story by flipping the unknown unknowns into known facts.

But how? View discovery with child-like wonderment instead of as a monumental task.  People “who are open to learning from analytics may find unexpected results to be more valuable than the expected” (MIT Sloan, 2015).  Explore the advanced technology features that currently exist to do something other than simply review to produce a load file with endorsed images.

Next-generation discovery tools, such as Cicayda’s Reprise, introduce natural language processing in real time, allowing one to have immediate access to facts and people within datasets.

Exposing Key Items with Natural Language Processing

Natural Language Processing (NLP) is relatively new to the eDiscovery world but has been utilized in other areas for years, dating as far back as the 1940s and appearing as recently as IBM’s Watson and Apple’s Siri.

Originally built around language translation, NLP has evolved into a system that can detect named entities, make inferences, and even create linguistic trees. Although this technology is still in its infancy for use in discovery platforms, the toolset can provide advantages for establishing a targeted review, as well as uncovering the unknown unknowns.
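As a rough illustration of what surfacing the top people and places in a dataset means, here is a small Python sketch. To be clear about the assumptions: it is not real named entity recognition and it is not how Reprise works. It swaps in a crude capitalization heuristic (runs of capitalized words are treated as entity candidates), and the IGNORED list, the top_candidates helper, and the sample sentences are all made up for the example. A production NLP system would use trained models rather than a regular expression.

```python
import re
from collections import Counter

# Crude stand-in for entity detection: one or more capitalized words in a row.
CANDIDATE = re.compile(r"\b[A-Z][a-z]+(?:\s[A-Z][a-z]+)*\b")

# Hypothetical list of sentence-opening words that are capitalized but not entities.
IGNORED = {"The", "Please", "We"}

def top_candidates(documents, n=3):
    """Count capitalized phrases across documents and return the n most frequent."""
    counts = Counter()
    for text in documents:
        for phrase in CANDIDATE.findall(text):
            if phrase not in IGNORED:
                counts[phrase] += 1
    return counts.most_common(n)

# Made-up sample text, purely for illustration.
SAMPLE = [
    "Please send the contract to Alice Smith before the Chicago meeting.",
    "Alice Smith confirmed that Acme Widgets will sign in Chicago.",
    "We still need approval from Alice Smith.",
]

top_two = top_candidates(SAMPLE, 2)
```

Even this toy version hints at the value: before anyone reads a single document, the names that keep recurring tell you where to start looking.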

Knowing that one can have real-time access to top people, places, and things buried within terabytes of data provides a unique advantage to anyone using Reprise.  NLP, coupled with Google-like speed, allows you to conduct searches across massive datasets and grants unparalleled access to case data once thought to be too much for any platform to handle.  You need to tell the story, and we can help uncover the facts.

The task is the same as it’s always been; however, Cicayda has the powerfully cool tools to better conquer the task.  Get back to the basics, extend the use case of the review tool, and harness the true power of non-biased analytics to tell the story.

What you don’t know, now you know.

Originally posted on
