The Singapore Law Gazette

Avoiding eDiscovery Icebergs

There is a feeling in the legal world that fundamental change is something like turning the Titanic. Many of us believe deep down that, no matter what happens, most things will be the same five or even 10 years hence. Maybe the names will change (a little) but workflows, processes and tools will largely stay as they are.

This perception stands in stark contrast to the trends and realities of legal discovery today. There isn’t just one iceberg looming on the horizon; there are several, and any one of them could stand in the way of success for your firm.

The Information Landscape is Changing

There are many ways to look at how information is evolving:

  • Data volumes are growing;
  • Data sources are becoming more complex;
  • Social media continues to expand, especially thanks to mobile devices; and
  • Cloud storage is more ubiquitous and accepted, even in heavily regulated industries.

This all leads to an inescapable truth. The tools of yesteryear, while they might still work today, are showing their age. It is getting harder to conduct discovery using technology that is falling behind the times.

Our customers have told us that it is extremely important to use an eDiscovery platform that can handle large data volumes at speed and without failing. However, it is equally important that the platform be easy to use, learn, deploy, grow and manage.

Oh, and did I mention they also need a platform that includes advanced analytics and predictive coding as a core component, not as an add-on?

Understand, Not Just Produce

With these challenges at hand, eDiscovery practitioners are also clamouring for greater understanding derived from the information in front of them. There isn’t enough time in the day to do discovery the traditional way – people are overworked and cannot afford anything resembling inefficiency as they sift through mounds of information.

To meet these demands, eDiscovery software needs six fundamental capabilities.

1. It Can Process Anything from Anywhere

In a world filled with complex new data formats, where all kinds of everyday devices can hold digital evidence that may be discoverable, discovery software must be able to process and provide access to data stored in a huge variety of devices, locations, and formats. Ideally this should encompass 10 dimensions of data:

  • Human-generated content
  • Multimedia
  • Digital and mobile forensic data
  • Network data
  • Log data
  • User data
  • Communication patterns
  • Structured data
  • Enterprise and cloud repositories
  • Real-time feeds

A best-in-class discovery platform can handle all these types of data without having to rely on additional tools that add complexity and potential for error into the workflow. And it needs to account for each item forensically, processing each item and keeping a tally of any exceptions.
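As a minimal sketch of what "accounting for each item" could look like, the Python below processes a list of items and keeps a tally of exceptions by type. The `process_item` function, the item fields and the sample data are hypothetical stand-ins for illustration, not any vendor's implementation.

```python
from collections import Counter

def process_item(item):
    """Hypothetical processing step: decode an item's raw bytes as UTF-8 text."""
    if item["data"] is None:
        raise ValueError("empty item")
    return item["data"].decode("utf-8")

def process_all(items):
    """Process every item and account for each one, whether it succeeds or fails."""
    processed = []
    exceptions = Counter()   # tally of failures, keyed by exception type
    failed_ids = []          # which items need follow-up
    for item in items:
        try:
            processed.append((item["id"], process_item(item)))
        except Exception as exc:
            exceptions[type(exc).__name__] += 1
            failed_ids.append(item["id"])
    return processed, exceptions, failed_ids

items = [
    {"id": "doc-1", "data": b"quarterly report"},
    {"id": "doc-2", "data": None},             # nothing to read
    {"id": "doc-3", "data": b"\xff\xfe\x00"},  # not valid UTF-8
]
processed, exceptions, failed_ids = process_all(items)
print(len(processed), dict(exceptions), failed_ids)
```

The point of the design is that processed items plus failed items always add up to the number of items received, so nothing goes missing silently.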

2. It Can Handle Legacy Data

Most organisations have at least some data in legacy systems – older software and systems they no longer use or only use occasionally. This is particularly an issue for organisations that have gone through mergers and acquisitions, where they have inherited data and migrated it to a new system or just left it in place.

Whatever the case, this data may still be 100% relevant for litigation. eDiscovery software must be able to tap into information from a wide variety of legacy systems and outdated formats. Lotus Notes and obsolete e-mail archives are just the start of ancient and obscure data types that can be discoverable.

3. It Gives You Answers Quickly

How large is the average eDiscovery case? It can be as hard to estimate as the length of a piece of string. For example, Jay Brudz, chair of information governance and eDiscovery at law firm Drinker Biddle, said a typical case in 2016 involved 10–15 custodians and 5–20 gigabytes of data for each one.[1]

[1] Jay Brudz, eDiscovery and Data Analytics, History and Current Legal Applications,

The type of case, the circumstances, the jurisdiction, and even the Judge’s rulings during the case can make the data volumes vary significantly. Even a relatively small 50 gigabyte matter can be troublesome depending on the type of data and the capabilities of your eDiscovery software.
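Taking the custodian and per-custodian figures quoted above at face value, the implied range of total case size can be worked out directly. This is only back-of-the-envelope arithmetic, and the variable names are illustrative.

```python
custodians = (10, 15)        # low and high custodian counts quoted above
gb_per_custodian = (5, 20)   # low and high gigabytes per custodian

low_gb = custodians[0] * gb_per_custodian[0]
high_gb = custodians[1] * gb_per_custodian[1]
print(f"Implied case size: {low_gb} to {high_gb} GB")
```

On those figures, a 50-gigabyte matter sits at the very bottom of the typical range.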

However much data you have, your software’s ability to give you insights quickly is vital to setting a winning case strategy. The faster you can process, cull and analyse data, the better informed you will be when it comes to deciding the facts and merits of your case. And when you are analysing production from opposing counsel or answering a request from a judge or regulator, you don’t want to go back and ask for more time because your software isn’t up to the task.

Your eDiscovery software should provide scalable parallel processing, so you can work your way through large data volumes at great speed. You should easily be able to add processing resources to get through extra-large or super-urgent matters. And it should provide ways to automate your workflow to maximise the productivity of your digital and human resources.
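To illustrate the idea of adding processing resources, here is a small sketch using Python's standard `concurrent.futures` module. The `process_document` function is a hypothetical stand-in for real processing work, and the worker counts are arbitrary.

```python
from concurrent.futures import ThreadPoolExecutor
import hashlib

def process_document(raw: bytes) -> str:
    """Stand-in for real processing work: fingerprint the document's content."""
    return hashlib.sha256(raw).hexdigest()

def process_in_parallel(documents, workers=4):
    """Spread documents across a pool of workers; results keep the input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(process_document, documents))

documents = [f"document {n}".encode() for n in range(1000)]
digests = process_in_parallel(documents, workers=8)
print(len(digests))
```

Scaling up here means raising `workers`. A thread pool keeps the sketch portable; for CPU-heavy work in Python, `ProcessPoolExecutor` offers the same `map` interface across separate processes.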

4. It Gives You Many Ways to Cull Data for Review

It is impossible, inefficient, and expensive to review every single document involved in a discovery case. In a 50-gigabyte discovery that would add up to somewhere around 175,000 documents or spreadsheets, 35,000 PowerPoint slide decks, 250,000 e-mails, or many millions of text messages.

eDiscovery software should give you a variety of ways to cull irrelevant items from your document set across many stages of the discovery process. While doing this, it should give you confidence that you haven’t eliminated relevant or material items.

This begins with information governance – proactively retiring trivial, redundant, abandoned, superseded and harmful data that an organisation does not need to keep. It continues through the processing and analysis stages with deduplication, removal of known irrelevant files such as system files and applications (de-NISTing), keyword searches, date ranges, and a variety of other filtering techniques. And it finishes in the review process with predictive coding, document clustering and other analytics. Culling data shouldn’t slow you down. If you are spending far too much time culling your data, you are not using a best-in-class discovery platform.
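The processing-stage filters named above can be sketched in a few lines of Python. This is a simplified illustration: the item fields, the sample data and the one-entry `KNOWN_FILE_HASHES` set are hypothetical, and a real known-file list would be far larger.

```python
import hashlib
from datetime import date

# Hypothetical set of hashes for known system and application files.
KNOWN_FILE_HASHES = {hashlib.sha256(b"operating system file").hexdigest()}

def cull(items, start, end, keywords):
    """Apply deduplication, known-file removal, date range and keyword filters."""
    seen = set()
    kept = []
    for item in items:
        digest = hashlib.sha256(item["content"]).hexdigest()
        if digest in seen:                       # deduplication
            continue
        seen.add(digest)
        if digest in KNOWN_FILE_HASHES:          # known irrelevant file
            continue
        if not (start <= item["date"] <= end):   # date range
            continue
        text = item["content"].decode("utf-8", errors="ignore").lower()
        if not any(k.lower() in text for k in keywords):   # keyword search
            continue
        kept.append(item)
    return kept

items = [
    {"id": "a", "content": b"Merger pricing discussion", "date": date(2016, 3, 1)},
    {"id": "b", "content": b"Merger pricing discussion", "date": date(2016, 3, 1)},
    {"id": "c", "content": b"operating system file", "date": date(2016, 3, 2)},
    {"id": "d", "content": b"Merger timeline", "date": date(2014, 1, 1)},
    {"id": "e", "content": b"Lunch on Friday?", "date": date(2016, 3, 3)},
]
kept = cull(items, date(2016, 1, 1), date(2016, 12, 31), ["merger"])
print([item["id"] for item in kept])
```

Each filter removes items for a reason that can be stated and logged, which is what lets a team defend the cull later.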

5. It Allows You to Do Everything in a Seamless Workflow

To maintain a consistent process, discovery practitioners have used a variety of approaches: writing detailed process manuals for technicians to follow, writing custom scripts or applications to string tasks together, using third-party applications to manage the workflow, or just sitting around staring at the progress bar. All these approaches have hidden costs in time, money, inconsistency, and potential for errors.

With the right technology, discovery practitioners can automate their discovery workflows while maintaining a consistent and defensible process. Automation saves operators from the drudgery of waiting around for one process to finish so they can kick off the next one.

6. It Provides End-to-end Coverage of the Discovery Process

Some organisations have separate tools for collections, processing, early case assessment and review – and in some cases multiple tools for processing different kinds of data. This scenario causes inefficiency and makes it hard to maintain chain of custody.

Moving data from one system to another is expensive and time consuming and opens the door for data or metadata to be lost or corrupted. In addition, every time you copy from one tool to another you are duplicating data, which takes up storage space. If you are handling sensitive data, each copy potentially creates additional risk: it might be hacked or inadvertently exposed, and it makes for more work when you need to dispose of it.

A modern discovery platform can handle all the data sources and all the stages of the eDiscovery process. It minimises the number of handovers and the need to convert data from one format to another. And it makes it easier for you to maintain chain of custody and defend your processes to a court or regulator.

Building Efficient Review

Traditionally, eDiscovery teams applied linear review for their cases’ legal and strategic needs. With this approach, legal teams would often need to derive key terms for searches to produce the documents to review while only having superficial knowledge of the matter. Review teams would then proceed in the dark, with each document reviewed in isolation from its larger context until the completion of document review.

To transform your linear review process into a rock-solid, repeatable workflow, you need to adopt the Smart Review approach: deploying visualisation tools and analytics within a single platform to empower your eDiscovery teams to find the facts you need while eliminating the waste in the review process.

Visual cues can help focus the eye on what is most important, allowing you to quickly identify and prioritise key documents and terms for review. By leveraging visual analytics, content analytics, concept clustering, metadata metrics, and data patterns, you can quickly get a sense of what you are seeing without having to look at every single document.

Here are some examples of visualisation tools that help you identify relationships and make connections faster and more easily.

Concept Clustering

Concept clustering is designed to accelerate the review and analysis of large document sets, transforming disparate, unstructured content – such as e-mails, documents and social media content – into a highly organised index. This index can then tell the story through interactive visualisation, rather than your reviewers trying to decipher it.

Concept clustering begins with content analytics during the data ingestion phase of eDiscovery. During this process, the software analyses and classifies all incoming content to identify key features in every document. For example, it can:

  • Extract nouns and noun phrases from all e-mail, documents and social media content;
  • Identify concepts and weight them based on frequency within the documents;
  • Identify overlapping concepts; and
  • Score and assign clusters based on similarity.
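The steps above can be sketched in miniature. The Python below is a deliberately simplified illustration and not how any particular product works: it treats individual words as concepts rather than extracting noun phrases, uses a tiny hypothetical stop-word list, scores similarity with the cosine measure, and assigns clusters greedily against an arbitrary threshold.

```python
import math
import re
from collections import Counter

STOPWORDS = {"the", "a", "an", "of", "to", "and", "for", "on", "in", "is", "at"}

def concept_weights(text):
    """Weight each candidate concept (here, simply a word) by its frequency."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(w for w in words if w not in STOPWORDS)

def similarity(a, b):
    """Cosine similarity between two concept-weight vectors."""
    overlap = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return overlap / (norm_a * norm_b) if norm_a and norm_b else 0.0

def cluster(documents, threshold=0.3):
    """Greedily assign each document to the first cluster whose seed is similar enough."""
    clusters = []   # each entry: (seed weights, list of document indexes)
    for index, text in enumerate(documents):
        weights = concept_weights(text)
        for seed, members in clusters:
            if similarity(seed, weights) >= threshold:
                members.append(index)
                break
        else:
            clusters.append((weights, [index]))
    return [members for _, members in clusters]

documents = [
    "Pricing agreement for the supply contract",
    "Draft supply contract and pricing schedule",
    "Team lunch on Friday at noon",
    "Friday lunch menu for the team",
]
clusters = cluster(documents)
print(clusters)
```

With this sample, the two contract documents fall into one cluster and the two lunch documents into another, because each pair shares three weighted concepts.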

Once concept clustering has occurred, you can then visualise search results by clustering conceptually similar documents together – creating a micro view of how they are conceptually related. In Nuix Ringtail, these clusters are presented as a spiral of dots, with each dot representing a unique document. The document with the most common concept is at the centre. At a macro level, multiple clusters sharing dominant concepts are arranged in spines. The entire visualisation of the map depicts the results of tens to hundreds of parallel concept searches – so you can easily see what is related, where discrepancies arise and who is talking to whom, to name a few capabilities.

Social Networking View

While documents are an important source of information for litigation, investigations begin by understanding who is communicating with whom about particular topics.

Social network views depict the relationships between people and organisations along with the flow of information – such as e-mails, phone calls, chats, and financial records. The structured metadata of e-mails (from, to, CC, BCC, date, time) provide the organising information to show three types of communication relationships: people to people, organisation to organisation, and people to organisation.

With Nuix Ringtail, you can make these connections easily and visually with a concept cloud that shows the concepts discussed in these different types of communications. You can view the communicators involved to see the connections between them – and even concepts that individuals discussed together.
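As a rough sketch of how e-mail metadata can be turned into communication relationships, the Python below counts messages between senders and recipients at three levels. It assumes, purely for illustration, that the domain of an address identifies the organisation; the message fields and addresses are made up.

```python
from collections import Counter

def domain(address):
    """Treat the part after '@' as the organisation (an assumption of this sketch)."""
    return address.split("@", 1)[1].lower()

def build_networks(messages):
    """Count who sent to whom at person, organisation and person-to-organisation level."""
    people = Counter()
    organisations = Counter()
    person_to_org = Counter()
    for msg in messages:
        sender = msg["from"].lower()
        recipients = msg.get("to", []) + msg.get("cc", []) + msg.get("bcc", [])
        for recipient in recipients:
            recipient = recipient.lower()
            people[(sender, recipient)] += 1
            organisations[(domain(sender), domain(recipient))] += 1
            person_to_org[(sender, domain(recipient))] += 1
    return people, organisations, person_to_org

messages = [
    {"from": "ann@alpha.example", "to": ["bob@beta.example"], "cc": ["cy@alpha.example"]},
    {"from": "ann@alpha.example", "to": ["bob@beta.example"]},
    {"from": "bob@beta.example", "to": ["ann@alpha.example"]},
]
people, organisations, person_to_org = build_networks(messages)
print(people.most_common(1))
```

The counts become edge weights in a network view: the heavier the edge, the more traffic between the two parties.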

Predictive Coding

Predictive coding uses machine learning to analyse a sample set of documents that a human being has already coded – for example as relevant or irrelevant, privileged or not privileged. It then applies the resulting knowledge to very large sets of documents that have not yet been coded.

A recent and exciting advance in predictive coding is Continuous Active Learning (CAL), which makes the process quick and easy for reviewers of any skill level. CAL works by learning as each reviewer codes documents, using those decisions to optimise and train the predictive coding model.

CAL identifies relevant documents earlier in the review process than traditional predictive coding. You can add new documents to the review set as you receive them, rather than having to wait for all the documents before you start. Even with multiple reviewers working simultaneously, CAL incorporates all their decisions to continuously fine-tune the model and identify more and more relevant documents.
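The loop at the heart of this approach can be sketched with a toy model. This is an illustration under heavy assumptions, not a production algorithm: the "model" is a simple word-weight tally, the reviewer is simulated by a function, and review starts from a single seed document.

```python
import re
from collections import Counter

def words(text):
    """Distinct lower-case words in a document."""
    return set(re.findall(r"[a-z]+", text.lower()))

def train(coded):
    """Toy model: +1 per word each time it appears in a relevant document, -1 otherwise."""
    weights = Counter()
    for text, relevant in coded:
        for w in words(text):
            weights[w] += 1 if relevant else -1
    return weights

def score(weights, text):
    """Higher scores mean the model expects the document to be relevant."""
    return sum(weights[w] for w in words(text))

def continuous_active_learning(documents, reviewer, seed_index=0):
    """Review in model-ranked order, retraining after every coding decision."""
    coded, order = [], []
    remaining = list(range(len(documents)))
    next_index = seed_index
    while remaining:
        remaining.remove(next_index)
        decision = reviewer(documents[next_index])   # the human coding decision
        coded.append((documents[next_index], decision))
        order.append(next_index)
        if remaining:
            weights = train(coded)                   # retrain on every decision so far
            next_index = max(remaining, key=lambda i: score(weights, documents[i]))
    return order

documents = [
    "Pricing agreement with competitor",
    "Office party on Friday",
    "Canteen menu for Friday",
    "Competitor pricing call notes",
    "Parking permits renewal",
    "Agreement on pricing floor",
]
# Simulated reviewer: codes a document relevant if it mentions pricing.
order = continuous_active_learning(documents, lambda text: "pricing" in text.lower())
print(order)
```

In this sample the three pricing documents are reviewed first, even though two of them sit late in the original order. That reordering is the practical benefit: relevant material surfaces early and the model improves with each decision.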

E-mail Threading

E-mail threading helps reviewers parse e-mail conversations without having to read duplicate passages of text, making the entire process more efficient. The threading process works by analysing the body, attachments, recipients, and subject lines of e-mails to more accurately group e-mails into threads and to identify the most inclusive, or pivot, e-mails.

Using a threading view, each successive message in a chain is indented further. The thread data column allows reviewers to scan long threads and identify which e-mails are pivots, where new threads begin, and when attachments or people are added or dropped from the conversation. Threading streamlines a time-consuming activity, allowing legal teams to understand e-mail communications much faster.
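A much-simplified sketch of threading follows. It uses only subject lines and body text, whereas the process described above also considers attachments and recipients. The field names, the prefix list and the containment test for "inclusive" e-mails are all assumptions made for illustration.

```python
import re

def normalise_subject(subject):
    """Strip reply and forward prefixes such as 'RE:' and 'FW:' so replies share a key."""
    stripped = re.sub(r"^\s*((re|fw|fwd)\s*:\s*)+", "", subject, flags=re.IGNORECASE)
    return stripped.strip().lower()

def build_threads(emails):
    """Group e-mails by normalised subject, ordered by sent time."""
    threads = {}
    for email in sorted(emails, key=lambda e: e["sent"]):
        threads.setdefault(normalise_subject(email["subject"]), []).append(email)
    return threads

def inclusive_emails(thread):
    """Keep e-mails whose body is not wholly quoted inside a longer e-mail in the thread."""
    return [
        email for email in thread
        if not any(
            len(other["body"]) > len(email["body"]) and email["body"] in other["body"]
            for other in thread
        )
    ]

emails = [
    {"id": 1, "sent": "2016-03-01T09:00", "subject": "Budget",
     "body": "Can you send the budget?"},
    {"id": 2, "sent": "2016-03-01T10:00", "subject": "RE: Budget",
     "body": "Attached.\n\nCan you send the budget?"},
    {"id": 3, "sent": "2016-03-01T11:00", "subject": "Re: RE: Budget",
     "body": "Thanks.\n\nAttached.\n\nCan you send the budget?"},
    {"id": 4, "sent": "2016-03-01T10:30", "subject": "FW: Lunch", "body": "See below."},
]
threads = build_threads(emails)
pivots = [email["id"] for email in inclusive_emails(threads["budget"])]
print(sorted(threads), pivots)
```

Here the last reply quotes the whole conversation, so a reviewer could read that one message instead of all three.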

Auto Redaction

During the discovery process, there are many reasons why you might need to redact portions of a document. However, simply blacking out the text won’t fulfil today’s needs. Courts or lawyers may want to see reasons for redaction, and some sections of text may be redacted for multiple reasons.

The find and redact feature in Nuix Ringtail addresses these needs. When you search for keywords – or more complex patterns such as credit card numbers or personally identifiable information – the software generates a list of hits, or results to that search, so you can apply redactions accordingly. Once you have searched a term, you will see the context text around it. From here, you can select individual excerpts you want to redact, then label them with a colour. These colours will allow you to understand why something was redacted. If there are two reasons why an excerpt should be redacted, you can select it multiple times.
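The general idea of searching for patterns, listing hits with context and applying labelled redactions can be sketched as follows. This is not the Nuix Ringtail feature itself: the patterns are illustrative only, text labels stand in for the colours described above, and overlapping hits are not handled.

```python
import re

# Illustrative patterns only; real matters would need carefully validated rules.
PATTERNS = {
    "card number": re.compile(r"\b(?:\d[ -]?){15}\d\b"),
    "keyword": re.compile(r"\bsalary\b", re.IGNORECASE),
}

def find_hits(text, context=20):
    """List every match with its reason label and the surrounding context text."""
    hits = []
    for reason, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            start, end = match.span()
            hits.append({
                "reason": reason,
                "span": (start, end),
                "context": text[max(0, start - context):end + context],
            })
    return hits

def redact(text, hits):
    """Replace each selected hit with a block labelled with the redaction reason."""
    # Work from the end of the text so that earlier offsets stay valid.
    for hit in sorted(hits, key=lambda h: h["span"][0], reverse=True):
        start, end = hit["span"]
        text = text[:start] + f"[REDACTED: {hit['reason']}]" + text[end:]
    return text

text = "Her salary is paid to card 4111 1111 1111 1111 each month."
hits = find_hits(text)
redacted = redact(text, hits)
print(redacted)
```

Because `redact` takes the list of hits as an argument, a reviewer could pass only the hits they selected rather than every match.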

Start Turning Your Ship Around

Given the breadth and depth of the data inside most organisations, a true end-to-end platform offers the greatest opportunity to satisfy the most important part of the discovery process: finding relevant information quickly in support of a legal matter.

The list of requirements is daunting. You not only need the brute power to process and enrich terabytes of data quickly, but also the dexterity to analyse and manipulate the data to discover the true facts. Only this combination will help you steer clear of today’s information icebergs.

Fortunately, the technology is now available to transform your eDiscovery process from a loosely connected set of reactive processes to an integrated workflow – from information governance to presentation. That means turning the Titanic isn’t as hard as it might have been, once upon a time.


Head of eDiscovery
IG & Engine Solutions

Shane Jansz has more than 15 years’ experience working in eDiscovery and information management of unstructured data. His technical and implementation services background has helped him bridge the gap between organisations’ business, IT and legal divisions to improve their business processes for governance, electronic discovery and forensic purposes. He now heads an ever-growing team of sales and presales professionals throughout the Asia Pacific region. Since joining Nuix in 2009, he has helped many large corporate and government organisations address their needs in eDiscovery, investigation, information management and IT security risk and compliance.