Discovery: How long does it take to read 17.6 million documents?

Technology is affecting every facet of our lives. Litigation is no exception.

Litigation can be a very time- and labour-intensive exercise, particularly at the stage of discovery (or disclosure) of documents. Discovery is the stage at which the parties to litigation provide each other with copies of all documents relevant to the litigation.

In large commercial disputes, the documents subject to discovery can be voluminous, and significant time and cost can be spent in undertaking discovery properly. Discovery usually involves considering what the relevant issues are, locating documents which may be relevant and then reviewing those documents for relevance.

It is increasingly common for a large proportion of the documents subject to discovery to be in electronic form. Those documents can include emails, Word documents, voice recordings, spreadsheets and many other electronic file types, collectively known as electronically stored information (ESI).

The review of ESI, in particular to identify relevant documents, is usually a significant exercise. It also involves locating and identifying duplicates of documents (e.g. emails which have been copied to a number of people). In a recent case in England, backup tapes held by one of the parties contained more than 17.6 million electronic documents.
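The identification of duplicates mentioned above can be illustrated with a short sketch. It assumes that two documents count as duplicates when their text is identical after ignoring case and extra whitespace, which is a simplification. The function names and the sample emails are hypothetical and are not taken from any particular review platform.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List


def content_fingerprint(text: str) -> str:
    """Hash the text after lower-casing it and collapsing whitespace, so that
    trivially different copies of the same email share one fingerprint."""
    normalised = " ".join(text.lower().split())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def group_duplicates(documents: Dict[str, str]) -> List[List[str]]:
    """Return groups of document IDs whose normalised content is identical."""
    groups = defaultdict(list)
    for doc_id, text in documents.items():
        groups[content_fingerprint(text)].append(doc_id)
    # Only groups with more than one member contain duplicates.
    return [ids for ids in groups.values() if len(ids) > 1]


# Hypothetical sample: the same email copied to two people, plus one other.
documents = {
    "email-001": "Please find the contract attached.",
    "email-002": "please find the   contract attached.",
    "email-003": "Meeting moved to Tuesday.",
}
duplicate_groups = group_duplicates(documents)
print(duplicate_groups)
```

Only one document from each group would then need to be reviewed for relevance. Exact matching of this kind will not catch near-duplicates, such as the same email with a different signature block.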

Given the huge amount of ESI which can potentially be relevant, parties and lawyers have been using technology to assist with its review. Most recently, parties have started to use predictive coding to assist in reviewing ESI. Predictive coding is also known as technology assisted review, computer assisted review or assisted review. In short, it means that most of the review of ESI is undertaken by computer software, rather than by humans. To start the predictive coding process, a lawyer reviews a sample set of documents and the results of that review are provided to the software. The software then “learns” from those results, and reviews all of the ESI. The results of the software review are then subject to some further human review, until an agreed tolerance is reached. The software continues to “learn” from the further human review. Ultimately, the list of relevant documents is produced as a result of the software review, meaning that a human does not need to review every single piece of ESI.
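The process described above can be sketched as a simple loop. This is an illustration only and is not how any particular review product works. It assumes a toy word-counting scorer in place of the software's learning method, assumes that the further human review targets the documents the software is least sure about, and treats the "agreed tolerance" as the proportion of a reviewed batch on which the software and the lawyer agree. All names and sample documents are hypothetical.

```python
import math
from collections import Counter


def train(labelled):
    """Learn a weight per word from (text, is_relevant) pairs.
    Positive weights point towards relevance (add-one smoothing)."""
    relevant, irrelevant = Counter(), Counter()
    for text, is_relevant in labelled:
        (relevant if is_relevant else irrelevant).update(text.lower().split())
    vocab = set(relevant) | set(irrelevant)
    total_rel = sum(relevant.values()) + len(vocab)
    total_irr = sum(irrelevant.values()) + len(vocab)
    return {
        word: math.log((relevant[word] + 1) / total_rel)
        - math.log((irrelevant[word] + 1) / total_irr)
        for word in vocab
    }


def score(weights, text):
    """Sum the learned weights of the words in the text; unseen words count as 0."""
    return sum(weights.get(word, 0.0) for word in text.lower().split())


def predictive_review(corpus, seed_labels, human_review,
                      batch_size=2, tolerance=0.9, max_rounds=10):
    """Train on the lawyer's sample, then repeat human review of small batches
    until software and lawyer agree on at least `tolerance` of a batch."""
    labels = dict(seed_labels)
    for _ in range(max_rounds):
        weights = train([(corpus[d], rel) for d, rel in labels.items()])
        unreviewed = [d for d in corpus if d not in labels]
        if not unreviewed:
            break
        # Assumption: send the least certain documents (score nearest 0) to the lawyer.
        unreviewed.sort(key=lambda d: abs(score(weights, corpus[d])))
        batch = unreviewed[:batch_size]
        agreed = 0
        for d in batch:
            predicted = score(weights, corpus[d]) > 0
            labels[d] = human_review(d)  # the software learns from this next round
            if predicted == labels[d]:
                agreed += 1
        if agreed / len(batch) >= tolerance:
            break
    weights = train([(corpus[d], rel) for d, rel in labels.items()])
    # Human decisions stand; the software decides every document nobody reviewed.
    return {d: labels[d] if d in labels else score(weights, corpus[d]) > 0
            for d in corpus}


# Hypothetical documents and lawyer decisions, for illustration only.
corpus = {
    "d1": "contract breach payment dispute",
    "d2": "office lunch menu friday",
    "d3": "payment overdue under contract",
    "d4": "friday lunch order",
    "d5": "breach notice dispute payment",
    "d6": "menu for office party",
}
lawyer_decisions = {"d1": True, "d2": False, "d3": True,
                    "d4": False, "d5": True, "d6": False}
seed_labels = {"d1": True, "d2": False}  # the lawyer's initial sample set
result = predictive_review(corpus, seed_labels, lawyer_decisions.get)
print(result)
```

In this toy run the lawyer looks at four of the six documents (the two-document sample plus one further batch), and the software decides the remaining two.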

Courts in the USA, Ireland and England have already accepted predictive coding as a means of reviewing ESI for discovery in appropriate matters (see note 1 below). In those cases, the courts have had regard to statistics which show that the use of predictive coding is at least as reliable as human review, and potentially more reliable. If that is the case, then predictive coding will allow for a more expeditious and economical discovery process.

Given that the growth in ESI is no different in Australia than in the USA, Ireland and England, it is likely that, in appropriate cases, the courts in Australia will adopt a similar approach. Ultimately, that is likely to reduce the time and cost involved in undertaking disclosure, while providing results which are quicker and more reliable than those of a human reviewing huge amounts of ESI.

For more information or discussion, please contact HopgoodGanim Lawyers' Dispute Resolution team. 


1. See, for example: USA - Da Silva Moore v Publicis Groupe 11 Civ 1279 (ALC)(AJP); Ireland - Irish Bank Resolution Corporation Ltd v Quinn [2015] IEHC 175; England - Pyrrho Investments Ltd v MWB Property Ltd & Ors [2016] EWHC 256 (Ch) and Brown v BCA Trading Limited & Ors [2016] EWHC 1464 (Ch).