
Law Data Science

5 reasons to use machine learning technologies on your next document review

With data volumes rising and deadlines getting tighter, engaging with machine learning technologies remains one of the few defensible review mechanisms available to lawyers. Here are 5 reasons you should consider using them on your next review:

1. They reduce reliance on arbitrary keyword searching

There’s a saying: you don’t know what you don’t know. Nowhere is this truer than when applying keyword searches to large data sets.

The reality is that keyword searching is an imperfect science. Think about searching your own inbox: sometimes you know a message exists but cannot remember the exact way you sent or received it, so you iterate through variations of terms before you find what you are looking for. If this is the process for material we are intimately familiar with, there is no evidence to suggest that lawyers magically craft better terms for documents they have never seen.

Restricting keywords to achieve more favourable search numbers should be avoided at all costs. Machine learning techniques allow you to still apply keywords in a broad, non-restrictive sense, while giving you the power to navigate the results in a logical manner. This approach could very well uncover evidence you didn’t know existed.

2. You can leverage work you have already done to accelerate the review process

Managing document reviews that span multiple tranches of documents is a difficult task. The unpredictability of time frames, data volumes and resource availability means that where linear review is concerned, deadlines are usually missed. By utilising categorisation tools, such as active learning or continuous active learning, teams can feed additional material into models developed from earlier tranches of review. These models enshrine previously coded material and subject matter expertise, allowing the most ‘relevant’ material from the new delivery to be served up to legal teams first.
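The workflow above can be sketched in a few lines of scikit-learn, used here as a stand-in for the classifiers inside commercial review platforms. All document texts and labels are invented for illustration; real tools wrap this loop for you.

```python
# Sketch: reuse earlier coding decisions to score a newly delivered tranche.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Tranche 1: documents the team has already coded (1 = relevant, 0 = not)
coded_docs = [
    "notice of breach of contract",
    "canteen lunch menu",
    "claim for breach of warranty",
    "office party invitation",
]
coded_labels = [1, 0, 1, 0]

# Build a simple model that enshrines the earlier coding decisions
vectoriser = TfidfVectorizer()
model = LogisticRegression().fit(vectoriser.fit_transform(coded_docs), coded_labels)

# Tranche 2 arrives: score it with the existing model instead of
# starting the review again from scratch
new_tranche = ["letter regarding warranty dispute", "revised canteen schedule"]
scores = model.predict_proba(vectoriser.transform(new_tranche))[:, 1]

# Serve the likeliest-relevant documents to the legal team first
queue = [doc for _, doc in sorted(zip(scores, new_tranche), reverse=True)]
```

On this toy corpus the warranty letter outranks the canteen schedule because the model has already seen ‘warranty’ coded relevant and ‘canteen’ coded not relevant.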

3. The most important material is prioritised

As you start to build a machine learning powered review model, the concept of document prioritisation arises. That is, the algorithm powering your review takes into account the coding decisions made about documents, finds similar documents (conceptually and based on content) and pushes those documents to the top of the review queue. Where time is tight, this enables teams to home in on critical documents first, allowing them to prepare relevant case material earlier.
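One simple way to picture prioritisation is to score each unreviewed document by its textual similarity to documents already coded relevant, then review the highest scorers first. The corpus below is invented, and real platforms use richer conceptual models, but the queue-building idea is the same.

```python
# Sketch: push documents similar to known-relevant ones to the top of the queue.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

relevant = ["breach of supply agreement", "late delivery penalty clause"]
unreviewed = [
    "team lunch reminder",
    "penalty for late delivery of goods",
    "printer out of toner",
]

# Fit a shared vocabulary, then compare each unreviewed document
# against every document already coded relevant
vectoriser = TfidfVectorizer().fit(relevant + unreviewed)
sims = cosine_similarity(
    vectoriser.transform(unreviewed), vectoriser.transform(relevant)
)

# A document's priority is its best similarity to any relevant document
priority = sims.max(axis=1)
queue = [doc for _, doc in sorted(zip(priority, unreviewed), reverse=True)]
```

Here the penalty letter jumps the queue because it shares key terms with the relevant documents, while the lunch reminder drops to the bottom.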

4. Return on review investment is more readily achieved

Cost management on document review is a difficult task, and costs often spiral quickly. Because the relevance and review rates on a linear review are largely static, cost projections usually trigger adverse and ad hoc behaviours in the name of cost savings. Nobody likes expensive document review, but people dislike non-defensible document review more.

On machine learning powered reviews, the review and relevance rates are not linear. In actual fact, the relevance rate is higher towards the start of the review and the rate of review slower, because most of the material served at the beginning is in fact relevant. As such, the initial investment in a document review is realised at an earlier stage, making it easier to demonstrate to clients the value of undertaking the review.

5. There are robust mechanisms in place to validate results

Machine learning technologies enable you to cut off a document review at a point where few or no relevant documents are being fed to reviewers. In these scenarios, it is important to understand the implications of stopping a review at a point in time. Most tools enable you to run tests to determine how many potentially relevant documents would be left on the table if the review were to stop. For example, you might consider stopping a review with 10,000 unreviewed documents left. You sample 200 documents from the unreviewed pool and discover 2 relevant documents. Based on this extrapolation, it can be assumed that approximately 100 relevant documents remain in the pool of 10,000 items. You can then determine whether it is cost effective to continue reviewing.
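The extrapolation in the example is simple arithmetic: apply the sample’s relevance rate to the whole unreviewed pool. A back-of-the-envelope version, using the illustrative numbers above (real tools also report confidence intervals around this estimate):

```python
# Sketch: estimate how many relevant documents remain if the review stops now.
def estimate_remaining_relevant(pool_size, sample_size, relevant_in_sample):
    """Extrapolate the sample's relevance rate across the unreviewed pool."""
    rate = relevant_in_sample / sample_size
    return pool_size * rate

# 200 documents sampled from a 10,000-document unreviewed pool, 2 found relevant
print(estimate_remaining_relevant(10_000, 200, 2))  # → 100.0
```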

The flexibility of these technologies means that in many cases you do not need to deviate from day-to-day workflows to leverage their power. You can in fact review an entire machine learning powered document set just as you would a linear review; the decision about when to stop is yours!

Stay curious!

It is important to note that in all scenarios, the technology is only as good as the people applying it. Asking questions and seeking clarification about the process should be part of every matter you run, and instils confidence in the workflow being adopted.