
All Search Engines Are Not Created Equal

Paulina Grnarova
CEO & Co-Founder at DeepJudge

This post is the third in a series about how to implement legal AI that knows your law firm. In the series, we cover the differences between LLMs and search, the elements that make a good search engine, the building blocks of agentic systems (e.g., RAG), and how to implement a system that is fully scalable, secure, and respects your firm’s unique policies and practices.

Why keyword-based and semantic-based systems don’t meet the needs of legal professionals

Since LLMs are not search engines, they must rely on techniques like RAG to accurately retrieve information (see the first post in this series). The most important criterion for successfully integrating a firm’s proprietary content into the RAG process (discussed in the second article in this series) is the search engine’s capacity to identify the relevant data for a given question. Law firm data can be messy and fragmented, taking many forms and being dispersed across various platforms. Much of it can be found in a central Document Management System (DMS), but other data may be found in SharePoint, HighQ, file drives, HR systems, client and matter systems, individuals' email inboxes, or any number of other disparate systems. It consists of a mix of highly structured and unstructured data, including emails, memos, contracts, court filings, policies, and other text-based content.
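To make that relationship concrete, here is a minimal sketch of the retrieval-then-generation loop, assuming hypothetical index.search() and llm.complete() interfaces; it illustrates the pattern only, not any particular product’s implementation.

```python
# Minimal sketch of the RAG pattern described above (hypothetical helper
# names): retrieve firm documents first, then ask the LLM to answer from
# those documents rather than from its own memory.

def retrieve(query: str, index, top_k: int = 5) -> list[str]:
    """Ask the firm's search engine for the passages most relevant to the query."""
    return index.search(query, limit=top_k)  # retrieval quality caps answer quality

def answer_with_rag(query: str, index, llm) -> str:
    passages = retrieve(query, index)
    prompt = (
        "Answer the question using ONLY the firm documents below.\n\n"
        + "\n\n".join(passages)
        + f"\n\nQuestion: {query}"
    )
    return llm.complete(prompt)  # the model grounds its answer in the retrieved text
```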

Integrating disparate internal document sources is essential for firms to leverage their internal knowledge in an AI platform because good documents might exist in any number of places. Constructing a search system capable of accommodating tens or even hundreds of millions of documents—all while adhering to access permissions and handling diverse document types from multiple sources—represents a substantial engineering endeavor. 
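As a rough illustration of what that integration layer has to normalize, the sketch below (hypothetical field names and a deliberately simplified permission check) shows a single document record carrying its source and access-control information, so the search layer can filter by permissions before ranking.

```python
from dataclasses import dataclass, field

# Illustrative-only schema: one normalized record per document, whichever
# system it came from, carrying its source and access-control list so the
# search layer can enforce permissions at query time.

@dataclass
class FirmDocument:
    doc_id: str
    source: str    # e.g. "DMS", "SharePoint", "HighQ", "file drive", "email"
    doc_type: str  # e.g. "contract", "memo", "court filing", "policy"
    text: str
    allowed_groups: set[str] = field(default_factory=set)

def visible_to(user_groups: set[str], docs: list[FirmDocument]) -> list[FirmDocument]:
    """Drop anything the querying user is not permitted to see before ranking."""
    return [d for d in docs if d.allowed_groups & user_groups]
```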

Search systems primarily use two search methods: keyword-based search, also known as frequency search, and semantic search, also known as vector-based search. While keyword search retrieves documents containing exact query keywords (such as when trying to match the exact name of a company), it often yields irrelevant results or overlooks relevant documents that don’t contain those precise keywords. By contrast, semantic search operates on concept-based matching, embedding text into numerical “vectors” that enable similar concepts, no matter how they are worded, to be easily identified as related or comparable to each other. 
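The contrast is easy to see in miniature. The toy scoring functions below are deliberately simplified (real engines use BM25-style term weighting and learned embedding models), with embed() standing in for any embedding model.

```python
import math
from collections import Counter

def keyword_score(query: str, doc: str) -> int:
    """Frequency-style match: count exact query terms that appear in the document."""
    doc_terms = Counter(doc.lower().split())
    return sum(doc_terms[term] for term in query.lower().split())

def cosine_similarity(u: list[float], v: list[float]) -> float:
    """Semantic-style match: similarity between embedding vectors of query and document."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

# keyword_score("Jonathan Apple", clause_text) rewards the exact name, while
# cosine_similarity(embed("notice permitted by email"), embed(clause_text))
# can surface a clause that never uses the word "notices" at all.
```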

Legal AI requires information retrieval designed for legal language, nuance, and structure. In practice, this means that neither keyword-based nor vector-based engines alone can fully meet the needs of legal organizations. A capable legal search engine combines keyword and semantic search. Why? Because each has distinct strengths depending on what is being searched.

Consider the following table:

Things Keyword Search Excels At Finding:    Things Semantic (Vector) Search Excels At Finding:
Names                                       Concepts
Numbers                                     Semantics
Exact Phrases                               Paraphrases
Titles                                      Related Content

A good legal search engine needs to be able to handle all of these things. Let’s say you’re looking for a Notices clause for a contract you’re drafting. One day you might be looking for just an example of such a clause where email is a permitted form of notice; in this case, a conceptual or semantic search will likely find a good one. The next day, however, you might be looking for a specific Notices clause in which Jonathan Apple was the lawyer to be notified. In this case, a keyword search will likely work better for finding the name, but you would still rely on semantic search to surface clauses that don’t contain the word “notices.” As you can see, lawyers often need to combine the two in one search.

The goal, therefore, is not to choose one or the other, but rather to choose a system that intelligently blends the best of both keyword and semantic (vector) capabilities. Such a system delivers contextually informed search results while respecting the importance of specific keywords. It empowers attorneys to query internal knowledge efficiently and effectively, and most importantly, the resulting high-quality search output becomes the essential grounding input for LLMs to reason over.
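One common way to blend the two ranked result lists is reciprocal rank fusion, sketched below; this is offered only as an illustration of blending, not as the specific method any given vendor uses.

```python
from collections import defaultdict

def reciprocal_rank_fusion(keyword_hits: list[str], semantic_hits: list[str],
                           k: int = 60) -> list[str]:
    """Merge two ranked lists of document IDs into a single blended ranking."""
    scores: dict[str, float] = defaultdict(float)
    for hits in (keyword_hits, semantic_hits):
        for rank, doc_id in enumerate(hits, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# A clause that ranks reasonably well on both lists can outrank one that is
# top-ranked on only one of them.
print(reciprocal_rank_fusion(["clause_7", "clause_2"], ["clause_2", "clause_9"]))
```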

There are many more elements to consider when selecting a quality search engine for legal AI. We discuss more of these in the next post, coming soon.

Explore the blog series “Legal AI That Knows Your Firm”

Posts in this series:

  1. The Allure (and Danger) of Using Standalone LLMs for Search
  2. Why Retrieval Augmented Generation (RAG) Matters
  3. All Search Engines Are Not Created Equal (this post)


This post was adapted from our forthcoming 24-page white paper entitled "Implementing AI That Knows Your Firm: A Practical Guide." Sign up for our email list to be notified when the guide is available for download.
