Enterprise Search's Ranking problem - RAGing with Large Language Models (LLMs) won't solve it.
- Rajasankar Viswanathan

- Dec 11
- 6 min read
Ever since Google worked its magic of finding relevant content in public data, the race to solve the ranking problem in private data, aka Enterprise data, has been going on.
There have been various attempts to solve it. This article lists those failed attempts and explains why the latest RAG-e of trying our luck with LLMs won't solve the problem either.
Search, or Information Retrieval, in structured data is easy and is enabled by software known as databases. There are various types of databases, and getting data out in a structured format is called querying, done with a query language. A set of methods collectively known as database normalization makes databases easier to query and more efficient at storing data.
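To make the contrast concrete, here is a minimal sketch of structured retrieval using Python's built-in sqlite3 module; the table and rows are hypothetical examples, not from this article.

```python
# Minimal sketch: querying structured data with SQL via Python's built-in sqlite3.
# The employees table and its rows are made-up examples.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, department TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [("Asha", "Engineering", 95000), ("Ben", "Sales", 60000), ("Chen", "Engineering", 87000)],
)

# Structured retrieval is a declarative query: exact, ordered, unambiguous.
rows = conn.execute(
    "SELECT name, salary FROM employees WHERE department = ? ORDER BY salary DESC",
    ("Engineering",),
).fetchall()
print(rows)  # [('Asha', 95000), ('Chen', 87000)]
```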
Search or Information Retrieval in unstructured data is the messy, convoluted problem. Unstructured text data needs to be formatted and converted into a structure to enable search. That is the easy part. The hard part is how to find the information relevant to what the user is looking for.
In structured data, sorting is relevancy: sort numerically, alphabetically, or by a combination of the two, and it is available in the query language itself. In unstructured data, what relevancy even means is hard to define. Is it the meaning? Does it contain all the words of the query? Is it contextual, or does it come from some trusted source?
In public data, or public search, this relevancy is solved by metadata, i.e. details about the source. A well-known national newspaper ranks higher than a local newspaper, so the national newspaper's information is shown first, then the local newspaper's. The same applies to websites: the most popular ones are shown first, and that popularity stands in for relevancy or ranking. Other details about the websites, such as date of registration, are used to refine the ranking metric. The actual content is not used for this ranking; the popularity of the source is simply equated with the relevancy of its content.
In the era of AI labeling, it can be argued that website creators were the first labellers for the big tech companies, and they did it voluntarily in return for traffic from the search engines. The entire Search Engine Optimization ecosystem created hundreds of thousands of jobs working on content, placing keywords and so on. Search engines benefited by selling ads both on their own sites and on those websites.
All in the name of finding content that people are looking for.
What made public data search so effective and almost magical is exactly what made private data search, aka Enterprise search, impossible: there is neither metadata for the documents nor people who will create it. Only the content is available, so ranking must be solved using the content, i.e. the data inside the documents, alone. For companies, this posed one of the biggest problems in handling information. Ineffective methods such as tagging or content management systems did little to solve it.
Enter Natural Language Processing methods. Instead of treating this purely as a search problem, academia tried to solve it by creating structure out of unstructured text data using the linguistic properties of languages: extracting names, i.e. nouns, and verbs, creating counts of words, and splitting long-form text into small chunks, a step called tokenization. Interestingly, current genAI or LLMs still use the same old tokenization methods created 50-60 years ago.
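A minimal sketch of those classic preprocessing steps, tokenization and word counts, in plain Python; part-of-speech tagging to pull out nouns and verbs would need a library such as NLTK or spaCy and is left out here.

```python
# Naive tokenization and word counting over a made-up snippet of text.
from collections import Counter
import re

doc = "The engineer walked to the lab. The lab runs nightly tests."

# Tokenization: lowercase the text and split it into small chunks (here, runs of letters).
tokens = re.findall(r"[a-z]+", doc.lower())

# Word counts are the simplest "structure" imposed on unstructured text.
counts = Counter(tokens)
print(counts.most_common(3))  # e.g. [('the', 3), ('lab', 2), ('engineer', 1)]
```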
Vectorization is the name for creating structure, i.e. converting unstructured data into a numerical representation. Vectorization covers several methods, each with its own advantages and disadvantages, but the basic premise is the same: create a vector space in which to work with text data. Once the vector space is created, statistical or probabilistic methods can be used to analyze the data. From simple word counts to complex similarity measures, vector space models provide some calculation of similarity.
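A toy illustration of one such vector space, a bag-of-words model built from made-up documents; each distinct word becomes a dimension, and each document becomes a point in that space.

```python
# Bag-of-words vectorization sketch: text -> numerical vectors in a shared vocabulary space.
docs = [
    "invoice payment overdue",
    "payment received for invoice",
    "server outage in the data center",
]

# Build a shared vocabulary: one dimension per distinct word.
vocab = sorted({word for d in docs for word in d.split()})

def to_vector(text):
    """Count of each vocabulary word in the text -> a point in the vector space."""
    words = text.split()
    return [words.count(v) for v in vocab]

vectors = [to_vector(d) for d in docs]
print(vocab)
print(vectors[0])  # counts for the first document
```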
Let us start with a famous and simple one: tf-idf, Term Frequency - Inverse Document Frequency. This simple but workable method describes words in terms of frequency: how many times a word occurs in a document relative to how many times the same word occurs across the entire dataset. It works when you have the same kind of documents and a small dataset; with large, diverse data it fails. It was later improved by adding distance measures such as cosine or L2 to calculate the ranking of documents when a search happens. Remember, these methods rank the documents relative to the search query, not beforehand.
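A small sketch of tf-idf ranking with cosine similarity using scikit-learn; the documents and query are invented, and note that the scores are computed relative to the query at search time, as described above.

```python
# tf-idf ranking sketch: documents and a query mapped into the same vector space,
# then ranked by cosine similarity to the query.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    "quarterly sales report for the north region",
    "annual sales summary and forecasts",
    "employee onboarding checklist and forms",
]

vectorizer = TfidfVectorizer()
doc_matrix = vectorizer.fit_transform(docs)          # one tf-idf vector per document
query_vec = vectorizer.transform(["sales report"])   # query mapped into the same space

scores = cosine_similarity(query_vec, doc_matrix)[0]
for score, doc in sorted(zip(scores, docs), reverse=True):
    print(f"{score:.2f}  {doc}")
```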
So every time a query is issued, the ranks may change depending on the words used. Knowing the right keywords becomes the success metric, rather than the search engine providing the results. To compensate, several tricks or workarounds were added that change the words themselves. For example, all words are reduced to a base form, so the word "walk" is added wherever "walked" and "walking" appear; if people search for "walking", then "walk" and "walked" are searched too. Another trick is using a dictionary to expand the search: for example, "heat" and "thermal" are related words, so if one is used the other is added to the search automatically. A few more methods exist but are skipped for this discussion.
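A rough sketch of those two tricks, stemming and dictionary-based expansion, assuming NLTK is installed; the synonym entries are toy examples, not a real thesaurus.

```python
# Query expansion sketch: reduce words to a base form (stemming) and
# add related words from a small, made-up synonym dictionary.
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()
synonyms = {"heat": ["thermal"], "thermal": ["heat"]}  # hypothetical dictionary entries

def expand_query(query):
    terms = set()
    for word in query.lower().split():
        terms.add(stemmer.stem(word))        # "walking" and "walked" both become "walk"
        terms.update(synonyms.get(word, []))  # add related words, if any
    return terms

print(expand_query("walking"))           # {'walk'}
print(expand_query("thermal imaging"))   # includes both 'thermal' and 'heat'
```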
All of these still don't solve the problem. Search ranking in enterprise data is still bad, and people have to spend long hours to get the information they are looking for.
Welcome to the world of LLMs and genAI. LLMs started as a chatbot replacement for search; that part of the origin story is not the topic of this article. Companies saw them as an option to enhance their Enterprise search: instead of doing all the dance of tricks, just add the data to the model or fine-tune the model and fire up a chatbot. However, it is not that simple in the practical world. The disadvantages of LLMs, and the work of adding the data on both sides, restricted their use.
LLMs have one property that makes them nearly impossible to use in Enterprises: hallucinations. Hallucination is a nice name for the behaviour of LLMs where they produce fake or non-existent facts. To solve this problem, RAG was introduced. In simple terms, verify the LLM's output against a database query to ensure the information is not misrepresented.
So why do LLMs produce fake information? That behaviour results from their basic working principle. LLMs find the most probable sequence of words for a given query, or prompt (the fancy name). In simple words, given a query/prompt, the LLM searches within the model and then arranges words according to the best probability. This arrangement has no connection to reality or to the actual data.
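A toy illustration of that point: the next word is chosen purely from a probability distribution, with nothing tying the choice to facts. The probability table below is entirely made up.

```python
# Toy next-word generation: the model's job is to pick a probable continuation,
# whether or not the resulting "fact" exists anywhere. Probabilities are invented.
import random

next_word_probs = {
    "The capital of Atlantis is": {"Poseidonia": 0.45, "Atlantis City": 0.35, "unknown": 0.20},
}

def generate(prompt):
    probs = next_word_probs[prompt]
    words, weights = zip(*probs.items())
    # Sample from the distribution: the plausible-sounding continuation wins.
    return random.choices(words, weights=weights, k=1)[0]

print(generate("The capital of Atlantis is"))
```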
LLMs search in the model, not in the actual data or indexes; this needs to be kept in mind. The model is created as weights corresponding to words/tokens/vectors. The model itself is built with statistical methods that use reward functions, another name for relevancy calculation, to create the tokens corresponding to the weights. With this, Enterprise search is back to square one: the limits of relevancy derived from statistical probability.
The idea of finding the relevancy of words/tokens in large public data and then applying it to private data to solve the relevancy problem there was fanciful from the beginning. The basic reason is that there is no notion of the company's concepts, jargon, or context. Popularity on the public web is different from what Enterprise workers want to see when they search for a piece of information.
Out-of-Distribution (OOD) data is another issue: if a company's data is wholly absent from the public domain or written in a style far from common language, LLMs can't find patterns in it, which leads to hallucinations and irrelevant results.
Retrieval Augmented Generation (RAG) is an attempt to solve the Enterprise Search problem, not a solution. If you compare the preparation of data for Solr/ElasticSearch and for RAG, the two are similar. The words used for Solr/ES are replaced by words with similar meanings for RAG; some words are retained. The preparation of the data and the workings are mostly the same, except that instead of a piece of search software, the data is fed to the LLM, and instead of a list of documents, a summary of those documents, or an answer to the query, is produced.
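A minimal sketch of a RAG flow, to show how close it sits to classic search preparation; retrieval here reuses tf-idf for simplicity, and call_llm is a hypothetical placeholder for whatever model API is used.

```python
# RAG flow sketch: index chunks, retrieve by similarity, hand the top chunks to an LLM.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

chunks = [
    "Refund requests must be filed within 30 days of purchase.",
    "The VPN client requires version 2.4 or later.",
    "Expense reports are approved by the department head.",
]

vectorizer = TfidfVectorizer()
index = vectorizer.fit_transform(chunks)          # the "indexing" step, as in Solr/ES

def retrieve(query, k=2):
    scores = cosine_similarity(vectorizer.transform([query]), index)[0]
    return [chunks[i] for i in scores.argsort()[::-1][:k]]

def call_llm(prompt):                             # hypothetical stand-in for a model API
    return f"[LLM summary of:]\n{prompt}"

query = "How long do I have to ask for a refund?"
context = "\n".join(retrieve(query))
print(call_llm(f"Answer using only this context:\n{context}\n\nQuestion: {query}"))
```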
This RAGing with LLMs won't solve the Enterprise Search problem. Companies today sit on mountains of data without knowing what to do with it, how to analyze it, or how to extract value from it. This issue affects the entire economy, since effective answers would usher in more solutions to the problems the public faces.
How to solve it? Let us discuss that in the next part.