Trust But Verify: How LLMs Differ from Search Engines, and Why They Sometimes Hallucinate


Large Language Models (LLMs) like ChatGPT and traditional search engines such as Google may both help you find information, but they work in fundamentally different ways. Understanding these differences is key to using LLMs effectively—and to recognizing why LLMs sometimes “hallucinate,” or make up information.

How a Search Engine Works

A traditional search engine such as Google or Microsoft Bing is essentially a giant, constantly updated index of web pages. When you enter a query, the search engine:

• Scans its index of the web for pages containing your keywords.

• Ranks those pages by relevance, popularity, and other factors.

• Shows you a list of links to real, existing web pages for you to explore.

Search engines don’t generate new information—they retrieve what already exists, helping you find the most relevant sources for your query. It’s possible, of course, that the retrieved information is inaccurate, or that your search request was incomplete and missed something important. But a “search engine” is just that: an engine designed to seek out and retrieve information that already exists somewhere on the internet.
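To make “retrieve and rank” concrete, here is a minimal Python sketch of a toy keyword index. The pages, URLs, and scoring rule are invented for illustration; real search engines use far more sophisticated ranking signals.

# Toy illustration of "retrieve and rank" -- the pages and scoring are made up.
pages = {
    "https://example.com/search-101": "search engines index web pages and rank them by relevance",
    "https://example.com/llm-basics": "large language models generate text from learned patterns",
    "https://example.com/bread": "how to bake bread at home",
}

def search(query):
    terms = query.lower().split()
    results = []
    for url, text in pages.items():
        # Score = how many query words appear on the page
        # (a crude stand-in for real ranking signals).
        score = sum(text.count(term) for term in terms)
        if score > 0:
            results.append((score, url))
    # Highest-scoring pages first; only links to pages that already exist are returned.
    return [url for score, url in sorted(results, reverse=True)]

print(search("how do search engines rank pages"))
# ['https://example.com/search-101', 'https://example.com/bread']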

How an LLM Works

An LLM, by contrast, is trained on huge amounts of text data (books, articles, websites) to learn the patterns and relationships between words and ideas. When you ask it a question or make a request (this is called a “prompt”), an LLM:

• Analyzes your prompt to understand context and intent.

• Predicts and generates a sequence of words that best fits your request, based on its training data and learned patterns.

Initially, LLMs did not look up answers live on the web (although many now use your prompt to run an internet search and add that information to their response). But even in this case, the response is generated by “guessing” the most likely next word in a sentence, drawing on what the model was exposed to during training.
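Here is a minimal sketch of this “predict the next word” idea, using a tiny made-up training text and simple word-pair counts. Real LLMs use neural networks trained on billions of words, but the underlying principle is the same.

from collections import Counter, defaultdict

# Tiny invented "training data" -- real models learn from billions of words.
training_text = "the cat sat on the mat . the dog sat on the rug ."
words = training_text.split()

# Count which word tends to follow which (a crude stand-in for what an LLM learns).
next_word_counts = defaultdict(Counter)
for current, nxt in zip(words, words[1:]):
    next_word_counts[current][nxt] += 1

def generate(start, length=6):
    out = [start]
    for _ in range(length):
        choices = next_word_counts.get(out[-1])
        if not choices:
            break
        # Append the statistically most likely next word -- no facts are checked.
        out.append(choices.most_common(1)[0][0])
    return " ".join(out)

print(generate("the"))  # "the cat sat on the cat sat" -- fluent-looking, but nothing is verified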

Why LLMs Hallucinate

Because LLMs generate text based on statistical patterns—not by checking facts—they can sometimes produce information that sounds plausible but isn’t true. This is called “hallucination.” For example, if asked about a recent event that happened after its last training update, the LLM might invent details, unless it is able to access real-time data.

Hallucinations can also happen if:

• The model’s training data was incomplete or contained errors.

• The prompt is unclear or ambiguous.

• The model tries to fill in gaps with its best “guess,” even if there’s no factual basis.

    LLMs have no concept of facts/knowledge and just learn statistical patterns of how likely it is one word follows another (given a context). … Hallucination is when a model ‘makes stuff up’.

LLMs with Web Search: A Step Forward (But Not Perfect)

Newer LLMs can now perform real-time web searches to supplement their responses. When you ask a question, the LLM:

• Runs a search to find relevant, up-to-date information.

• Incorporates this information into its answer, often providing citations or links.

This hybrid approach—sometimes called Retrieval-Augmented Generation (RAG)—helps the LLM provide more accurate and current answers, especially for recent events or facts outside its training data.
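A rough sketch of that retrieval-augmented flow is below. Both web_search and llm_generate are hypothetical placeholders (not a real search or model API); the point is the order of operations: retrieve first, then generate an answer grounded in what was retrieved.

def web_search(question):
    # Hypothetical stand-in: a real system would call a search engine API here
    # and return a few relevant snippets along with their source links.
    return [{"snippet": "Example snippet about the topic.",
             "url": "https://example.com/source"}]

def llm_generate(prompt):
    # Hypothetical stand-in for the language model, which still works by predicting
    # likely words -- now conditioned on the retrieved snippets as well.
    return "An answer written from the snippets above, with citations."

def answer_with_rag(question):
    results = web_search(question)          # 1. retrieve up-to-date information
    context = "\n".join(r["snippet"] + " (" + r["url"] + ")" for r in results)
    prompt = "Use these sources to answer:\n" + context + "\n\nQuestion: " + question
    return llm_generate(prompt)             # 2. generate an answer grounded in the sources

print(answer_with_rag("What happened this week?"))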

However, even with web search capabilities, LLMs can still hallucinate:

• They might misinterpret search results or combine information incorrectly.

• They can still generate plausible-sounding but false statements if the search doesn’t return clear or accurate data.

• The process of merging search results with generated text can introduce new errors or fabrications.

Key Takeaways for Novice Users

• Search engines retrieve: They show you real, existing web pages that match your keywords.

• LLMs generate: They create new text based on patterns in their training data, not by looking up facts in real time.

• LLMs can hallucinate: Because they generate responses rather than retrieve them, they sometimes make up information that sounds real but isn’t.

• Web search integration helps: LLMs that use live web search can provide more current, accurate information—but they’re still not perfect and can still hallucinate.

Understanding these differences will help you use both tools wisely—and always double-check important facts, especially when using LLMs for critical information.


Marc McCarty
Adjunct Professor, UMKC School of Law
University of Missouri System Broadband Initiative Steering Committee
816 304 9808 cell