March 2025
Two years ago, I sat down in front of my computer for an unusual interview: not with a person, but with an AI large language model, ChatGPT. Back then, I was curious. Could this AI really hold a conversation? Did it understand what I was asking? And most importantly, where was it getting its information? The AI answered my questions smoothly, almost too smoothly. While I was impressed, I walked away with a nagging sense of mystery. I didn’t understand how it worked, just that it did. More importantly, I largely dismissed it as a practical tool because of highly publicized examples of it getting things wrong and its tendency to hallucinate. Frankly, I was skeptical. (If you’re interested, you can find the original interview here.)
The blog turned out to be popular, far more so than many others I have written over the years. Much has happened in the last two years, not just with ChatGPT, but with many other large language models, or “LLMs.” Without question, LLMs have improved significantly since my “interview” with ChatGPT. They are faster, more accurate, better at citing sources, and in some cases able to access the internet for current data.
LLMs have also “gone mainstream,” becoming integral to web search on Google and Bing and to social platforms like Meta’s. The same technology is now embedded in tools that offer “helpful” rewrites of our emails, texts, and documents, and it handles a growing share of online and telephone help desk inquiries.
But how does an LLM actually work? Why is it so good at sounding human? And what’s really going on under the hood when it generates a response?
It’s time for an updated conversation, one that does a more complete job of explaining how these AI programs work, and what the benefits and risks are of incorporating them into our daily lives.
Tokens, Parameters, and the Magic of Predicting “What Comes Next”
One of the most common misconceptions about LLMs is that they “think” like we do. They don’t. There’s no “ghost in the machine,” no digital consciousness weighing options and forming opinions. Instead, it’s more accurate to think of LLMs as sophisticated probability engines that “guess” the next word based on a mathematical model programmed and refined using vast amounts of text data.
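To make that concrete, here is a minimal sketch in Python, using made-up words and probabilities rather than anything from a real model, of what “guessing the next word” looks like:

```python
import random

# Toy "model": made-up probabilities for the word that follows
# "The cow", standing in for an LLM's billions of learned weightings.
next_word_probs = {
    "jumped": 0.40,
    "grazed": 0.30,
    "mooed": 0.20,
    "flew": 0.10,
}

def guess_next_word(probs):
    """Pick the next word at random, weighted by its probability."""
    words = list(probs)
    weights = list(probs.values())
    return random.choices(words, weights=weights, k=1)[0]

print("The cow", guess_next_word(next_word_probs))
```

A real LLM does something like this for every token it produces, scoring tens of thousands of candidates at each step and repeating the process until the response is complete.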
To unpack the magic of an LLM, it helps to understand two key terms: tokens and parameters. Of course, there are many more—a cacophony of technical jargon thrown around by engineers—but these two will suffice for our purposes.
Tokens
Think of a token as a chunk of text. It might be a whole word (like “and,” “to,” “from”) or, for longer and rarer words, a syllable or piece of a word (like “un-,” “dis-,” “-ly”). When you type a question into an LLM, it doesn’t “see” the words in the sentence as you do. It breaks them down into tokens (words and bits of words). This matters because, in the next step, the model analyzes the relationships among those tokens, using probability weightings, to generate a response.
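If you’d like to see tokenization in action, here is a short sketch assuming the open-source tiktoken library (the tokenizer used by several OpenAI models) is installed; exact splits vary from model to model:

```python
import tiktoken  # pip install tiktoken

# Load the tokenizer used by several recent OpenAI models.
enc = tiktoken.get_encoding("cl100k_base")

text = "Unbelievably, the cow jumped."
token_ids = enc.encode(text)

# Show how the sentence is split into token chunks.
for tid in token_ids:
    print(tid, repr(enc.decode([tid])))
```

Running it shows that common short words usually come through as single tokens, while longer or rarer words are broken into pieces.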
The latest versions of LLMs can handle over 100,000 tokens in a single session (their “context window”). That’s impressive, especially considering that a well-educated adult has a vocabulary of about 40,000 words. This expanded token limit allows for more complex and extended conversations, enabling models to process and generate larger pieces of text in a single interaction.
A Trillion Parameters: The “Rules” Behind the Response
However, the real magic happens when the LLM selects the next word (or series of tokens) in response to our question or request. This is done using a set of mathematical probability weightings assigned to the various token combinations that make up the question or request you typed. These weightings are called parameters.
If tokens are the building blocks—the “dictionary”—then parameters are the “rulebook” that governs how an LLM constructs coherent responses. Modern LLMs have a staggering number of parameters. For example, some of the most advanced models are estimated to have nearly 1.8 trillion parameters! To put that in perspective: if you looked at each parameter for just one second, it would take over 57,000 years to examine them all!
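How do those weightings turn into an actual choice of next word? Here is a toy sketch, with invented numbers, of one common recipe (“softmax”) for converting a model’s raw scores into the probabilities used to pick the next token:

```python
import math

# Made-up raw scores ("logits") a model might compute for candidate
# next tokens using its learned parameters. Real models score tens of
# thousands of candidates with billions of parameters.
logits = {"jumped": 2.1, "grazed": 1.4, "flew": -0.5}

# Softmax: convert raw scores into probabilities that sum to 1.
total = sum(math.exp(v) for v in logits.values())
probs = {word: math.exp(v) / total for word, v in logits.items()}

for word, p in sorted(probs.items(), key=lambda kv: -kv[1]):
    print(f"{word}: {p:.2f}")
```

The model then samples from (or simply takes the top of) this probability list, exactly as in the earlier sketch.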
“Training” the LLM
These parameters aren’t something an LLM invents on its own. They are shaped by human input and by analysis of word structures across massive datasets, datasets that incorporate a large percentage of written human knowledge and literature. Human feedback and carefully designed parameter adjustments are applied to most LLMs to keep responses accurate and safe. Over time, the responses generated by the LLM are refined, corrected, and guided by human oversight.
These parameters enable an LLM to generate understandable, and usually accurate, responses, because the relative values assigned to words and their order are based on vast amounts of analyzed data.
This is one function of the massive data centers that have emerged worldwide, where models are trained and updated on enormous datasets containing text and, in some cases, audio and video data. By examining an extensive portion of human knowledge, the LLM produces outputs that mimic human speech. Over time, as these outputs are tested, corrected, and updated through reinforcement and human feedback, they become more contextually appropriate and accurate as well.
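As a rough illustration of how “training” derives weightings from data, here is a toy sketch that simply counts word pairs in a three-sentence “dataset.” Real LLMs use far more sophisticated gradient-based methods over trillions of tokens, but the underlying principle, learning probabilities from text, is the same:

```python
from collections import Counter, defaultdict

# A toy "training set". Real models train on trillions of tokens.
corpus = [
    "the cow jumped over the fence",
    "the cow grazed in the field",
    "the dog jumped over the log",
]

# Count which word follows which: a crude stand-in for learned parameters.
follow_counts = defaultdict(Counter)
for sentence in corpus:
    words = sentence.split()
    for prev, nxt in zip(words, words[1:]):
        follow_counts[prev][nxt] += 1

# Turn the counts into probabilities for the word after "cow".
counts = follow_counts["cow"]
total = sum(counts.values())
for word, c in counts.items():
    print(f"P({word} | cow) = {c / total:.2f}")
```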
For instance, the LLM can determine whether the word “bank” refers to a financial institution or the side of a river based on the surrounding words and the context of the conversation. It can even gauge the tone of a sentence, judging whether a phrase is humorous, serious, or sarcastic.
Here’s an example based on a real exchange with ChatGPT:
Me: What can cows jump over?
ChatGPT: Cows are not known for their jumping abilities, but they can jump if necessary. In general:
- Cows can jump over low fences (typically up to 4-5 feet if motivated).
- Younger, more agile cows may jump higher than older or heavier ones.
- If startled or frightened, cows can clear obstacles they normally wouldn’t attempt.
- The phrase “the cow jumped over the moon” from the nursery rhyme is, of course, just a fun exaggeration.
Me: Rhyme: What do cows jump over.
ChatGPT: Cows can jump, but not too high,
A fence, a log, if they really try.
Over a ditch, across some hay,
But not the moon—no, not today!
In the first response, the LLM assumed I wanted a factual answer. In the second, my input “rhyme” shifted the probability weightings, generating a poetic, more light-hearted response.
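To connect this back to probabilities, here is one more toy sketch (invented numbers again, not real model output) of how a single context word can shift which continuation the model favors:

```python
# Made-up next-word probabilities for "cows jump over ...",
# conditioned on whether the prompt asks for a rhyme.
next_word_probs = {
    "factual": {"fences": 0.70, "ditches": 0.25, "the moon": 0.05},
    "rhyme": {"the moon": 0.60, "hay": 0.25, "fences": 0.15},
}

def top_choice(context):
    """Return the most probable continuation for the given context."""
    probs = next_word_probs[context]
    return max(probs, key=probs.get)

print("Factual prompt ->", top_choice("factual"))
print("Rhyming prompt ->", top_choice("rhyme"))
```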
Limitations: When AI Gets It Wrong
Understanding how LLMs work also explains why they sometimes get things very wrong. Since they rely on predicting statistically probable next words, they can hallucinate information—confidently stating incorrect facts because they “sound right” based on training data. This is why AI-generated content should always be double-checked, especially for accuracy-sensitive topics.
A striking example of this occurred when lawyers submitted a legal brief that was generated by an LLM. The brief cited multiple legal cases that simply did not exist. The lawyers, assuming the AI’s output was correct, did not verify the cases before submitting them to the court. When the judge reviewed the citations and discovered that none of them were real, the attorneys faced serious consequences, including fines and reputational damage.
In addition, unless an LLM has real-time access to the internet and is updated regularly, its knowledge may be outdated. The old adage “garbage in, garbage out” also applies. If an LLM is trained on flawed or biased data, its responses may reflect those flaws. For example, a model trained on outdated medical research or biased historical accounts may repeat those errors as fact.
Trust But Verify
So how can you verify the accuracy and reliability of information provided by an LLM?
- Cross-check claims against reliable sources such as academic journals, news outlets, or expert databases.
- Ask the LLM for its sources. While it doesn’t always provide direct citations, some models can generate references or indicate the general sources of their information, offering a starting point for verification.
- Use models that cite sources. Some LLMs, like Perplexity AI, not only restate your query to ensure proper interpretation but also provide footnoted responses with references to specific sources, allowing for greater transparency and verification of information.
Take Time to Read and Consider AI Responses
One of the best habits to develop when using an LLM is to slow down. Humans tend to trust authoritative responses, especially when delivered with certainty. But because AI-generated answers are a product of statistical probabilities rather than conscious reasoning, they should be carefully evaluated against your real-world knowledge and common sense. One method that helps you engage critically is to ask the LLM follow-up questions, request clarifications, and challenge its responses. This not only improves accuracy but also ensures a deeper, more complete understanding of the topic.
A Powerful Tool
LLMs are powerful tools, but they remain just that: tools. They don’t replace human reasoning, intuition, or expertise, but they can significantly enhance them. By understanding how LLMs work, we can engage with them more effectively, apply them more appropriately, and develop a balanced perspective on their strengths, limitations, and best applications.

Written with the assistance of ChatGPT and Perplexity AI.