How LLMs Actually Work
The first time I really understood how a language model works, I felt a little cheated.
Not because it was complicated. Because it was simpler than I expected. I had built a tool that switches between a dozen of these models. I had shipped AI features used by real people. And the actual core idea, the thing underneath all of it, fits in one sentence.
So let me give you that sentence up front, and then spend the rest of this post earning it.
A large language model is a machine that, given some text, guesses the next word. That is all it does. Everything else is a consequence of doing that one thing extremely well.
It does not think. It does not look anything up. It does not understand you the way your friend does. It guesses the next word, then the next, then the next, faster than you can read, and somehow that turns into poetry, code, and answers to questions you did not know how to ask.
Sounds too small to be true. Let me show you why it works.
Start with something you already trust
You have used this technology for years. Your phone keyboard.
You type “I’ll be there in five” and it offers “minutes.” You type “happy” and it offers “birthday.” It is reading what you wrote and betting on what comes next.
Your keyboard is bad at it. It only looks at the last word or two, so it has no idea what you are actually talking about. Ask it anything real and it falls apart.
A large language model is the same idea, taken to an absurd extreme. Instead of the last word or two, it looks at everything you wrote, plus everything earlier in the conversation, sometimes a whole book’s worth of context. And instead of learning from your texts alone, it learned from a huge slice of everything humans have written down.
Same instinct. Wildly different scale. That gap is the whole story.
The only trick: predict the next piece, then do it again
Here is the loop, and it really is this plain.
You give the model some words. It produces one piece of a word. It sticks that piece onto the end of your words. Then it feeds the whole thing back into itself and produces the next piece. Round and round, until it decides it is done.
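If it helps to see that loop as code, here is a minimal sketch. Everything in it is invented for illustration: the "model" is just a lookup table called `TOY_MODEL`, and `END` is a made-up marker standing in for the model's "I am done" signal. A real model computes its next piece instead of looking it up, but the loop around it has the same shape.

```python
# Hypothetical stand-in for a trained model: maps the text so far to the next piece.
END = "<end>"  # invented marker for the model's "I am done" signal

TOY_MODEL = {
    "I'll be there in": " five",
    "I'll be there in five": " min",
    "I'll be there in five min": "utes",
    "I'll be there in five minutes": END,
}

def next_piece(text):
    """Return the next piece for the text so far (a lookup, not a real model)."""
    return TOY_MODEL.get(text, END)

def generate(prompt, max_pieces=20):
    """Append one piece at a time until the stand-in model signals it is done."""
    text = prompt
    for _ in range(max_pieces):
        piece = next_piece(text)  # produce one piece
        if piece == END:          # the model decides it is done
            break
        text += piece             # stick it onto the end, then go round again
    return text

print(generate("I'll be there in"))
```

Notice that "minutes" arrives as two pieces. The loop does not care. It only ever asks for the next piece given everything so far.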
That word “guess” is doing a lot of work, so let me be honest about it. The model does not produce a single word directly. For every possible next piece, it produces a number: how likely this piece is, right here, right now. Tens of thousands of candidates, each with a score. Then it samples from that list, usually from somewhere near the top.
This is why the same question can give you slightly different answers. There is a dial, often called temperature, that decides how adventurous the picking is. It is a real setting you can change when you call these models through their API, and it has a real number attached to it.
Here is the cleanest way I have found to picture it. Say you ask the model to finish “My favourite drink in the morning is.” Behind the scenes it has scored every possible next word. “Coffee” sits near the top. “Tea” is right behind it. “Mango” is way down the list but not impossible. Temperature decides how far down that list the model is willing to wander.
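Here is that picture as a small sketch. The scores for “coffee,” “tea,” and “mango” are numbers I made up, and the function names are mine. The technique is the common one: divide each score by the temperature, then turn the results into probabilities with a softmax. I am assuming hosted models do something along these lines, since providers do not publish their exact sampling code. The sketch also needs a temperature above 0, because you cannot divide by zero. A setting of 0 is commonly treated as "always take the top score."

```python
import math
import random

def probabilities(scores, temperature):
    """Turn raw scores into probabilities, sharpened or flattened by temperature."""
    if temperature <= 0:
        raise ValueError("this sketch needs temperature > 0")
    scaled = {word: score / temperature for word, score in scores.items()}
    peak = max(scaled.values())  # subtract the peak so exp() stays in a safe range
    exps = {word: math.exp(value - peak) for word, value in scaled.items()}
    total = sum(exps.values())
    return {word: value / total for word, value in exps.items()}

def pick(scores, temperature, rng=random):
    """Sample one word according to the temperature-adjusted probabilities."""
    probs = probabilities(scores, temperature)
    words = list(probs)
    return rng.choices(words, weights=[probs[w] for w in words], k=1)[0]

# Invented scores for "My favourite drink in the morning is"
scores = {"coffee": 5.0, "tea": 4.5, "mango": 1.0}

for temperature in (0.2, 1.0, 2.0):
    rounded = {w: round(p, 3) for w, p in probabilities(scores, temperature).items()}
    print(temperature, rounded)
```

At the low setting almost all of the probability piles onto “coffee.” At the high setting the list flattens out and “mango” gets a real chance.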
Now imagine turning the heat up on that same question, and try the extremes. Near 0 you get “coffee” every time. In the middle, “tea” shows up now and then. At the top, “mango” starts to sneak in. That feeling, of the answer going from reliable to playful to slightly off the rails, is the whole point.
So what are the actual numbers? Here is where it gets interesting, because they are not the same everywhere. This trips a lot of people up.
With OpenAI’s API, temperature runs from 0 to 2 and defaults to 1. With Anthropic’s Claude API, it runs from 0 to 1, also defaulting to 1. So a temperature of 1 does not sit in the same place on both dials. On Claude that is the top of the range, the wildest it goes. On OpenAI that is only the halfway point. Same number, different spot on the scale. This is exactly the kind of small, real headache you hit when you build tools that talk to more than one provider, and it is a big part of why I built llmswap in the first place.
And it keeps moving. OpenAI’s newer GPT-5 models removed the temperature knob altogether and just fix it internally. So the lesson is not “memorise the numbers.” The lesson is: this is a setting that varies by provider and even by model, so when an answer feels too random or too robotic, the first thing to check is which dial you are actually turning.
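One way to guard against this in your own code is a small helper that knows each provider's range. This is a sketch, not how llmswap or any provider SDK does it. The names `TEMPERATURE_RANGES`, `safe_temperature`, and `request_options` are invented here, the ranges are simply the ones quoted above, and the `supports_temperature` flag is something you would have to set yourself per model.

```python
from typing import Optional

# Ranges as quoted in the text above; check each provider's current docs before relying on them.
TEMPERATURE_RANGES = {
    "openai": (0.0, 2.0),
    "anthropic": (0.0, 1.0),
}

def safe_temperature(provider: str, requested: float,
                     supports_temperature: bool = True) -> Optional[float]:
    """Return a temperature inside the provider's range, or None to leave it out."""
    if not supports_temperature:
        return None  # for models that fix temperature internally
    if provider not in TEMPERATURE_RANGES:
        raise ValueError(f"unknown provider: {provider}")
    low, high = TEMPERATURE_RANGES[provider]
    return min(max(requested, low), high)  # clamp into [low, high]

def request_options(provider: str, requested: float,
                    supports_temperature: bool = True) -> dict:
    """Build keyword options, omitting temperature when the model does not take it."""
    temperature = safe_temperature(provider, requested, supports_temperature)
    return {} if temperature is None else {"temperature": temperature}

print(request_options("openai", 1.5))
print(request_options("anthropic", 1.5))
print(request_options("openai", 0.7, supports_temperature=False))
```

Clamping is a deliberate choice here. It keeps a request from being rejected for an out-of-range value, but it does not make 0.7 on one provider behave exactly like 0.7 on another.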
A rough field guide I actually use:
- Around 0 to 0.3: facts, code, data extraction, anything where you want the same answer twice. Boring on purpose.
- Around 0.7: a good everyday middle. Helpful, with a little life to it. Plenty of apps and libraries default to something close to this for a reason.
- Around 1 and above: brainstorming, names, jokes, first drafts. You are asking for surprise, and you will get some duds along with the gems.
One honest catch, straight from Anthropic’s own docs: even at temperature 0, the output is not guaranteed to be identical every single time. Lower temperature makes it far more consistent, but “deterministic” is a promise these systems quietly do not make. Good to know before you build something that assumes otherwise.
But how does it know “Paris”?
Fair. Prediction is the loop. The real magic is why the prediction is any good. And the answer is the part that makes people uneasy, so I will say it plainly.
It read almost everything.
Books, websites, code, arguments on forums, recipes, instruction manuals, poems. An amount of text no human could finish in a thousand lifetimes. And during a long, expensive process called training, it played one game, billions of times: cover up the next word, guess it, check the real answer, adjust.
Imagine being handed every sentence ever written, with the last word hidden, and being asked to guess it. At first you would be hopeless. But do that enough times and you would start to notice patterns. “The capital of France is” tends to be followed by “Paris.” “Once upon a” tends to be followed by “time.” Code that opens a bracket tends to close it.
The model does not store these as facts in a list. It tunes billions of tiny internal dials so that the patterns are baked into how it reacts. When you later type “The capital of France is,” the right answer lights up not because it looked it up, but because that path is worn smooth from a billion passes.
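To make "guess, check, adjust" concrete, here is a toy version of the game. It is a heavy simplification and every name in it is mine. The "dials" are one row of numbers per word, the context is only the single previous word (so it is closer to your phone keyboard than to a real model), and the corpus is two short sentences. The adjust step is a plain gradient step on cross-entropy loss, which I am using as a stand-in for what real training does at vastly larger scale.

```python
import math

def softmax(scores):
    """Turn a list of raw scores into probabilities that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def train(pairs, vocab, epochs=200, learning_rate=0.5):
    """Learn one row of dials per context word by guess, check, adjust."""
    index = {word: i for i, word in enumerate(vocab)}
    dials = {word: [0.0] * len(vocab) for word in vocab}
    for _ in range(epochs):
        for context, target in pairs:
            guess = softmax(dials[context])                     # guess
            for i in range(len(vocab)):
                truth = 1.0 if i == index[target] else 0.0      # check the real answer
                dials[context][i] -= learning_rate * (guess[i] - truth)  # adjust
    return dials

def predict(dials, vocab, context):
    """Return the word whose dial is highest after the given context word."""
    row = dials[context]
    return vocab[row.index(max(row))]

corpus = ["the capital of france is paris", "once upon a time"]
pairs = []
for sentence in corpus:
    words = sentence.split()
    pairs.extend(zip(words, words[1:]))  # (previous word, next word)

vocab = sorted({word for sentence in corpus for word in sentence.split()})
dials = train(pairs, vocab)

print(predict(dials, vocab, "is"))  # paris
print(predict(dials, vocab, "a"))   # time
```

Nothing in `dials` says "the capital of France is Paris." There are only numbers that got nudged, pass after pass, until the right word scored highest.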
This is also why a model can be confidently wrong. It is not reciting truth. It is producing the most plausible continuation. Most of the time plausible and true line up. Sometimes they do not, and you get a sentence that sounds perfect and means nothing. People call this hallucination. I think of it as the model doing exactly its job, predicting believable text, with no separate sense of whether the text is real. It was never taught the difference. It was taught what sounds right.
Sit with that for a second, because it changes how you should trust these things. The fluency is not evidence of correctness. It is just evidence that the model has read a lot of fluent writing.
Words are not really words
One more layer, and then you will understand more about this than most people who use it daily.
The model does not see letters or even whole words. Before anything happens, your text gets chopped into pieces called tokens. A token is usually a chunk of a word. “Running” might be “run” plus “ning.” The space in front of a word is usually part of that word’s token. Common words are single tokens. Rare ones get split into several.
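Here is a rough sketch of the chopping. The vocabulary is invented, and the method is a simple greedy longest-match, which I picked because it fits in a few lines. Real tokenizers learn their vocabulary from data and use more careful algorithms, so treat this as a picture of the idea, not of any particular model's tokenizer.

```python
def tokenize(text, vocab):
    """Greedy longest-match: at each position, take the longest known piece."""
    tokens = []
    position = 0
    while position < len(text):
        for end in range(len(text), position, -1):  # try the longest candidate first
            piece = text[position:end]
            if piece in vocab:
                tokens.append(piece)
                position = end
                break
        else:
            # Nothing in the vocabulary matched: fall back to a single character.
            tokens.append(text[position])
            position += 1
    return tokens

# Invented vocabulary; note the leading spaces inside some tokens.
vocab = {"I", " love", " run", "run", "ning"}

print(tokenize("running", vocab))
print(tokenize("I love running", vocab))
```

The first call gives `['run', 'ning']` and the second gives `['I', ' love', ' run', 'ning']`, with each space riding along at the front of the piece that follows it.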
Each token gets turned into a long list of numbers. That sounds cold, but here is the beautiful part. These numbers are arranged so that tokens with related meanings end up near each other. “King” sits close to “queen.” “Paris” sits close to “France.” The model builds a kind of map of meaning, where direction and distance carry sense, all on their own, learned only from reading.
So when it predicts the next token, it is doing arithmetic on points in this map of meaning. It is not magic. It is geometry with a very good map.
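You can get a feel for "near each other" with a few lines of code. The three-number vectors below are made up by hand purely for illustration. Real models learn lists that are hundreds or thousands of numbers long. The measuring stick, cosine similarity, is a standard way to compare the direction of two vectors.

```python
import math

# Hand-made, purely illustrative vectors; real ones are learned and far longer.
EMBEDDINGS = {
    "king":   [0.9, 0.8, 0.1],
    "queen":  [0.9, 0.7, 0.2],
    "paris":  [0.1, 0.2, 0.9],
    "france": [0.2, 0.1, 0.9],
}

def cosine(a, b):
    """Cosine similarity: close to 1.0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def nearest(word):
    """Return the other word whose vector points most nearly the same way."""
    others = [w for w in EMBEDDINGS if w != word]
    return max(others, key=lambda w: cosine(EMBEDDINGS[word], EMBEDDINGS[w]))

print(nearest("king"))   # queen
print(nearest("paris"))  # france
```

With these toy numbers, “king” lands next to “queen” and “paris” next to “france” only because I placed them there. The remarkable part of a real model is that nobody places anything. The positions fall out of training.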
Putting the whole thing together
Let me stitch the pieces into one picture, start to finish, the moment you hit enter.
1. Your sentence gets chopped into tokens.
2. Each token becomes numbers on the map of meaning.
3. The model weighs everything you wrote and scores every possible next token.
4. It picks one, based on the scores and the temperature dial.
5. That token gets added on, and the whole thing loops back to step 3.
6. It stops when it predicts a quiet “I am done” signal.
That is a language model. A next-word guesser that read almost everything, thinks in chunks, navigates a map of meaning, and runs its guess in a loop until it has said its piece.
Why I find this comforting, not scary
When people hear “it just predicts the next word,” some feel let down, like the trick has been spoiled. I felt the opposite. Knowing the shape of the thing made me trust it correctly. Not too much, not too little.
It is a phenomenal pattern matcher. Lean on it for that. Drafting, explaining, translating, turning your messy thought into clean words, sketching code, talking you through an idea at 1am when no one else is awake. It is genuinely good at all of it.
But it has no memory of being right, no inner check for truth, no stake in the answer. So when it matters, you check its work. You stay the human in the loop. That is not a limitation to resent. That is just knowing your tool.
I have spent years building on top of these models. The wonder has not worn off. If anything, understanding the simple engine underneath made it feel more impressive, not less, that something so plain, repeated at such scale, can sound so much like us.
It guesses the next word. And in doing that, billions of times over, it learned to keep us company.