Here is a small thing that should not work, and does.

You type “my laptop keeps freezing” into a search box. Back comes a help article titled “system hangs on boot.” Read those two phrases again. They share no words. Not “laptop,” not “freezing,” nothing. An old-fashioned search, the kind that matches letters, would have shrugged and returned nothing. Yet the modern one found the exact right answer, because somewhere underneath, the machine understood that a freezing laptop and a hanging system are the same idea wearing different clothes.

That quiet understanding is the single most useful trick in modern AI, and it has a name: embeddings. Almost everything you find impressive (semantic search, retrieval-augmented generation (RAG), recommendations, an assistant that remembers what you meant three messages ago) is standing on this one idea. So I want to build it up with you slowly, from the first honest intuition all the way to the parts that still make senior engineers smile. No prior math needed to start. By the end you will understand something real.

The core move: turn meaning into a place

Computers cannot compare meanings. They can only compare numbers. So the whole game is a single audacious move: take a piece of text and turn its meaning into a list of numbers, chosen so that similar meanings get similar numbers.

That list of numbers is called a vector, or an embedding. Think of the numbers as coordinates. Two coordinates, and you have a point on a map. Three, and it is a point in a room. An embedding just uses many coordinates, hundreds or thousands, to place a piece of meaning somewhere in a vast space. You cannot picture that space, and that is fine, nobody can. The point is what it buys you: once meaning is a location, “these two things are similar” becomes “these two points are close together.” A fuzzy human question turns into plain geometry.

"king"
[0.21, -0.44, 0.68, 0.05, -0.31, 0.52, … 762 more]
The word "king" as an embedding: just a long list of numbers. Each one is a coordinate along some axis of meaning. No single number means anything on its own; together they pin "king" to one exact spot in a space of meaning.

Seeing the space (a flat version of a huge idea)

Real embedding space has hundreds of dimensions, which is unpicturable. But the behaviour survives if we squash it down to two, so let me show you a flat cartoon of the real thing. Watch where the words land:

king · queen · prince
banana · apple · mango
A 2D cartoon of meaning-space. Royalty clusters in one corner, fruit in another. Nobody told the model "these are royals." It learned, from oceans of text, that these words show up in similar company, and so it placed them near each other. Distance became a proxy for meaning.

The famous demonstration of how structured this space is: take the vector for “king,” subtract “man,” add “woman,” and the word nearest to the point you land on is “queen.” The space encodes not just topics but relationships: direction itself carries meaning (“the male-to-female direction,” “the singular-to-plural direction”). That is not a party trick someone hard-coded. It fell out of learning, and the first time you see it work you feel a small jolt. I still do.
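If you want to poke at that arithmetic yourself, here is a toy sketch. The vectors are hand-made 2D stand-ins (one axis loosely "royalty," one loosely "gender"), not output from any real model, and the `analogy` helper is a name I made up for illustration. Real embeddings behave far less tidily.

```python
import math

# Hand-made 2D toy vectors (NOT from a real model):
# axis 0 ~ "royalty", axis 1 ~ "gender" (+ male, - female).
TOY = {
    "king":   [0.9,  0.8],
    "queen":  [0.9, -0.8],
    "man":    [0.1,  0.8],
    "woman":  [0.1, -0.8],
    "banana": [-0.7, 0.0],
}

def cosine(a, b):
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def analogy(a, b, c, vocab):
    """Word nearest to vocab[a] - vocab[b] + vocab[c], skipping the three inputs."""
    target = [x - y + z for x, y, z in zip(vocab[a], vocab[b], vocab[c])]
    candidates = [w for w in vocab if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(target, vocab[w]))

result = analogy("king", "man", "woman", TOY)
print(result)
```

Note that the input words are excluded from the candidates; that is the usual convention when people run this demonstration.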

Measuring “close”: the angle, not the distance

So similar meanings land nearby. How do we measure nearby? Your instinct says “distance between the points,” and that is reasonable, but the tool everyone actually reaches for measures the angle between the two vectors instead. It is called cosine similarity, and it is worth understanding why angle beats distance.

Picture each embedding as an arrow from the origin out to its point. Two arrows pointing the same direction mean the same thing, regardless of how long the arrows are. Cosine similarity reads that angle and hands you one number:

"king" · "queen" ≈ 1.0 (same direction)
"king" · "crown" ≈ 0.5 (loosely related)
"king" · "banana" ≈ 0.0 (unrelated)
Cosine similarity is just the cosine of the angle between two arrows. Zero angle gives 1 (identical meaning). A right angle gives 0 (unrelated). Opposite directions give -1. One clean number, from -1 to 1, for "how alike are these two meanings."

The mechanics, in plain words: multiply the two vectors dimension by dimension and add it all up (that is the dot product), then divide by the product of the two arrows’ lengths. Dividing by the lengths is the important part, because it throws the lengths away and leaves only direction.
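Those mechanics fit in a few lines. This is a minimal plain-Python sketch for reading, not something you would run over millions of vectors; the function names are my own, and the zero-vector check is an assumption about how you would want that edge case handled.

```python
import math

def dot(a, b):
    """Multiply dimension by dimension and add it all up."""
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    """Length of the arrow from the origin to the point."""
    return math.sqrt(dot(a, a))

def cosine_similarity(a, b):
    """Dot product divided by the product of the two lengths."""
    denominator = norm(a) * norm(b)
    if denominator == 0:
        raise ValueError("cosine similarity is undefined for a zero-length vector")
    return dot(a, b) / denominator

same = cosine_similarity([1.0, 2.0], [2.0, 4.0])        # same direction
right_angle = cosine_similarity([1.0, 0.0], [0.0, 1.0])  # perpendicular
opposite = cosine_similarity([1.0, 2.0], [-1.0, -2.0])   # reversed direction
```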

Why deliberately ignore length? Two reasons, one practical and one deep. The practical one: a long document about laptops should not rank as “more about laptops” than a short sentence about laptops just because it has more words. Meaning is about which way you point, not how far. The deep one is a genuine senior-level fact worth carrying: in very high-dimensional spaces, plain straight-line distances between points all start to look eerily similar, as everything drifts toward equally-far-apart. This is one face of the curse of dimensionality, and it makes raw distance a blunt instrument. Angle is not magically immune (once every vector is scaled to length 1, cosine and straight-line distance rank neighbours identically), but embedding models are trained so that direction carries the meaning, and cosine reads exactly that signal while discarding length. That is the real reason cosine is the default, not habit.
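The practical reason is easy to check by hand. In this sketch the three vectors are made up; `long_doc` is simply `short_note` stretched tenfold, standing in for "same topic, more words." It also checks one identity worth knowing: for unit-length vectors, squared straight-line distance equals 2 − 2·cosine.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def unit(v):
    """Rescale a vector to length 1, keeping its direction."""
    length = math.hypot(*v)
    return [x / length for x in v]

# Made-up vectors: long_doc points the same way as short_note, 10x longer.
query = [0.3, 0.5, 0.1]
short_note = [0.2, 0.4, 0.1]
long_doc = [10 * x for x in short_note]

cos_short = cosine(query, short_note)
cos_long = cosine(query, long_doc)          # unchanged by the stretch
dist_short = math.dist(query, short_note)
dist_long = math.dist(query, long_doc)      # much larger after the stretch

# After normalising both vectors, distance and cosine carry the same information.
squared = math.dist(unit(query), unit(long_doc)) ** 2
identity_gap = abs(squared - (2 - 2 * cos_long))
```

Cosine gives both documents the same score; straight-line distance punishes the long one purely for its length.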

Where do the numbers come from? (the honest mechanics)

Fair question: who decides that “king” is [0.21, -0.44, ...]? Nobody hand-writes these. A neural network learns them. Here is the pipeline without the hand-waving, and it holds from a fresher’s mental model to what actually ships.

1. Tokenize. The text is chopped into tokens (word-ish pieces). Each token starts as a lookup in a big learned table of vectors.
2. Read in context. A transformer passes those token vectors through its layers, letting every token adjust based on the words around it. "Bank" near "river" ends up different from "bank" near "money."
3. Pool into one vector. A sentence is many token vectors; we need one. Usually you average them (mean pooling), or use a special summary token. One fixed-size vector now stands for the whole text.
4. That vector is the embedding. Typically 768 to 4096 numbers, every one carrying signal. Ready to compare against any other.
Raw text goes in, one meaning-vector comes out. The transformer's job in step 2 is the clever bit: it reads each word in the light of its neighbours, so the final vector reflects meaning-in-context, not just a dictionary lookup.
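To make the shape of that pipeline concrete, here is a deliberately crude sketch. Everything in it is a stand-in: the tokenizer just splits on whitespace, `TOKEN_TABLE` is a hand-written 3-dimension table rather than a learned one, and the transformer of step 2 is skipped entirely, so this shows only lookup and mean pooling.

```python
# Hypothetical 3-dim token table. A real model learns this table and
# runs a transformer between the lookup and the pooling (step 2).
TOKEN_TABLE = {
    "my":       [0.0, 0.1, 0.0],
    "laptop":   [0.9, 0.1, 0.0],
    "keeps":    [0.0, 0.0, 0.1],
    "freezing": [0.1, 0.9, 0.0],
}
UNKNOWN = [0.0, 0.0, 0.0]  # fallback for tokens missing from the toy table

def tokenize(text):
    """Step 1 (toy version): lowercase and split on whitespace."""
    return text.lower().split()

def mean_pool(vectors):
    """Step 3: average the token vectors, dimension by dimension."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]

def embed(text):
    """Steps 1, 3 and 4; the contextual step 2 is omitted in this sketch."""
    vectors = [TOKEN_TABLE.get(token, UNKNOWN) for token in tokenize(text)]
    return mean_pool(vectors)

embedding = embed("My laptop keeps freezing")
print(embedding)
```

However many tokens go in, one fixed-size vector comes out, which is what makes any two texts comparable.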

Two words there deserve a beat, because they separate “I sort of get it” from “I actually get it.”

Dense, not sparse. An older approach gave each word in the dictionary its very own slot, so a vector was tens of thousands of numbers, almost all zero, one slot lit up per word present. That is sparse, and it is why old search thought “laptop freezing” and “system hanging” were total strangers: different slots, zero overlap. Modern embeddings are dense: a few hundred to a few thousand numbers where nearly every value means something, and meaning is spread across all of them. That density is exactly what lets phrases with no words in common land on nearby meanings.
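You can watch the sparse approach fail in a few lines. The sketch below builds one-slot-per-word vectors over a tiny vocabulary; the two dense vectors at the end are made-up numbers, included only to show what "overlap" looks like when meaning is spread across every dimension.

```python
def sparse_vector(text, vocabulary):
    """One slot per vocabulary word: 1 if the word is present, else 0."""
    words = set(text.lower().split())
    return [1 if word in words else 0 for word in vocabulary]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

phrase_a = "laptop freezing"
phrase_b = "system hanging"
vocabulary = sorted(set(phrase_a.split()) | set(phrase_b.split()))

sparse_a = sparse_vector(phrase_a, vocabulary)
sparse_b = sparse_vector(phrase_b, vocabulary)
sparse_overlap = dot(sparse_a, sparse_b)  # different slots, so nothing lines up

# Made-up dense vectors for the same two phrases (not from a real model).
dense_a = [0.8, 0.6]
dense_b = [0.7, 0.7]
dense_overlap = dot(dense_a, dense_b)
```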

Trained by contrast. How does the model learn to place similar things together? You show it pairs that should be close (a question and its correct answer) and let everything else in the batch count as things that should be far. The model nudges the close pairs together and shoves the rest apart, over and over, across billions of examples. This is contrastive learning, and it is why “laptop freezing” and “system hangs” drift into the same neighbourhood despite sharing no letters. They kept appearing in similar roles, so the model learned to point them the same way.
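Here is roughly what that objective looks like as a number the model tries to push down. This is a sketch of one common form of contrastive loss with in-batch negatives (often called InfoNCE); the post does not say which exact loss any particular model uses, and the `temperature` value and toy vectors are my assumptions.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def contrastive_loss(queries, docs, temperature=0.1):
    """Average cross-entropy where docs[i] is the match for queries[i]
    and every other doc in the batch counts as a negative."""
    total = 0.0
    for i, query in enumerate(queries):
        logits = [dot(query, doc) / temperature for doc in docs]
        # Numerically stable log-sum-exp over the whole batch.
        peak = max(logits)
        log_sum = peak + math.log(sum(math.exp(value - peak) for value in logits))
        total += log_sum - logits[i]
    return total / len(queries)

# Toy unit-length vectors: two queries, each with one correct document.
queries = [[1.0, 0.0], [0.0, 1.0]]
aligned_docs = [[1.0, 0.0], [0.0, 1.0]]   # each doc points where its query points
swapped_docs = [[0.0, 1.0], [1.0, 0.0]]   # each doc points at the wrong query

good = contrastive_loss(queries, aligned_docs)
bad = contrastive_loss(queries, swapped_docs)
```

The loss is near zero when each pair already points the same way and large when it does not; training is the process of adjusting the model so the first situation becomes the norm.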

Now the payoff, and it is beautifully simple once the pieces are in place. Searching by meaning instead of by matching letters is four steps:

1. Embed everything you own once, up front. Every doc becomes a vector, stored in a vector database.
2. Embed the query with the exact same model when someone searches.
3. Find nearest neighbours by cosine similarity, the stored vectors pointing most nearly the same way as the query.
4. Return those (and optionally rerank the top few more carefully).
This is the engine under semantic search and under the "retrieval" in RAG. Notice step 1 happens once and offline; only the tiny query embedding and the nearest-neighbour lookup happen live. That is why it feels instant.
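The four steps can be sketched in memory. To keep it runnable without any model, `embed` here is a lookup into `FAKE_MODEL`, a table of made-up 3-dimension vectors; in real use it would call an actual embedding model, and the list scan would be replaced by a vector database's index.

```python
import math

# Stand-in for a real embedding model: made-up vectors, keyed by text.
FAKE_MODEL = {
    "system hangs on boot":            [0.9, 0.3, 0.0],
    "how to speed up a slow computer": [0.5, 0.7, 0.1],
    "best banana bread recipe":        [0.0, 0.1, 0.9],
    "my laptop keeps freezing":        [0.8, 0.4, 0.1],
}

def embed(text):
    """Hypothetical embed call; a real one would run a model."""
    return FAKE_MODEL[text]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def build_index(docs):
    """Step 1: embed every document once, up front."""
    return [(doc, embed(doc)) for doc in docs]

def search(query, index, top_k=2):
    """Steps 2 to 4: embed the query, score every doc, return the best."""
    query_vector = embed(query)
    scored = [(cosine(query_vector, vector), doc) for doc, vector in index]
    scored.sort(reverse=True)
    return scored[:top_k]

index = build_index([
    "system hangs on boot",
    "how to speed up a slow computer",
    "best banana bread recipe",
])
results = search("my laptop keeps freezing", index)
for score, doc in results:
    print(f"{score:.2f}  {doc}")
```

This version compares the query against every stored vector, which is fine for a handful of documents; production systems typically use approximate nearest-neighbour indexes to avoid that full scan.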

And this is exactly why our opening magic trick worked. “my laptop keeps freezing” and “system hangs on boot” get embedded into arrows pointing almost the same direction. High cosine similarity. The letters never mattered; the meanings pointed the same way.

Here is that same idea as scores, so you can feel the gradient between a strong match and a weak one:

"system hangs on boot": 0.83
"pc won't respond, screen frozen": 0.79
"how to speed up a slow computer": 0.51
"best banana bread recipe": 0.06
Cosine similarity of each candidate against the query "my laptop keeps freezing." The two real matches score high with zero shared keywords; the banana bread, sharing the word count but none of the meaning, sinks to the floor. The number, not the words, does the ranking.

Where you’ll actually meet embeddings

This is not a lab curiosity. It is quietly running under most useful AI you touch:

| Use case | What gets embedded | Why it works |
| --- | --- | --- |
| Semantic search | Your documents and the query | Finds meaning, not keywords, so synonyms and paraphrases still match |
| RAG | Chunks of your knowledge base | Retrieves the passages nearest the question to feed the model real context |
| Recommendations | Items, and a user's history | "More like this" becomes "nearest neighbours in item-space" |
| Deduplication / clustering | Records, tickets, reviews | Near-identical meanings cluster together even when worded differently |
| Agent memory | Past notes and decisions | Recall what's *relevant* to now, by nearness, instead of scrolling everything |
| Classification | The input text | Similar inputs sit near labelled examples, so the nearest ones vote |
Every row is the same primitive: turn things into vectors, then reason with nearness. Once you see it, you spot embeddings everywhere.
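To show how little changes from row to row, here is the classification row as a sketch: the nearest labelled examples vote. The vectors, the labels ("hardware," "billing") and the choice of k = 3 are all invented for illustration.

```python
import math
from collections import Counter

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

# Made-up embeddings of already-labelled support tickets.
LABELLED = [
    ([0.9, 0.1], "hardware"),
    ([0.8, 0.3], "hardware"),
    ([0.1, 0.9], "billing"),
    ([0.2, 0.8], "billing"),
    ([0.3, 0.9], "billing"),
]

def classify(vector, labelled, k=3):
    """Take the k labelled examples nearest by cosine and let them vote."""
    nearest = sorted(
        labelled,
        key=lambda item: cosine(vector, item[0]),
        reverse=True,
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

label = classify([0.85, 0.2], LABELLED)
print(label)
```

Swap the vote for "return the items themselves" and you have recommendations; swap it for "merge anything above a similarity threshold" and you have deduplication.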

I have leaned on this myself. A tool I built keeps a working memory across sessions, and the way it decides what past context is relevant to your current task is exactly this: embed the notes, embed the moment, pull the nearest. It is a small idea doing enormous work.

One more, for the senior in the room: nested embeddings

Here is a recent, genuinely elegant twist that rewards understanding the basics. A 3072-dimension embedding is powerful but heavy: it takes more storage and is slower to compare, and at scale that is real money. You would love to use a shorter vector when you can afford lower precision, and a longer one when you need the best. Normally that means training a whole separate model per size. Annoying.

The fix is a training trick called Matryoshka representation learning, named for the Russian nesting dolls, and the name is the whole idea. The model is trained so that the most important meaning is packed into the earliest dimensions, and every prefix of the vector is a complete, usable embedding on its own.

full vector · first 3072 dims · best quality
first 768 dims · great, 4× smaller
first 256 dims · good, 12× smaller
first 64 dims · coarse but usable
One vector, many honest sizes. Because training forced the front of the vector to carry the heaviest meaning, you can just chop the tail off to shrink it, no new model, no re-embedding. Doll inside a doll inside a doll, each one complete.

Why it matters in practice: you can do a fast, cheap first pass with the short prefix to narrow millions of candidates down to a few hundred, then rerank just those with the full vector for precision. Best of both. The remarkable part is that a well-trained short prefix often matches a much longer vector from an ordinary model: real quality at a fraction of the cost. That is the kind of design that makes you sit back a little.
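That two-pass pattern is short enough to sketch. The 4-dimension vectors below are made up and merely pretend to come from a Matryoshka-trained model (truncating an ordinary model's vectors this way is not expected to work as well); `truncate` and `two_stage_search` are names I chose for illustration. Rescaling the prefix to unit length after chopping is a common convention I am assuming here.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def truncate(vector, dims):
    """Keep the first `dims` values and rescale the prefix to unit length."""
    prefix = vector[:dims]
    length = math.hypot(*prefix)
    return [x / length for x in prefix]

def two_stage_search(query, corpus, short_dims, shortlist_size, top_k):
    """Cheap pass on short prefixes, then rerank survivors with full vectors."""
    short_query = truncate(query, short_dims)
    shortlist = sorted(
        corpus,
        key=lambda item: cosine(short_query, truncate(item[1], short_dims)),
        reverse=True,
    )[:shortlist_size]
    reranked = sorted(shortlist, key=lambda item: cosine(query, item[1]), reverse=True)
    return [name for name, _ in reranked[:top_k]]

# Made-up vectors: doc_a and doc_b share a prefix and differ only in the tail.
query = [0.8, 0.5, 0.2, 0.1]
corpus = [
    ("doc_a", [0.8, 0.5, 0.1, 0.2]),
    ("doc_b", [0.8, 0.5, -0.3, -0.2]),
    ("doc_c", [-0.6, 0.7, 0.2, 0.1]),
]

best = two_stage_search(query, corpus, short_dims=2, shortlist_size=2, top_k=1)
print(best)
```

The short prefix is enough to throw out `doc_c`, but it cannot tell `doc_a` from `doc_b`; only the full vector separates them, which is exactly the division of labour the two passes are for.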

The one sentence to keep

Strip everything away and embeddings are this: meaning becomes a place, and similarity becomes distance. Turn text into a point in a space built so that alike things sit close, and suddenly a computer that only knows how to compare numbers can answer questions about meaning, find the right doc with none of the right words, remember what’s relevant, group what belongs together.

The next time a search understands you better than the words you typed, or an assistant surfaces exactly the note you needed, you will know what happened underneath. Your meaning was turned into an arrow, and somewhere in a space too big to picture, it pointed at the answer.
