The Tech Lens

AI Lessons

How a Transformer Works

Lesson #1 - AI made simple

Rolando
Mar 08, 2026

Let’s start with the basics: understanding how an LLM “thinks” requires understanding the Transformer, the architecture introduced in 2017 by Google’s “Attention Is All You Need” paper. Everything you use today—GPT, Claude, Gemini, Llama—is built on this foundation.

1. Tokens: the atomic unit of text

Before talking about attention, you need to understand how the model “sees” text. Raw text is broken into tokens, which don’t necessarily correspond to whole words. The word “tokenization” might become ["token", "ization"], while “AI” is a single token.
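To make the idea concrete, here is a toy greedy longest-match subword tokenizer. The vocabulary and the matching strategy are simplified illustrations; real models use learned vocabularies (BPE, WordPiece, unigram) built from large corpora.

```python
# Toy subword tokenizer: greedy longest-match against a tiny,
# hand-picked vocabulary (illustrative only; real tokenizers
# learn their vocabulary from data).
VOCAB = {"token", "ization", "AI", "is", "a"}

def tokenize(text: str) -> list[str]:
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest vocabulary entry starting at position i.
        for j in range(len(text), i, -1):
            if text[i:j] in VOCAB:
                tokens.append(text[i:j])
                i = j
                break
        else:
            # Unknown character: emit it as its own token.
            tokens.append(text[i])
            i += 1
    return tokens

print(tokenize("tokenization"))  # ['token', 'ization']
print(tokenize("AI"))            # ['AI']
```

Note how “tokenization” splits into two subword pieces while “AI” survives as one token, exactly as described above.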

Each token is converted into an embedding: a high-dimensional numeric vector (e.g., 768 or 4,096 floats) that represents its meaning in a semantic space. This is where mathematics takes over from linguistics.

2. The Attention Mechanism

The self-attention mechanism is the heart of the Transformer. The central idea is simple yet powerful: each token asks, “Which other tokens should I attend to in order to understand my own meaning?”
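That question is answered with scaled dot-product attention. A minimal single-head numpy sketch (no masking, no multi-head split, random weights in place of trained ones):

```python
import numpy as np

def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    e = np.exp(x - x.max(axis=axis, keepdims=True))  # stable softmax
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over token vectors X (seq_len, d_model)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # how relevant each token is to each other token
    weights = softmax(scores, axis=-1)   # each row sums to 1
    return weights @ V, weights          # weighted mix of value vectors

rng = np.random.default_rng(0)
seq_len, d_model, d_k = 3, 8, 4
X = rng.normal(size=(seq_len, d_model))           # 3 token embeddings
Wq, Wk, Wv = (rng.normal(size=(d_model, d_k)) for _ in range(3))

out, weights = self_attention(X, Wq, Wk, Wv)
print(out.shape)             # (3, 4)
print(weights.sum(axis=-1))  # each row sums to 1
```

The attention weights matrix is literally the answer to the question above: row *i* says how much token *i* pays attention to every token in the sequence, and the output for token *i* is the corresponding weighted blend of value vectors.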
