Hello! Hope you’re doing well.

This blog post will be a bit unusual. I'm here to bust some myths surrounding Llama 4. Don't worry if you're unfamiliar with the context or these models: I'll explain clearly as we go along.

Context

https://x.com/AIatMeta/status/1908598456144531660

Let’s go straight to the claims and break them down with some logic, analogies and maths (where required).


CLAIM 1: The 10M Context!

<aside>

The 10M (10 million) token context isn’t real: no model was trained on prompts longer than 256k tokens, and beyond 256k, output quality drops.

</aside>

Breakdown:

What is a context window?

In LLMs, the context window is the amount of text (measured in tokens—think words or word pieces) the model can "see" at once to generate a response.
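To make "tokens" concrete, here is a minimal sketch of counting them with the Hugging Face `transformers` library. The model name is purely illustrative (it is a gated repo, and any Llama-family tokenizer behaves the same way), and the 256k limit is the training length discussed below, not an official constant:

```python
from transformers import AutoTokenizer

# Illustrative tokenizer; swap in any Llama-family tokenizer you have access to.
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")

TRAINED_CONTEXT = 256_000  # longest sequence length reportedly used in training

prompt = "Summarize the following document: ..." * 10_000

# Count how many tokens the prompt occupies (roughly 3-4 English characters per token).
token_ids = tokenizer.encode(prompt)
print(f"Prompt length: {len(token_ids)} tokens")

if len(token_ids) > TRAINED_CONTEXT:
    print("Prompt is longer than anything the model saw in training; expect quality to drop.")
```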

A 10M token context sounds impressive. BUT!

The model wasn’t trained on sequences longer than 256k tokens. Training is where the model learns patterns from data. If it’s never seen prompts longer than 256k, it’s like asking a chef to cook a 10-course meal when they’ve only practiced single dishes. Beyond 256k, it’s guessing, not reasoning.

Let’s dive into some of the maths behind this.
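A quick back-of-the-envelope check, using only the two numbers from the claim above (10M advertised, 256k trained, with 256k taken at face value as 256,000):

```python
claimed_context = 10_000_000   # the advertised 10M-token window
trained_context = 256_000      # the longest sequences seen during training

# How far beyond its training data the model must extrapolate:
ratio = claimed_context / trained_context
print(f"Extrapolation factor: ~{ratio:.0f}x")          # ~39x

# Put differently: the share of the advertised window that lies
# outside anything the model saw during training.
unseen_fraction = 1 - trained_context / claimed_context
print(f"Unseen fraction of the window: {unseen_fraction:.1%}")  # ~97.4%
```

So roughly 97% of that 10M window is territory the model has never been trained on.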

<aside>

Let’s make it clearer with an analogy.

Imagine a painter who has only worked on 8x10 canvases being asked to paint a mural the size of a football field. He might stretch his skills, but the details get blurry. That’s Llama 4 beyond 256k tokens.