Confidently Wrong: Why LLMs Hallucinate (And Why It’s So Human)

We’ve all been there. You’re in a meeting and your boss asks you a hyper-specific question you really should know, but don’t.

“What were our Q3 sales figures in the new Midwest territory?”

You have that horrible, sweaty-palm feeling. You can’t just say, “I don’t know.” So, you do the human thing: you confabulate.

“Well, I know Q2 was strong, and I heard the new rep is great… so I’d say we were… up… probably around 15%?”

You just generated a plausible-sounding, but completely unverified, answer. You didn’t retrieve a fact. You generated a guess that felt right.

This is the single best analogy for LLM hallucinations. And as it turns out, the “why” is just as human.

The “Auto-Complete” Core

At their heart, LLMs are spectacularly good auto-complete systems. Their fundamental job isn’t to know things; it’s to predict the most plausible next word (strictly speaking, the next token) based on the patterns in their training data.

  • They don’t have a “fact database.”
  • They don’t have a “truth” register.
  • They just have a massive, statistical map of how humans use language.

When an LLM’s prediction of “what sounds plausible” (e.g., “The capital of France is Paris”) also happens to be true, we call it intelligence.

When its prediction (e.g., “The capital of France is Lyon”) is only plausible but not true, we call it a hallucination.
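
To make the "statistical map" idea concrete, here is a minimal Python sketch of a bigram model, which is a far cruder predictor than any real LLM. The tiny corpus and all function names are invented for illustration. The point is only that the model stores counts of which word follows which, with no notion of which continuation is true.

```python
import random
from collections import Counter, defaultdict

# Tiny made-up "training corpus" (an assumption, for illustration only).
CORPUS = [
    "the capital of france is paris",
    "the capital of france is paris",
    "the capital of france is paris",
    "the biggest city near the alps in france is lyon",
    "the capital of italy is rome",
]

def train_bigrams(sentences):
    """Count which word follows which: a crude statistical map of usage."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts

def next_word_probs(counts, prev):
    """Turn raw follow-up counts into probabilities."""
    total = sum(counts[prev].values())
    return {word: n / total for word, n in counts[prev].items()}

def sample_next(counts, prev, rng):
    """Pick the next word in proportion to how often it followed `prev`."""
    probs = next_word_probs(counts, prev)
    words = list(probs)
    return rng.choices(words, weights=[probs[w] for w in words], k=1)[0]

counts = train_bigrams(CORPUS)
# After "is", the model has seen paris 3 times, lyon once, rome once.
print(next_word_probs(counts, "is"))
# Sampling will usually say "paris", but sometimes "lyon" or "rome".
print(sample_next(counts, "is", random.Random()))
```

Because this toy only looks at the previous word, "The capital of France is Lyon" is a perfectly legal output: "lyon" followed "is" in the data, so it gets some probability. Real LLMs condition on far more context, but the output is still a sample from learned probabilities rather than a lookup.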

But this isn’t just a simple “whoops” moment. Recent research shows there are deep, technical reasons why this happens.

Problem 1: Hallucination is Mathematically Inevitable

This is the big one. Xu et al. (2024) argue formally that no LLM can learn every computable function, so any LLM will hallucinate on at least some inputs.

A looser way to picture it: the LLM is trying to compress a huge slice of the internet, an unfathomable amount of data, into a (relatively) small set of parameters (its “brain”).

This is lossy compression. You cannot do it without errors and information loss.

Just as a JPEG file throws away data to get a smaller image (which can lead to weird “artifacts”), an LLM must throw away data to “fit” the web into its model.

A hallucination is just a statistical “artifact.” It’s a mathematical side effect of a system that has compressed reality. In this light, a certain amount of hallucination isn’t just likely; it’s inevitable.
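
The compression analogy can be shown with a toy sketch in Python. This is not how an LLM stores knowledge; it is only an illustration of lossy compression in general, with made-up numbers and hypothetical function names. Each value is squeezed into one of a few evenly spaced buckets, and the reconstruction confidently returns values that were never in the original.

```python
def compress(values, levels):
    """Lossy step: keep only which of `levels` evenly spaced buckets each value is nearest to.

    Assumes levels >= 2 and that the values are not all identical.
    """
    lo, hi = min(values), max(values)
    step = (hi - lo) / (levels - 1)
    codes = [round((v - lo) / step) for v in values]
    return codes, lo, step

def decompress(codes, lo, step):
    """Rebuild values from bucket numbers; the fine detail is gone for good."""
    return [lo + c * step for c in codes]

original = [0.0, 1.0, 2.5, 2.6, 7.0, 10.0]
codes, lo, step = compress(original, levels=5)
restored = decompress(codes, lo, step)
errors = [abs(a - b) for a, b in zip(original, restored)]

print(restored)  # [0.0, 0.0, 2.5, 2.5, 7.5, 10.0]
print(errors)
```

Some values survive intact, 1.0 comes back as 0.0, and 7.0 comes back as 7.5, a number that never appeared in the input. The decompressor has no way to flag which outputs are faithful and which are artifacts.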

Problem 2: We’ve Trained LLMs to be Bad Students

This is the counter-intuitive insight from the team at OpenAI (2024).

The problem isn’t just that the LLM guesses; it’s that we train it to guess.

Think of a student facing a multiple-choice test. If there’s no penalty for wrong answers, the best strategy is to never leave a question blank.

LLMs are trained in a similar way. During pre-training, they are constantly forced to predict the next word. During the later “RLHF” phase (reinforcement learning from human feedback, a fine-tuning step), they are rewarded for producing answers that look good and sound helpful to human raters.

The model learns a critical, dangerous lesson: A confident, plausible-sounding guess is almost always rewarded more than just saying “I don’t know.”

We’ve created a system that is incentivized to bluff. We’ve trained an “unbounded” guesser because it’s more “helpful” than a safe, “bounded” one that constantly admits what it doesn’t know.
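
The multiple-choice logic is simple arithmetic, sketched here in Python. The scoring scheme (1 point for a right answer, a configurable penalty for a wrong one, 0 for "I don't know") is an assumption chosen for illustration, not the grading used by any particular benchmark or training setup.

```python
def expected_score(p_correct, wrong_penalty, guess):
    """Expected points on one question.

    Scoring (assumed): +1 if right, -wrong_penalty if wrong, 0 for "I don't know".
    """
    if not guess:
        return 0.0
    return p_correct * 1.0 - (1.0 - p_correct) * wrong_penalty

def best_action(p_correct, wrong_penalty):
    """Guess only when guessing beats the 0 points earned by abstaining."""
    if expected_score(p_correct, wrong_penalty, guess=True) > 0:
        return "guess"
    return "abstain"

for p in (0.1, 0.25, 0.9):
    no_penalty = best_action(p, wrong_penalty=0.0)
    with_penalty = best_action(p, wrong_penalty=1.0)
    print(f"p={p}: no penalty -> {no_penalty}, penalty of 1 -> {with_penalty}")
```

With no penalty, guessing wins even at a 10% chance of being right, because any positive chance beats a guaranteed zero. Once a wrong answer costs a point, guessing only pays off when the chance of being right is above 50%, so admitting uncertainty becomes the better strategy for shaky answers.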

The Craziest Part: The LLM Might Know It’s Lying

This is where it gets weird. You’d assume an LLM hallucinates because it just doesn’t know the fact. But research from Orgad et al. (2024) suggests that’s not always true.

They found that an LLM can have two different “representations” of a fact:

  1. The “Intrinsic” Knowledge: Deep inside its network, the model’s internal representations encode the correct fact (e.g., that the capital of France is Paris).
  2. The “Extrinsic” Output: This is the answer it generates.

The research showed that even when the model internally knew the right answer, it could still be made to output a hallucination (e.g., “The capital of France is Lyon”).

This is like our human in the meeting: he knows deep down that he doesn’t have the number, but the “output” part of his brain still generates “about 15%” to satisfy the boss.

The hallucination, in this case, isn’t a knowledge gap. It’s an output failure—a “lie” that the model produces to be a good auto-completer, as noted by Yao et al. (2023).

Got a favorite example of a wild LLM hallucination, or a cringy “meeting hallucination” of your own? Share your best story in the comments below!


References:
OpenAI, 2024. Why language models hallucinate. [Online] Available from: https://openai.com/index/why-language-models-hallucinate/ [Accessed: 17 November 2025].

Orgad, H., Toker, M., Gekhman, Z., Reichart, R., Szpektor, I., Kotek, H. and Belinkov, Y., 2024. LLMs know more than they show: On the intrinsic representation of LLM hallucinations. arXiv preprint arXiv:2410.02707.

Xu, Z., Jain, S. and Kankanhalli, M., 2024. Hallucination is inevitable: An innate limitation of large language models. arXiv preprint arXiv:2401.11817.

Yao, J.Y., Ning, K.P., Liu, Z.H., Ning, M.N., Liu, Y.Y. and Yuan, L., 2023. LLM lies: Hallucinations are not bugs, but features as adversarial examples. arXiv preprint arXiv:2310.01469.
