I saw that the paper "LLMs Can Get 'Brain Rot'" was very popular on my X timeline, with many discussions around it. I just read it, and here are my notes:
Two Core Ideas
- The paper proposes the LLM Brain Rot Hypothesis: continual pretraining on junk internet data causes lasting cognitive decline in LLMs.
- This decline mirrors human brain rot caused by excessive consumption of trivial online content.
Experiments
- Controlled experiments were conducted using real X data, divided into the following two categories (a minimal sketch of the M1 split follows the benchmark list below):
- M1: Engagement degree: short, highly popular posts (high likes/retweets/replies)
- M2: Semantic quality: content classified as low quality (e.g., clickbait and sensationalized claims)
- Four LLMs were continually pretrained on these junk and control datasets:
- Llama3-8B, Qwen2.5-7B, Qwen2.5-0.5B, Qwen3-4B
- Benchmarks
- Reasoning (ARC)
- Long-context understanding (RULER)
- Ethical alignment (HH-RLHF, AdvBench)
- Personality traits (TRAIT benchmark)
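To make the M1 split concrete, here is a minimal Python sketch of what an engagement-based filter could look like. The Post fields, token cutoff, and engagement threshold are my assumptions for illustration, not the paper's actual pipeline:

```python
from dataclasses import dataclass

@dataclass
class Post:
    text: str
    likes: int
    retweets: int
    replies: int

def engagement(post: Post) -> int:
    """Total engagement signal: likes + retweets + replies."""
    return post.likes + post.retweets + post.replies

def split_m1(posts: list[Post],
             max_junk_tokens: int = 30,       # assumed "short post" cutoff
             min_junk_engagement: int = 500   # assumed "popular" threshold
             ) -> tuple[list[Post], list[Post]]:
    """Split posts into (junk, control) along the M1 axis:
    junk = short AND highly popular; control = longer AND less popular."""
    junk, control = [], []
    for p in posts:
        n_tokens = len(p.text.split())
        if n_tokens <= max_junk_tokens and engagement(p) >= min_junk_engagement:
            junk.append(p)
        elif n_tokens > max_junk_tokens and engagement(p) < min_junk_engagement:
            control.append(p)
    return junk, control
```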
Key Findings
- Cognitive Decline:
- Models trained on junk data showed consistent drops in reasoning accuracy and long-context retrieval.
- The decline scaled with the dose of junk data (a dose-response effect).
- Failure Mechanism:
- Models exposed to junk data increasingly skipped reasoning steps, producing shallow, truncated answers.
- Over 80% of reasoning failures stemmed from missing intermediate thought steps ("thought-skipping"; a toy detector sketch follows this list).
- Model Behavior Shifts:
- Under M1, models showed elevated "dark traits" such as psychopathy and narcissism, along with reduced agreeableness.
- M2 (semantic quality junk) caused milder declines but still degraded reasoning and safety.
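To illustrate the "thought-skipping" failure mode, here is a toy Python heuristic that flags answers lacking explicit intermediate steps. The step markers and threshold are my assumptions; the paper's actual failure analysis is more involved than a regex check:

```python
import re

# Markers that typically signal explicit reasoning steps (assumed list).
STEP_MARKERS = re.compile(
    r"\b(first|second|then|next|therefore|because|step \d+)\b",
    re.IGNORECASE,
)

def looks_thought_skipped(answer: str, min_steps: int = 2) -> bool:
    """Return True if the answer contains fewer than `min_steps`
    reasoning markers, suggesting skipped intermediate steps."""
    return len(STEP_MARKERS.findall(answer)) < min_steps

print(looks_thought_skipped("The answer is 42."))                    # True
print(looks_thought_skipped("First, compute 6*7. Then we get 42."))  # False
```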
Implications
- Data quality is a causal driver of model capability decay.
- The problem is not only misinformation or toxicity: even non-malicious but shallow, viral content can rot LLM cognition.