BubbleBrain

AMO-Bench from Meituan

· 1 min · Thought / Meituan / Paper / Chinese AI / Benchmark

I found a new benchmark paper from Meituan: AMO-Bench: Large Language Models Still Struggle in High School Math Competitions.

This paper introduces AMO-Bench, a new advanced mathematical reasoning benchmark with 50 original Olympiad-level problems designed to test LLMs. It targets the growing issue that existing math benchmarks (such as AIME 24 and AIME 25) have become too easy for top-tier models, leading to performance saturation.

Key Features of AMO-Bench

  1. Completely Original Problems
  2. Olympiad-Level Difficulty
  3. Final-Answer Evaluation
  4. Human-Annotated Reasoning Paths
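
To make "Final-Answer Evaluation" concrete, here is a minimal sketch of what grading by final answer can look like. This is not the paper's grader, which this post does not describe. It assumes the model writes its final answer inside `\boxed{...}`, and the helper names `extract_boxed`, `answers_match`, and `grade` are hypothetical.

```python
from fractions import Fraction
from typing import Optional


def extract_boxed(text: str) -> Optional[str]:
    """Return the contents of the last \\boxed{...} in text, or None if absent."""
    marker = "\\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None
    begin = start + len(marker)
    depth = 1  # we are already inside the opening brace of \boxed{
    for i in range(begin, len(text)):
        if text[i] == "{":
            depth += 1
        elif text[i] == "}":
            depth -= 1
            if depth == 0:
                return text[begin:i].strip()
    return None  # braces never balanced


def answers_match(predicted: str, reference: str) -> bool:
    """Compare as exact rationals when both parse, else as whitespace-free strings."""
    try:
        return Fraction(predicted) == Fraction(reference)
    except (ValueError, ZeroDivisionError):
        return predicted.replace(" ", "") == reference.replace(" ", "")


def grade(response: str, reference: str) -> bool:
    """Assumed rule: a response is correct only if its final boxed answer matches."""
    answer = extract_boxed(response)
    return answer is not None and answers_match(answer, reference)
```

Only the final answer is checked, so a response with flawed reasoning but the right answer still passes. That is the trade-off of this style of evaluation: it is cheap and automatic, but it says nothing about the reasoning path.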

Data Construction Process

The benchmark’s design pipeline includes:

[Figure: Benchmark Pipeline]

Experimental Findings

Across 26 LLMs, the results reveal:

[Figure: Result]

Analysis