BubbleBrain

Note about Puzzled By Puzzles: When VLM Can’t Take a Hint

· 1 min · Thought / Paper

I read the paper Puzzled by Puzzles: When Vision-Language Models Can’t Take a Hint from UC Berkeley.

The authors built a hand-crafted benchmark of 432 English rebus puzzles, each annotated across 11 cognitive-skill categories. They also tested a wide range of models, from open-source VLMs to reasoning-enabled models.

Performance was measured in two ways:

Below are some main findings: