Notes about Gemini 3

Rarely has an AI model arrived with such unanimous anticipation across the industry. In many respects, Gemini 3 feels like Google’s “GPT-4 moment”. My feeds have been saturated with head-to-head evaluations, and the model’s front-end capabilities are nothing short of remarkable. Benchmarks depict a system operating at the outer edge of the current frontier. ...

November 19, 2025 · 1 min · 112 words · BubbleBrain

Best practices for prompt engineering from Anthropic

Anthropic recently published a blog post on Best practices for prompt engineering. After reading it, I believe it offers an excellent summary of the key practices for effective prompt engineering. The first principle is to be explicit and clear. Modern AI models respond exceptionally well to precise, unambiguous instructions. The key is to tell the model exactly what you want to see. ...

November 17, 2025 · 2 min · 348 words · BubbleBrain

Gemini 3 Canvas Test

I’ve been waiting for Gemini 3 for a long time, and this week I finally got to test it on the Gemini mobile app with Canvas enabled. While I’m still unsure which exact model it is, its performance is remarkably impressive. ...

November 16, 2025 · 1 min · 72 words · BubbleBrain

Skills explained: How Skills compare to prompts, Projects, MCP, and subagents

I read the blog post from AnthropicAI and took some notes: This article explains the core components of Claude’s agentic architecture, designed for building sophisticated workflows. Prompts function as ephemeral, conversational instructions for immediate tasks. ...

November 16, 2025 · 1 min · 105 words · BubbleBrain

Note about Puzzled By Puzzles: When VLM Can’t Take a Hint

I read the paper Puzzled by Puzzles: When Vision-Language Models Can’t Take a Hint from UC Berkeley. The authors built a hand-crafted benchmark of 432 English rebus puzzles, each annotated with 11 cognitive-skill categories, and they also tested a wide range of models, from open-source VLMs to reasoning-enabled models. ...

November 4, 2025 · 1 min · 122 words · BubbleBrain

Note about Qwen3-Max Thinking

Qwen3-Max Thinking was quietly released on Sunday. Earlier in the week, the team had promised it would ship before the week was out. After putting it through a few coding tasks, I found its performance underwhelming. ...

November 3, 2025 · 1 min · 40 words · BubbleBrain

AMO-Bench from Meituan

I found a new benchmark paper from Meituan: AMO-Bench: Large Language Models Still Struggle in High School Math Competitions. This paper introduces AMO-Bench, a new advanced mathematical reasoning benchmark with 50 original Olympiad-level problems designed to test LLMs. It targets the growing issue that existing math benchmarks (AIME 24, AIME 25) have become too easy for top-tier models, leading to performance saturation. ...

November 2, 2025 · 2 min · 251 words · BubbleBrain

Emergent Introspective Awareness in LLMs

Anthropic just released a new post on emergent introspective awareness in LLMs. Here are my notes: The key experiment: the team injected concept vectors (anger, justice, etc.) directly into the model’s hidden activations, then asked, “Do you feel anything unusual in your thoughts?” ...

October 30, 2025 · 1 min · 169 words · BubbleBrain

Notes about LLM Brain Rot

I saw that the paper LLMs can get “Brain Rot” was trending on my X timeline, with plenty of discussion around it. I just read this interesting paper, and here are my notes on it: ...

October 26, 2025 · 2 min · 242 words · BubbleBrain

Kimi-Cli

Moonshot AI has open-sourced its own coding agent, kimi-cli. Built in Python, the codebase is approachable for anyone who wants to learn how agents are engineered. A single monthly subscription—bought on the official site—grants credits for both the web product and the CLI. ...

October 24, 2025 · 1 min · 64 words · BubbleBrain