Qwen3.6-35B-A3B on my laptop drew me a better pelican than Claude Opus 4.7

By Simon Willison's Weblog April 16, 2026 · Edited April 16, 2026

For anyone who has been taking my pelican riding a bicycle benchmark (https://simonwillison.net/tags/pelican-riding-a-bicycle/) seriously as a robust way to test models, here are pelicans from this morning’s two big model releases - Qwen3.6-35B-A3B from Alibaba (https://qwen.ai/blog?id=qwen3.6-35b-a3b) and Claude Opus 4.7 from Anthropic (https://www.anthropic.com/news/claude-opus-4-7).

Here’s the Qwen 3.6 pelican, generated using this 20.9GB Qwen3.6-35B-A3B-UD-Q4_K_S.gguf (https://huggingface.co/unsloth/Qwen3.6-35B-A3B-GGUF/blob/main/Qwen3.6-35B-A3B-UD-Q4_K_S.gguf) quantized model by Unsloth, running on my MacBook Pro M5 via LM Studio (https://lmstudio.ai/) (and the llm-lmstudio (https://github.com/agustif/llm-lmstudio) plugin) - transcript here (https://gist.github.com/simonw/4389d355d8e162bc6e4547da214f7dd2):

And here’s one I got from Anthropic’s brand new Claude Opus 4.7 (https://www.anthropic.com/news/claude-opus-4-7) (transcript (https://gist.github.com/simonw/afcb19addf3f38eb1996e1ebe749c118)):

I’m giving this one to Qwen 3.6. Opus managed to mess up the bicycle frame!

I tried Opus a second time passing thinking_level: max. It didn’t do much better (transcript (https://gist.github.com/simonw/7566e04a81accfb9affda83451c0f363)):

I don’t think Qwen are cheating

A lot of people are convinced that the labs train for my stupid benchmark (https://simonwillison.net/2025/Nov/13/training-for-pelicans-riding-bicycles/). I don’t think they do, but honestly this result did give me a little glint of suspicion. So I’m burning one of my secret backup tests - here’s what I got from Qwen3.6-35B-A3B and Opus 4.7 for “Generate an SVG of a flamingo riding a unicycle”:

Qwen3.6-35B-A3B

(transcript (https://gist.github.com/simonw/f1d1ff01c34dda5fdedf684cfc430d92))

Opus 4.7

(transcript (https://gist.github.com/simonw/35121ad5dcf23bf860397a103ae88d50))

I’m giving this one to Qwen too, partly for the excellent SVG comment.
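
If you want to repeat the flamingo test yourself, here’s a minimal sketch using the LLM Python API. The model ID you pass in is an assumption on your part (run `llm models` to see what your installed plugins register), and `extract_svg` and `generate_svg` are hypothetical helper names for illustration, not something taken from the transcripts above.

```python
import re
from typing import Optional

# The backup prompt quoted above.
PROMPT = "Generate an SVG of a flamingo riding a unicycle"


def extract_svg(text: str) -> Optional[str]:
    """Return the first <svg>...</svg> element in a model response, or None.

    Models often wrap the SVG in prose or a fenced code block, so this
    searches for the element rather than trusting the whole response.
    """
    match = re.search(r"<svg\b.*?</svg>", text, flags=re.DOTALL | re.IGNORECASE)
    return match.group(0) if match else None


def generate_svg(model_id: str, prompt: str = PROMPT) -> Optional[str]:
    """Prompt a model through LLM and return just the SVG markup.

    model_id is a placeholder: use whatever ID your plugin exposes.
    """
    import llm  # imported lazily so extract_svg works without LLM installed

    model = llm.get_model(model_id)
    response = model.prompt(prompt)
    return extract_svg(response.text())
```

Calling `generate_svg("your-model-id")` and writing the result to a `.svg` file should give you something you can open in a browser. Output varies from run to run, so don’t expect to get the same flamingo I did.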

What can we learn from this?

The pelican benchmark has always been meant as a joke - it’s mainly a statement on how obtuse and absurd the task of comparing these models is.

The weird thing about that joke is that, for the most part, there has been a direct correlation between the quality of the pelicans produced and the general usefulness of the models. Those first pelicans from October 2024 (https://simonwillison.net/2024/Oct/25/pelicans-on-a-bicycle/) were junk. The more recent entries (https://simonwillison.net/tags/pelican-riding-a-bicycle/) have generally been much, much better - to the point that Gemini 3.1 Pro produces illustrations you could actually use somewhere (https://simonwillison.net/2026/Feb/19/gemini-31-pro/), provided you had a pressing need to illustrate a pelican riding a bicycle.

Today, even that loose connection to utility has been broken. I have enormous respect for Qwen, but I very much doubt that a 21GB quantized version of their latest model is more powerful or useful than Anthropic’s latest proprietary release.

If the thing you need is an SVG illustration of a pelican riding a bicycle though, right now Qwen3.6-35B-A3B running on a laptop is a better bet than Opus 4.7!

    Tags: ai (https://simonwillison.net/tags/ai), generative-ai (https://simonwillison.net/tags/generative-ai), local-llms (https://simonwillison.net/tags/local-llms), llms (https://simonwillison.net/tags/llms), anthropic (https://simonwillison.net/tags/anthropic), claude (https://simonwillison.net/tags/claude), qwen (https://simonwillison.net/tags/qwen), pelican-riding-a-bicycle (https://simonwillison.net/tags/pelican-riding-a-bicycle), lm-studio (https://simonwillison.net/tags/lm-studio)

Reference: https://simonwillison.net/2026/Apr/16/qwen-beats-opus/#atom-everything