Unicode Explorer using binary search over fetch() HTTP range requests

Unicode Explorer using binary search over fetch() HTTP range requests (https://tools.simonwillison.net/unicode-binary-search)

Here’s a little prototype I built this morning from my phone as an experiment in HTTP range requests, and a general example of using LLMs to satisfy curiosity.

I’ve been collecting HTTP range tricks (https://simonwillison.net/tags/http-range-requests/) for a while now, and I decided it would be fun to build something with them myself that used binary search against a large file to do something useful.

So I brainstormed with Claude (https://claude.ai/share/47860666-cb20-44b5-8cdb-d0ebe363384f). The challenge was coming up with a use case where the data could be naturally sorted in a way that would benefit from binary search.

One of Claude’s suggestions was looking up information about Unicode codepoints, which means searching through many MBs of metadata.
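
To make the idea concrete, here is a minimal sketch of binary search over a sorted file that is only ever read in small byte ranges. The layout is an assumption for illustration, not the format the real tool uses: fixed-width 64-byte ASCII records sorted by codepoint, each starting with a 6-digit hex codepoint. The names `RECORD_SIZE`, `findCodepoint` and `readRange` are all hypothetical.

```javascript
// Hypothetical layout (an assumption, not the real tool's format):
// fixed-width ASCII records, sorted by codepoint, each beginning with
// a 6-digit hex codepoint such as "0000F8".
const RECORD_SIZE = 64;

// readRange(start, endInclusive) is a caller-supplied async function that
// resolves to the text stored in that byte range of the file.
async function findCodepoint(target, totalBytes, readRange) {
  let lo = 0;
  let hi = Math.floor(totalBytes / RECORD_SIZE) - 1;
  const steps = []; // every probe, so a UI could display the search path

  while (lo <= hi) {
    const mid = Math.floor((lo + hi) / 2);
    const start = mid * RECORD_SIZE;
    // One small read per probe: only RECORD_SIZE bytes are requested.
    const record = await readRange(start, start + RECORD_SIZE - 1);
    const codepoint = parseInt(record.slice(0, 6), 16);
    steps.push({ recordIndex: mid, codepoint });

    if (codepoint === target) {
      return { record: record.trimEnd(), steps };
    }
    if (codepoint < target) {
      lo = mid + 1; // target lies in the upper half
    } else {
      hi = mid - 1; // target lies in the lower half
    }
  }
  return { record: null, steps }; // not present in the file
}
```

Because each probe halves the remaining range, the number of reads grows with the logarithm of the record count, which is what makes this approach workable against a file far too large to download in full.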

I had Claude write me a spec to feed to Claude Code - visible here (https://github.com/simonw/research/pull/90#issue-4001466642) - then kicked off an asynchronous research project (https://simonwillison.net/2025/Nov/6/async-code-research/) with Claude Code for web against my simonw/research (https://github.com/simonw/research) repo to turn that into working code.

Here’s the resulting report and code (https://github.com/simonw/research/tree/main/unicode-explorer-binary-search#readme). One interesting thing I learned is that Range request tricks aren’t compatible with HTTP compression because they mess with the byte offset calculations. I added 'Accept-Encoding': 'identity' to the fetch() calls but this isn’t actually necessary because Cloudflare and other CDNs automatically skip compression if a content-range header is present.
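
As a rough illustration of what one of those fetch() calls could look like, here is a small helper that requests an inclusive byte range and insists on a 206 Partial Content response. This is a sketch under my own assumptions, not the code from the report: the name `fetchRange` and the injectable `fetchImpl` parameter are hypothetical, added so the helper can be exercised without a network.

```javascript
// Fetch bytes [start, end] (inclusive) of a URL with an HTTP Range request.
// fetchImpl is a hypothetical injection point; it defaults to the global fetch.
async function fetchRange(url, start, end, fetchImpl = fetch) {
  const response = await fetchImpl(url, {
    headers: {
      Range: `bytes=${start}-${end}`,
      // Request an uncompressed body so offsets match the stored file.
      // As noted above, this may not be needed behind a CDN.
      "Accept-Encoding": "identity",
    },
  });

  // 206 means the server honoured the range. A 200 would mean it ignored
  // the Range header and is sending the whole file instead.
  if (response.status !== 206) {
    throw new Error(`Expected 206 Partial Content, got ${response.status}`);
  }

  // Decode as UTF-8. If a range boundary splits a multi-byte character,
  // the decoder substitutes U+FFFD for the incomplete bytes.
  return new TextDecoder("utf-8").decode(await response.arrayBuffer());
}
```

Rejecting anything other than a 206 is a deliberate choice here: silently accepting a full 200 response against a very large file would defeat the point of the exercise.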

I deployed the result to my tools.simonwillison.net site (https://tools.simonwillison.net/unicode-binary-search), after first tweaking it to query the data via range requests against a CORS-enabled 76.6MB file in an S3 bucket fronted by Cloudflare.

The demo is fun to play with - type in a single character like ø or a hexadecimal codepoint indicator like 1F99C and it will binary search its way through the large file and show you the steps it takes along the way.
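
Accepting both kinds of input means turning either one into a numeric codepoint before the search starts. Here is one way that could be sketched. The function name `parseQuery`, the optional `U+` prefix and the rule that a single character always wins over a one-digit hex value are my assumptions, not necessarily how the demo behaves.

```javascript
// Turn user input into a numeric codepoint, or null if it can't be parsed.
function parseQuery(input) {
  const text = input.trim();

  // Array.from iterates by codepoint, so an emoji stored as a UTF-16
  // surrogate pair still counts as a single character.
  const chars = Array.from(text);
  if (chars.length === 1) {
    return chars[0].codePointAt(0);
  }

  // Otherwise expect 1 to 6 hex digits, optionally prefixed with "U+".
  const match = /^(?:U\+)?([0-9A-F]{1,6})$/i.exec(text);
  if (!match) {
    return null;
  }
  const codepoint = parseInt(match[1], 16);

  // Unicode codepoints stop at U+10FFFF.
  return codepoint <= 0x10ffff ? codepoint : null;
}
```

With this rule a lone "A" is looked up as the letter A (U+0041) rather than as hex 0xA, which seemed like the less surprising choice for a tool where typing a character is the main use.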

Tags: algorithms (https://simonwillison.net/tags/algorithms), http (https://simonwillison.net/tags/http), research (https://simonwillison.net/tags/research), tools (https://simonwillison.net/tags/tools), unicode (https://simonwillison.net/tags/unicode), ai (https://simonwillison.net/tags/ai), generative-ai (https://simonwillison.net/tags/generative-ai), llms (https://simonwillison.net/tags/llms), ai-assisted-programming (https://simonwillison.net/tags/ai-assisted-programming), http-range-requests (https://simonwillison.net/tags/http-range-requests)