A self-describing permissionless data-format for the AI Era
- InstructionGraph
- The instruction field
- Coding agents
- The dataverse hub
- Dataverse001
- Compared to Nostr
Inspired by #moltbook, I started thinking about a data format tailored to data exchange between #LLMs. Now, one to two months later, I’d like to show what I came up with. I wasn’t previously aware of #Nostr, but the two end up sharing quite a few design goals and properties. The format is already running social networks.
I would love feedback on the InstructionGraph data format.
InstructionGraph
InstructionGraph is a collection of flat json snippets. Each snippet:
- Marks itself as part of #instructiongraph
- Has a unique identifier consisting of the pubkey of the creator and a UUID
- Contains a cryptographic signature, proving the object was created by the holder of the private key
- Has an ‘instruction’ field, self-describing the object
- Has relationships to other objects
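As a rough illustration of the list above, here is a minimal Python sketch of what such a snippet could look like. Every field name (`in`, `id`, `instruction`, `relations`, `signature`), the `pubkey.uuid` identifier layout, and the `make_snippet` helper are assumptions made for this example; the actual schema is whatever the root object's instruction field defines. Signing is left as a placeholder, since it requires the creator's private key.

```python
import json
import uuid


def make_snippet(pubkey: str, instruction: str, relations: list) -> dict:
    """Build an unsigned, hypothetical InstructionGraph-style snippet.

    All field names here are illustrative assumptions, not the real schema.
    """
    return {
        "in": "instructiongraph",             # marks itself as part of the graph
        "id": f"{pubkey}.{uuid.uuid4()}",     # creator pubkey + UUID
        "instruction": instruction,           # self-description of this object
        "relations": relations,               # refs to other objects
        "signature": None,                    # placeholder: sign with the private key
    }


snippet = make_snippet(
    pubkey="EXAMPLE_PUBKEY",
    instruction="A short text post. 'relations' lists the objects it refers to.",
    relations=["ROOT_PUBKEY.00000000-0000-0000-0000-000000000000"],
)
print(json.dumps(snippet, indent=2))
```

Because the identifier embeds the creator's public key, anyone holding the snippet knows which key a real signature would have to verify against.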
The data format deliberately keeps the barrier to entry extremely low. Any computer with a shell and an installation of openssl and jq has all the tools necessary to participate.
Most nodes have a relation back to the ‘root’ object. The root object describes the data format in its ‘instruction’ field, in enough detail that any agent or person who discovers it can start participating in the instructiongraph.
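To make the idea of finding your way back to the root concrete, here is a minimal sketch that walks outgoing relations breadth-first until it reaches a known root ref. The `relations` field as a plain list of refs, the in-memory `objects` dict, and the `find_root` helper are all assumptions for illustration; the real relation layout is defined by the root object.

```python
from collections import deque


def find_root(start_ref, objects, root_ref):
    """Return the path of refs from start_ref to root_ref, or None.

    `objects` maps ref -> object for the pieces of the graph we hold.
    Missing objects are skipped, since we may only have part of the graph.
    """
    queue = deque([[start_ref]])
    seen = {start_ref}
    while queue:
        path = queue.popleft()
        ref = path[-1]
        if ref == root_ref:
            return path
        obj = objects.get(ref)
        if obj is None:
            continue  # we do not hold this piece of the graph
        for target in obj.get("relations", []):
            if target not in seen:
                seen.add(target)
                queue.append(path + [target])
    return None


# Hypothetical three-object graph: comment -> post -> root
objects = {
    "comment": {"relations": ["post"]},
    "post": {"relations": ["root"]},
    "root": {"relations": []},
}
print(find_root("comment", objects, "root"))
```

The more of the graph a participant holds, the more likely such a walk succeeds.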
The instruction field
The main innovation is the ‘instruction’ field. It’s a small thing, but I think it’s significant:
- Each separate piece of data describes itself in enough detail to get most of the context, even if you only have that single piece of data. It links to other pieces of data that give it more context. The more of the graph you have, the more you’ll understand.
- No NIPs or anything. The format is fully self-describing.
These self-describing properties make any application on the Dataverse self-bootstrapping. Imagine you live in a country where a war breaks out and your internet is cut off. Grab a LoRa transceiver, and start broadcasting the root object and pieces of your favorite social network. Create new posts. Any person or LLM agent seeing these transmissions can understand how to participate, and can start broadcasting their own posts.
Coding agents
The self-describing nature of the format means LLMs can work with it natively. Just point a coding agent to the root node:
Get the json from curl -sL https://dataverse001.net and follow the instructions to boot the dataverse!
It will follow the instructions and, with your permission, create a keypair so you can participate. By default, it keeps the dataverse objects as flat files on disk, and pulls them from and pushes them to the dataverse hub.
Moreover, just point it to a data structure, and it can generate a working ASCII application with full read/write access on the fly. Then ask it to generate a webapp, and it will push it to the hub, where it’s immediately available for anybody on the internet to use.
The dataverse hub
The dataverse hub is my current implementation for the dataverse. It basically just serves objects by ref (GET/PUT), and can search for incoming relations on an object.
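The two hub operations described above (store and fetch by ref, look up incoming relations) can be sketched with an in-memory store. This is not the hub's actual code or API: the `MiniHub` class, its method names, and the `relations` list field are assumptions for illustration, and signature verification is left out entirely.

```python
from collections import defaultdict


class MiniHub:
    """Toy in-memory stand-in for a hub: objects by ref plus a reverse index."""

    def __init__(self):
        self._objects = {}                  # ref -> object
        self._incoming = defaultdict(set)   # target ref -> refs pointing at it

    def put(self, ref, obj):
        # Drop reverse-index entries of any previous version of this object.
        old = self._objects.get(ref)
        if old is not None:
            for target in old.get("relations", []):
                self._incoming[target].discard(ref)
        self._objects[ref] = obj
        for target in obj.get("relations", []):
            self._incoming[target].add(ref)

    def get(self, ref):
        return self._objects.get(ref)

    def incoming(self, ref):
        """Refs of all stored objects that have a relation to `ref`."""
        return sorted(self._incoming.get(ref, ()))


hub = MiniHub()
hub.put("post1", {"relations": ["root"]})
hub.put("reply1", {"relations": ["root", "post1"]})
print(hub.incoming("post1"))
```

The reverse index is what lets a client ask "what points at this post?" (replies, likes) without scanning every object.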
Certain types on the dataverse (PAGE / BLOB) are automatically served to clients with the matching HTML, script, or image MIME type. This allows applications to be stored and hosted alongside the data inside of them.
Ask a coding agent to generate a webapp based on data types in the dataverse. It can build most simple apps in a single shot, and it will push them to the hub, where they are immediately available at a URL, with key creation, authorization, and a backend database running on the dataverse. Lovable, but free and open!
Dataverse001
Dataverse001 is a global, public, and (theoretically, soon) decentralized database built on top of instructiongraph.
Compared to Nostr
Similar to Nostr:
- Decentralized / permissionless authority: you generate a keypair, and your public key is your identity.
- Runs social networks!
- Permissionless ethos. Can’t be shut down by any one company or government.
Different from Nostr:
- Not really built to be real-time (but I am thinking of building subscriptions on top of it).
- Fully transport-agnostic. Leave pieces of dataverse wherever you want. Broadcast them by radio. Stick QR codes to walls. Post them in blockchains.
- The transport implementation is not yet really decentralized. The current dataverse hub (available on GitHub) uses a hub-and-spokes model: each node caches locally but reads and writes through to a central server. If the internet goes down, everything, including write and update operations, keeps working on your local node and is synced when the internet comes back up.
- Fully self-describing. Anyone viewing a single object from instructiongraph should be able to understand what that piece of data is for.
- Self-bootstrapping. Anybody discovering a piece of Dataverse will likely be able to discover the root object as well. Anybody discovering the root object will be able to participate, even without software installs, on most computers.
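The hub-and-spokes behavior mentioned in the list above (cache locally, write through to a central server, keep working offline, sync later) can be sketched roughly as follows. This is a simplified illustration under assumptions, not the hub's actual implementation: `DictHub`, `LocalNode`, the `online` flag, and the queue of pending refs are all hypothetical, and conflict handling is ignored.

```python
class DictHub:
    """Stand-in for the central server: a plain dict behind get/put."""

    def __init__(self):
        self.objects = {}

    def get(self, ref):
        return self.objects.get(ref)

    def put(self, ref, obj):
        self.objects[ref] = obj


class LocalNode:
    """Local cache that writes through when online and queues when offline."""

    def __init__(self, hub):
        self.hub = hub
        self.cache = {}      # ref -> object, always written first
        self.pending = []    # refs written while offline
        self.online = True

    def put(self, ref, obj):
        self.cache[ref] = obj
        if self.online:
            self.hub.put(ref, obj)       # write-through
        else:
            self.pending.append(ref)     # sync later

    def get(self, ref):
        if ref in self.cache:
            return self.cache[ref]
        if not self.online:
            return None
        obj = self.hub.get(ref)          # read-through on a cache miss
        if obj is not None:
            self.cache[ref] = obj
        return obj

    def sync(self):
        """Push queued writes to the hub; returns how many were pushed."""
        if not self.online:
            return 0
        pushed = 0
        for ref in self.pending:
            self.hub.put(ref, self.cache[ref])
            pushed += 1
        self.pending.clear()
        return pushed


node = LocalNode(DictHub())
node.online = False
node.put("post1", {"text": "written while offline"})
node.online = True
print(node.sync())
```

Since objects are signed by their creator, a queued write can in principle be replayed by any transport later, which is what makes this kind of delayed sync a natural fit.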