The Collision Fiber
Neural networks compress high-dimensional inputs into lower-dimensional representations. The compression is non-injective — multiple inputs map to the same representation. This is usually treated as information loss: the network cannot distinguish inputs that collide in representation space, and this ambiguity is the price of compression.
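As a toy illustration (my construction, not an example from the paper), a linear projection is non-injective whenever it has a nontrivial null space: any two inputs that differ by a null-space vector collide at the same representation.

```python
import numpy as np

# Hypothetical toy encoder: project 3-D inputs to 1-D by summing coordinates.
# The map has a 2-D null space, so it is non-injective.
W = np.array([[1.0, 1.0, 1.0]])

def encode(x):
    return W @ x

x1 = np.array([1.0, 2.0, 3.0])
x2 = np.array([3.0, 2.0, 1.0])  # differs from x1 by a null-space vector

# Both inputs collide at the same representation point.
print(encode(x1), encode(x2))  # both [6.]
```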
arXiv:2601.14252 shows the ambiguity is not just quantifiable but geometrically structured. The set of inputs that map to the same representation point forms a “collision fiber” — a geometric object in the input space whose shape and size determine exactly how much side information is needed to recover the original input from its representation.
The amount of side information is not arbitrary — it is determined by the geometry of the collision fiber. Thin fibers (where colliding inputs are similar) require less side information. Thick fibers (where very different inputs collide) require more. The encoding’s quality is not just “how much it compresses” but “what shape are the ambiguities it creates?”
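In the same toy linear model (again my sketch, not the paper's construction), the collision fiber through an input x is the affine set x + null(W): the fiber is "thick" exactly when the input distribution spreads far along the null space, and that spread is what the side information must cover.

```python
import numpy as np

# Toy linear encoder: 3-D inputs -> 1-D representation.
W = np.array([[1.0, 1.0, 1.0]])

# Compute an orthonormal basis for null(W) via SVD; here it is a 2-D plane,
# and the collision fiber through x is the affine subspace x + null(W).
_, s, Vt = np.linalg.svd(W)
rank = int(np.sum(s > 1e-10))
N = Vt[rank:].T  # columns span null(W)

x = np.array([1.0, 2.0, 3.0])

# Any point x + N @ c collides with x, for any coefficient vector c.
c = np.array([0.7, -1.3])
x_collide = x + N @ c
print(np.allclose(W @ x, W @ x_collide))  # True: same representation
```

If inputs are known to lie in a bounded set, the relevant object is the intersection of that set with x + null(W): a thin intersection means colliding inputs are nearly identical and cheap to disambiguate; a wide one means the decoder needs more side information.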
This transforms a qualitative complaint (the representation loses information) into a quantitative coding problem (how many bits describe the collision fiber?). The information loss is not a failure mode to minimize but a structure to characterize. Different neural architectures create different fiber geometries, and the geometry tells you exactly what a downstream decoder needs to resolve the ambiguity.
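To make the "how many bits" framing concrete, here is a discrete toy example (mine, not the paper's): when an encoder maps a finite input set into buckets, each bucket's preimage is a collision fiber, and picking one input out of a fiber of size k costs log2(k) bits of side information.

```python
import math
from collections import defaultdict

# Hypothetical discrete encoder: map 8 inputs to 3 buckets (non-injective).
def encode(x):
    return x % 3

# Group inputs by representation: each group is a collision fiber.
fibers = defaultdict(list)
for x in range(8):
    fibers[encode(x)].append(x)

# Side information to single out one input in a fiber of size k: log2(k) bits.
for rep, fiber in sorted(fibers.items()):
    bits = math.log2(len(fiber))
    print(rep, fiber, f"{bits:.2f} bits")
# 0 [0, 3, 6] 1.58 bits
# 1 [1, 4, 7] 1.58 bits
# 2 [2, 5] 1.00 bits
```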
The principle is broader than neural compression. Any lossy representation creates collision fibers. A medical test that gives the same result for two different conditions has a collision fiber. A language model that generates the same embedding for two different sentences has one. In each case, the resolution of the ambiguity requires side information, and the amount required is a geometric property of the representation, not a free parameter.