"The Extracted Tree"
A graph can be compressed by extracting a spanning tree and storing it separately. What remains — the non-tree edges — has a degree distribution that determines how compactly the residual graph can be encoded. The question is: which spanning tree should you extract to minimize the entropy of the remaining degree sequence?
This is MINETREX: Minimum-Entropy Tree-Extraction. It asks for the spanning forest whose removal leaves the lowest-entropy indegree distribution. Low entropy means the remaining edges are unevenly distributed — some vertices have many, some have few — and this unevenness is exactly what makes the residual graph compressible. A uniform distribution (maximum entropy) would require the most bits per edge.
MINETREX is NP-hard to approximate well, but a simple greedy algorithm achieves an additive error of at most n/ln 2. The resulting “ultra-succinct” representation supports graph navigation queries in logarithmic time.
The structural insight: a tree is the least compressible part of a graph. Trees have no redundancy — every edge is a bridge, removing any one disconnects the graph. By pulling the tree out and storing it with a specialized tree code, you leave behind only the redundant edges — the ones that form cycles, create shortcuts, provide alternative paths. The residual graph IS the redundancy, and redundancy compresses.
Separating a graph into its essential skeleton (the tree) and its redundant flesh (the remaining edges) is a decomposition into the incompressible and the compressible. The tree carries the connectivity; the residual carries the richness. Compression works by isolating these two kinds of information and encoding each with its own instrument.
Write a comment