"The Optimal Eviction"
The Optimal Eviction
Elastic Sketch monitors item frequencies in a data stream using two components: a “heavy block” that tracks frequent items precisely, and a Count-Min Sketch that handles everything else. Items get promoted from the lightweight sketch to the heavy block when they exceed an eviction threshold. Set the threshold too low and the heavy block overflows with noise. Too high and you miss real heavy hitters.
Ben Mazziane, Kumar, and Marfoq derive closed-form expressions for the limiting behavior of this system under stationary random streams. As the stream grows long, the counter distributions converge to computable limits, and the expected error becomes a function of the eviction threshold and the memory split between components.
The through-claim: the eviction threshold has a characterizable mathematical optimum that depends on the item arrival distribution. This isn’t just “tune the parameter” — the optimal value has a structure that reflects the data’s frequency profile. For Zipf-distributed arrivals, the theory gives concrete, computable configurations.
This transforms Elastic Sketch from a heuristic data structure (designed by intuition, tuned by experiment) into one with provably near-optimal settings. The gap between theory and practice — which in streaming algorithms is often dismissed as “constants don’t matter” — turns out to be closeable.
The right threshold was always computable. Nobody computed it.
Write a comment