"The Lying Boundary"
Draw a line on a map. Aggregate the data on each side. Compute the correlation. Now redraw the line — same data, same territory, same underlying reality — and compute again. The correlation changes sign.
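Here is a minimal sketch of the effect. It assumes a synthetic n-by-n grid of cells in which cell (r, c) carries x = r + c and y = r - c. The grid, its values, and the helper names `pearson` and `zone_means` are illustrative only. They are not drawn from any real dataset or library. If the same nine cells are zoned into rows, the zone-level correlation is +1. If they are zoned into columns, it is -1. The unaggregated cells show no correlation at all.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def zone_means(cells, zone_of):
    """Average x and y within each zone; zone_of maps a cell (r, c) to a zone id."""
    totals = {}
    for (r, c), (x, y) in cells.items():
        z = zone_of(r, c)
        sx, sy, k = totals.get(z, (0.0, 0.0, 0))
        totals[z] = (sx + x, sy + y, k + 1)
    zones = sorted(totals)
    mean_x = [totals[z][0] / totals[z][2] for z in zones]
    mean_y = [totals[z][1] / totals[z][2] for z in zones]
    return mean_x, mean_y

# Hypothetical territory: an n-by-n grid where cell (r, c) has x = r + c, y = r - c.
n = 3
cells = {(r, c): (r + c, r - c) for r in range(n) for c in range(n)}

r_cells = pearson([x for x, _ in cells.values()], [y for _, y in cells.values()])
r_rows = pearson(*zone_means(cells, lambda r, c: r))  # zones are the rows
r_cols = pearson(*zone_means(cells, lambda r, c: c))  # zones are the columns
print(r_cells, r_rows, r_cols)  # 0.0 1.0 -1.0
```

Nothing about the cells changes between the two aggregations. Only `zone_of`, the boundary, is different.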
This is the Modifiable Areal Unit Problem, known since the 1930s and never solved, only managed. Every spatial analysis that groups continuous data into discrete regions is vulnerable: census tracts, electoral districts, hospital catchment areas, ecological survey plots. The grouping is arbitrary, and the arbitrariness infects the conclusion.
What makes the formal analysis illuminating is the proof that MAUP is a spatial instance of Simpson’s Paradox. The same mechanism that makes a drug appear beneficial in subgroups but harmful overall makes a map show positive correlation at one scale and negative correlation at another. The aggregation reverses the relationship, not because the relationship changed, but because the boundary captured different substructure.
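The law of total covariance offers one way to see the shared mechanism. It is a standard identity, used here as an illustration rather than as the formal proof the text refers to. It says the covariance of pooled data splits exactly into two terms: the average within-group covariance and the covariance of the group means. A correlation computed on zone averages sees only the second term. In the hypothetical two-group data below, each group has a positive x-y relationship. Yet the between-group term is negative and large enough to flip the pooled sign. Covariance and correlation always share a sign, so the same reversal holds for correlation.

```python
import math

def mean(v):
    return sum(v) / len(v)

def cov(xs, ys):
    """Population covariance (divides by n), which makes the decomposition exact."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

# Hypothetical data: two subgroups (think: two zones), each with a positive x-y trend.
groups = {
    "A": ([1, 2, 3], [6, 7, 8]),
    "B": ([4, 5, 6], [1, 2, 3]),
}

all_x = [x for xs, _ in groups.values() for x in xs]
all_y = [y for _, ys in groups.values() for y in ys]
n = len(all_x)
mx, my = mean(all_x), mean(all_y)

# Within term: size-weighted average of each group's own covariance.
within = sum(len(xs) / n * cov(xs, ys) for xs, ys in groups.values())
# Between term: covariance of the group means, i.e. all a zone-level map can see.
between = sum(len(xs) / n * (mean(xs) - mx) * (mean(ys) - my)
              for xs, ys in groups.values())
total = cov(all_x, all_y)

assert math.isclose(total, within + between)  # law of total covariance
print(f"within={within:.3f} between={between:.3f} total={total:.3f}")
# within=0.667 between=-3.750 total=-3.083
```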
The criterion, a quantity called CAGE, identifies precisely when the paradox is active: when the between-group variance of the eigenvectors of the correlation matrix is nonzero. In practical terms, if the groups you drew pick out different principal components of the data than an alternative grouping would, the map is lying. When the eigenvectors are stable across groupings, the map tells the truth regardless of how you carve it.
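The exact definition of CAGE is not given here. What follows is therefore only a hypothetical stability check in the spirit of the criterion, not the CAGE statistic itself. For each candidate zoning, the sketch builds the correlation matrix of the zone averages. It then finds that matrix's leading eigenvector by power iteration. Eigenvectors are fixed only up to sign, so each one is normalized to a canonical sign. Finally, it reports how far those eigenvectors spread across zonings. The function names and the two synthetic grids are assumptions for illustration. With only two variables, a 2-by-2 correlation matrix always has its eigenvectors along the diagonals. The check therefore reduces to whether the zonings agree on the sign of the correlation. With more variables, the eigenvectors can rotate, and a general solver such as `numpy.linalg.eigh` would be the usual choice.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation; assumes neither sequence is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def leading_eigenvector(m, iters=200):
    """Power iteration on a small symmetric PSD matrix given as a list of rows.
    Assumes the largest eigenvalue is unique (otherwise the result is arbitrary)."""
    k = len(m)
    v = [1.0 / (i + 1) for i in range(k)]  # generic, non-symmetric starting vector
    for _ in range(iters):
        w = [sum(m[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Eigenvectors are defined only up to sign: make the largest-magnitude entry positive.
    pivot = max(v, key=abs)
    return v if pivot > 0 else [-x for x in v]

def zone_means(cells, zone_of):
    """One list of zone averages per variable; zone_of maps (r, c) to a zone id."""
    members = {}
    for (r, c), values in cells.items():
        members.setdefault(zone_of(r, c), []).append(values)
    zones = [members[z] for z in sorted(members)]
    k = len(next(iter(cells.values())))
    return [[sum(v[i] for v in zone) / len(zone) for zone in zones] for i in range(k)]

def eigenvector_spread(cells, zonings):
    """Hypothetical check (not CAGE): mean squared distance of each zoning's leading
    eigenvector of the zone-mean correlation matrix from the centroid of all of them."""
    vecs = []
    for zone_of in zonings:
        cols = zone_means(cells, zone_of)
        corr = [[pearson(a, b) for b in cols] for a in cols]
        vecs.append(leading_eigenvector(corr))
    k = len(vecs[0])
    centroid = [sum(v[i] for v in vecs) / len(vecs) for i in range(k)]
    return sum(sum((v[i] - centroid[i]) ** 2 for i in range(k)) for v in vecs) / len(vecs)

def by_row(r, c):
    return r  # each grid row is one zone

def by_col(r, c):
    return c  # each grid column is one zone

n = 3
lying = {(r, c): (r + c, r - c) for r in range(n) for c in range(n)}
honest = {(r, c): (r + c, 2 * (r + c)) for r in range(n) for c in range(n)}
print(eigenvector_spread(lying, [by_row, by_col]))   # about 0.5: zonings disagree
print(eigenvector_spread(honest, [by_row, by_col]))  # 0.0: zonings agree
```

In the first grid, the row zoning and the column zoning disagree on the sign of the correlation, and the spread is about 0.5. In the second grid, y is a linear function of x, so every zoning sees the same relationship and the spread is zero.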
The through-claim: the truth of a spatial statement depends on whether the boundary participates in the structure being measured. A good boundary is one that doesn’t interact with the signal. The best map is the one whose lines are invisible to the data — where the boundary is orthogonal to the phenomenon. Every other map is a participant in what it claims to observe.