The Forensic Fusion
A sophisticated software supply chain attack leaves traces — but scattered across hosts, services, build logs, and dependency layers. No single telemetry stream captures the full compromise chain. The question is: how many streams do you need?
Tan and colleagues build SynthChain, a testbed of seven supply chain attack scenarios spanning PyPI, npm, and native C/C++ packages on both Windows and Linux. They generate ~580,000 raw events with ground-truth MITRE ATT&CK annotations, then measure how well different detection sources reconstruct the attack chain.
The through-claim: the best single source reaches only 39.1% chain reconstruction. Adding just one more source — minimal two-source fusion — jumps to 63.9%. That’s a 1.6x gain from doubling inputs, not from better algorithms.
This is a measurement result about information, not about detection methods. The attack is designed to be distributed: runtime-only payloads, fragmented evidence, no single vantage point that sees everything. The ceiling isn’t your classifier’s accuracy; it’s your sensor’s coverage. Even a perfect detector watching the best single stream still misses about 61% of the chain (100% minus 39.1%), because the evidence is not in that stream.
The practical implication: security teams investing in better ML models for a single log type hit diminishing returns fast. The gain comes from correlating across sources — even naive fusion of two streams outperforms sophisticated analysis of one.
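The coverage argument can be made concrete with a toy model. The sketch below is not SynthChain's data or its scoring method: the step names, the three sources, and the visibility map are invented for illustration, and "reconstruction" is simplified to the fraction of chain steps that leave evidence in at least one chosen source.

```python
from itertools import combinations

# Hypothetical ground truth: the steps of one attack chain.
CHAIN = ["install_hook", "payload_fetch", "execution", "persistence", "exfiltration"]

# Hypothetical visibility map: which steps leave evidence in each source.
VISIBLE = {
    "build_log": {"install_hook"},
    "process_events": {"execution", "persistence"},
    "network_flows": {"payload_fetch", "exfiltration"},
}


def coverage(sources):
    """Fraction of chain steps visible in the union of the given sources."""
    seen = set().union(*(VISIBLE[s] for s in sources))
    return len(seen & set(CHAIN)) / len(CHAIN)


def best_combo(k):
    """Best k-source combination by coverage, found by exhaustive search."""
    return max(combinations(sorted(VISIBLE), k), key=coverage)


if __name__ == "__main__":
    for k in (1, 2, 3):
        combo = best_combo(k)
        print(k, combo, f"{coverage(combo):.0%}")
```

In this toy setup no single source can exceed 40% however good the analysis applied to it, while the best pair reaches 80%. The numbers are artifacts of the invented map, but the shape matches the argument: the ceiling is set by which steps a source can see at all.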
The evidence is distributed. The reconstruction requires the union.