"The Forensic Fusion"

A sophisticated software supply chain attack leaves traces — but scattered across hosts, services, build logs, and dependency layers. No single telemetry stream captures the full compromise chain. The question is: how many streams do you need?

Tan and colleagues build SynthChain, a testbed of seven supply chain attack scenarios spanning PyPI, npm, and native C/C++ packages on both Windows and Linux. They generate ~580,000 raw events with ground-truth MITRE ATT&CK annotations, then measure how well different detection sources reconstruct the attack chain.

The headline result: the best single source reaches only 39.1% chain reconstruction. Adding just one more source (minimal two-source fusion) raises that to 63.9%. That’s a 1.6x gain (63.9 / 39.1 ≈ 1.63) from doubling inputs, not from better algorithms.

This is a measurement result about information, not about detection methods. The attack is designed to be distributed: runtime-only payloads, fragmented evidence, no single vantage point that sees everything. The ceiling isn’t your classifier’s accuracy; it’s your sensor’s coverage. Even a perfect detector watching the best single stream still misses roughly 61% of the chain (100% − 39.1%), because the evidence isn’t in that stream.
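To make the coverage ceiling concrete, here is a minimal sketch with toy data. The step labels, source names, and visibility sets are all hypothetical and are not taken from SynthChain. The point is only that a perfect detector on one source cannot score above that source's visibility, while a union of sources can.

```python
from itertools import combinations

# Hypothetical ground-truth chain; step labels are made up for illustration.
CHAIN = {"install", "fetch_payload", "execute", "persist", "exfiltrate"}

# Hypothetical telemetry sources and the chain steps each one can observe at all.
VISIBLE = {
    "build_log": {"install"},
    "process_audit": {"execute", "persist"},
    "network_flow": {"fetch_payload", "exfiltrate"},
}

def coverage(sources):
    """Fraction of chain steps visible to at least one of the given sources."""
    seen = set().union(*(VISIBLE[s] for s in sources))
    return len(seen & CHAIN) / len(CHAIN)

def best_subset(k):
    """The k-source subset with the highest coverage, i.e. the ceiling for a perfect detector."""
    return max(combinations(sorted(VISIBLE), k), key=coverage)

best_one = best_subset(1)
best_two = best_subset(2)
print(best_one, coverage(best_one))
print(best_two, coverage(best_two))
```

In this toy setup the best single source sees two of five steps and the best pair sees four of five. No amount of modeling on the single source closes that gap.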

The practical implication: security teams investing in better ML models for a single log type hit diminishing returns fast. The gain comes from correlating across sources — even naive fusion of two streams outperforms sophisticated analysis of one.
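What "naive fusion" might look like in practice: a rough sketch that merges two time-sorted event streams into one timeline and checks how much of an annotated chain it recovers. The events, timestamps, and four-step chain are invented for illustration, and this is not the paper's fusion procedure. The ATT&CK technique IDs are real but chosen arbitrarily.

```python
import heapq

# Hypothetical, already time-sorted event streams: (timestamp, source, technique ID).
host_events = [
    (10, "host", "T1059.006"),  # Python interpreter runs an install hook
    (42, "host", "T1547.001"),  # persistence via a registry Run key
]
net_events = [
    (15, "net", "T1105"),  # payload download (ingress tool transfer)
    (60, "net", "T1041"),  # exfiltration over the C2 channel
]

# Hypothetical ground-truth chain in attack order.
GROUND_TRUTH = ["T1059.006", "T1105", "T1547.001", "T1041"]

def reconstruct(*streams):
    """Merge sorted streams by timestamp; return ordered, de-duplicated techniques."""
    chain = []
    for _, _, technique in heapq.merge(*streams):
        if technique not in chain:
            chain.append(technique)
    return chain

def recovered(chain):
    """Fraction of ground-truth steps present in the reconstructed chain."""
    return sum(t in chain for t in GROUND_TRUTH) / len(GROUND_TRUTH)

print(recovered(reconstruct(host_events)))              # host telemetry alone
print(recovered(reconstruct(host_events, net_events)))  # naive two-source fusion
```

The fusion step here is nothing more than a timestamp merge. Whatever gain it shows comes entirely from the second stream containing steps the first one never recorded.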

The evidence is distributed. The reconstruction requires the union.

