Fully Homomorphic Encryption: Computing on Secrets
#privacy #encryption #AI #technology #sovereignty
[!abstract] The Holy Grail of Cryptography Fully Homomorphic Encryption lets you compute on data you can’t see. After 17 years of theoretical possibility, a hardware acceleration race, a $1B unicorn, and GPU-accelerated LLM inference are converging to make it practical. But the 10,000-100,000x performance overhead remains the elephant in the room.
The Core Idea
FHE is conceptually simple: encrypt data, compute on the ciphertext, decrypt the result — and get the same answer as if you’d computed on plaintext. The server never sees your data. You never trust the server. The math just works.
Craig Gentry proved this was possible in his 2009 Stanford thesis using lattice-based cryptography. The insight: operations on encrypted data are homomorphic — addition and multiplication on ciphertexts correspond to addition and multiplication on plaintexts. Since any computation can be expressed as additions and multiplications, FHE can compute anything.
The catch: each operation adds noise to ciphertexts. After enough operations, the noise overwhelms the signal and decryption fails. The solution is bootstrapping — periodically “refreshing” ciphertexts by homomorphically decrypting and re-encrypting them. Bootstrapping is the single most expensive operation in FHE, and the bottleneck everything else revolves around.
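The add/multiply homomorphism and the noise problem can both be seen in a toy version of the DGHV integer scheme (van Dijk, Gentry, Halevi, Vaikuntanathan, 2010). This is a deliberately insecure sketch with tiny parameters, just to make the mechanics concrete:

```python
import secrets

# Toy DGHV-style scheme. Parameters are absurdly small and NOT secure --
# this only illustrates homomorphic add/mul and noise growth.

P = 1_000_003  # secret key: an odd integer (toy size)

def encrypt(bit: int) -> int:
    # c = p*q + 2*r + m ; decryption works while the noise 2r + m < p/2
    q = secrets.randbelow(2**32) + 1
    r = secrets.randbelow(16)       # small fresh noise
    return P * q + 2 * r + bit

def decrypt(c: int) -> int:
    return (c % P) % 2

def noise(c: int) -> int:
    # distance of c from the nearest multiple of p (while below p)
    return c % P

a, b = encrypt(1), encrypt(1)
assert decrypt(a + b) == 0          # addition of ciphertexts = XOR: 1 ^ 1 = 0
assert decrypt(a * b) == 1          # multiplication = AND:        1 & 1 = 1

# Noise adds under +, but roughly squares under *: after only a few
# multiplications it crosses p/2 and decryption becomes unreliable.
c = encrypt(1)
while noise(c) < P // 2:
    c = c * encrypt(1)
print("noise overflowed; decryption is no longer reliable")
```

Bootstrapping exists precisely to reset that noise counter before it crosses the threshold, which is why it dominates everything else.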
The Performance Problem
As of March 2026, FHE incurs 4-5 orders of magnitude slowdown over plaintext computation on CPUs. That’s 10,000x to 100,000x slower. A Springer Nature paper published this week confirmed this figure.
Why so slow? Three reasons:
- Data expansion: Encrypting a single 32-bit integer produces ciphertexts thousands of bits long. Memory bandwidth becomes the binding constraint.
- Bootstrapping cost: The noise-removal operation dominates computation time. Zama’s TFHE bootstrapping dropped from 53ms to under 1ms on H100 GPUs — impressive progress, but still orders of magnitude more expensive than a plaintext operation.
- Wide-integer arithmetic: FHE requires modular arithmetic on numbers hundreds or thousands of bits wide. Modern GPUs are optimized for the opposite direction — INT8, FP8, low-precision ML workloads. Tensor Cores are structurally mismatched.
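The standard workaround for the wide-integer problem — and the idea behind Intel's decomposition into 32-bit words — is a residue number system (RNS): store a huge integer as its residues mod several word-sized coprime moduli, so arithmetic becomes independent, carry-free, word-sized operations. A toy sketch with three tiny moduli (real libraries use dozens of ~60-bit primes):

```python
from math import prod

# RNS sketch: a big integer is represented by residues mod small coprime
# moduli; add/mul happen limb-by-limb with no carries between limbs,
# which is exactly what parallel hardware wants.

MODULI = [1009, 1013, 1019]            # pairwise coprime; product ~ 2^30
M = prod(MODULI)

def to_rns(x):
    return [x % m for m in MODULI]

def rns_mul(a, b):
    # one independent word-sized multiply per modulus
    return [(x * y) % m for x, y, m in zip(a, b, MODULI)]

def from_rns(r):
    # Chinese Remainder Theorem reconstruction
    x = 0
    for ri, m in zip(r, MODULI):
        Mi = M // m
        x += ri * Mi * pow(Mi, -1, m)   # pow(.., -1, m) = modular inverse
    return x % M

a, b = 123_456, 7_890
assert from_rns(rns_mul(to_rns(a), to_rns(b))) == (a * b) % M
```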
The Hardware Acceleration Race
Six companies are building silicon specifically for FHE, attacking the problem from wildly different angles:
DARPA DPRIVE (2021-present)
The U.S. government’s bet on FHE acceleration. Three teams:
- Intel (Heracles): Decomposes huge FHE numbers into 32-bit words to reduce latency and increase parallelism. Phase 2 complete as of mid-2023. Recent WebProNews report (March 2026) says Intel has “cracked a major barrier” — though details are thin.
- Duality Technologies (Trebuchet): Hardware designed to accelerate their existing commercial FHE software. Application-driven approach. Also building the OpenFHE open-source library with GPU/FPGA/ASIC hardware abstraction layers.
- Galois (Basalisc): Uses asynchronous clocking so different FHE operations run at their own optimal speed.
DPRIVE’s goal: bring FHE within 10x of plaintext performance. Whether any of these chips ship commercially in 2026 is an open question.
Independent Hardware Startups
- Cornami: Craig Gentry himself (FHE’s inventor) now works here. Promised plaintext-matching speeds by 2024 — unclear if delivered. DARPA-adjacent.
- Optalysys (UK): Uses photonic computing — literally encoding FHE operations in light. Launched LightLocker blockchain node (June 2025). Raised £21M. The most architecturally exotic approach.
- Niobium (US): Dedicated FHE ASIC accelerator. Co-founded the FHETCH consortium.
- Chain Reaction (Israel): Blockchain-focused FHE hardware.
- CipherSonic Labs: Boston University spinout focused on FPGA acceleration. CEO Rashmi Agrawal notes FPGAs are well-suited because they can be tailored to FHE’s memory-bound, wide-integer workloads.
- Fabric Cryptography: Founded by a former optical-computing engineer. Building general-purpose cryptography accelerators.
FHETCH Consortium (Oct 2024)
Optalysys, Niobium, and Chain Reaction formed the FHE Technical Consortium for Hardware — an industry group for interoperability standards between FHE software and hardware. Announced at ACM CCS 2024 in Salt Lake City. Still small but important: without hardware-software API standards, every accelerator becomes a silo.
FHECore: Rethinking GPUs (Feb 2026)
Researchers from Boston University, Northeastern, KAIST, and University of Murcia proposed FHECore — a specialized functional unit embedded directly in GPU Streaming Multiprocessors. The insight: NTT (Number Theoretic Transform) and base conversion — the two dominant FHE operations — can both be expressed as modulo-linear transformations and mapped to a single hardware unit. Results: 2.41x instruction reduction for CKKS primitives, 50% bootstrapping latency reduction, only 2.4% die area overhead. This is the pragmatic path: modify existing GPU architectures rather than building entirely new chips.
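To see why the NTT is the kernel worth hardening into silicon: it plays the FFT's role over a prime field, turning polynomial multiplication (the core of every lattice-based ciphertext operation) into cheap pointwise products. A naive O(n²) toy over Z₂₅₇ — real libraries use O(n log n) transforms with n in the tens of thousands and ~60-bit primes:

```python
# Naive number-theoretic transform over Z_257, illustrating the algebra
# FHECore maps to hardware. W = 64 is an 8th root of unity mod 257.

P = 257          # prime with N | P - 1
N = 8
W = 64           # 64 has multiplicative order 8 mod 257

def ntt(a, root=W):
    return [sum(a[j] * pow(root, j * k, P) for j in range(N)) % P
            for k in range(N)]

def intt(A):
    inv_n = pow(N, -1, P)
    a = ntt(A, root=pow(W, -1, P))     # inverse transform uses W^-1
    return [(x * inv_n) % P for x in a]

def poly_mul(a, b):
    # cyclic convolution mod (X^N - 1): transform, multiply pointwise, invert
    A, B = ntt(a), ntt(b)
    return intt([(x * y) % P for x, y in zip(A, B)])

a = [1, 2, 3, 4, 0, 0, 0, 0]
b = [5, 6, 7, 8, 0, 0, 0, 0]
print(poly_mul(a, b))   # -> [5, 16, 34, 60, 61, 52, 32, 0]
```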
The Software Stack
Zama: The $1B FHE Unicorn
Paris-based Zama became the world’s first FHE unicorn in June 2025 ($1B valuation, $150M+ raised). Their stack:
- TFHE-rs: Pure Rust TFHE implementation with CUDA GPU acceleration and FPGA support. The reference implementation.
- Concrete: TFHE compiler built on LLVM/MLIR. Write Python → get FHE circuits. No cryptography expertise needed.
- Concrete ML: Drop-in scikit-learn replacements that auto-compile to FHE. Supports linear models, tree ensembles, and encrypted LLM fine-tuning (LLAMA 8B on 100K encrypted tokens in ~70 hours as of v1.8).
- fhEVM: Confidential smart contracts on any EVM chain. Symbolic execution with FHE computation offloaded to coprocessors. Mainnet on Ethereum December 2025.
Performance trajectory: 20+ TPS on CPU today → 500-1,000 TPS with GPU migration by end 2026 → 100,000+ TPS with dedicated ASICs in 2027-2028. Zama explicitly positions FHE’s lattice-based security as post-quantum resistant — unlike TEE solutions vulnerable to side-channel attacks.
Team: 96+ employees, 26 nationalities, ~40% PhDs. Co-founder Pascal Paillier invented the Paillier encryption scheme (used in billions of smart cards). IACR Fellowship 2025. This is arguably the deepest FHE talent concentration anywhere.
OpenFHE
Open-source library maintained by Duality Technologies. Supports BFV, BGV, CKKS schemes. Added GPU acceleration (200x over CPU baseline) and hardware abstraction layer for switching backends. The community standard for FHE research and prototyping.
FHE for AI Inference: The Privacy Promise
The convergence of FHE and AI is the most exciting — and most frustrating — frontier.
EncryptedLLM (ICML 2025)
The landmark paper: GPU-accelerated FHE running a full GPT-2 forward pass. Its key contribution is an open-source GPU implementation extending OpenFHE that achieves a 150-200x speedup over the CPU baseline. Still far from practical for production LLM inference, but it proved the concept at transformer scale.
The fundamental challenge: LLM activation functions (ReLU, GeLU, SiLU, Softmax) are non-linear. FHE can only do additions and multiplications natively. Non-linearities must be approximated as polynomials, which is lossy and expensive.
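One hedged illustration of what "approximated as polynomials" means in practice: interpolate GeLU at Chebyshev nodes on a bounded interval and measure the error. This is a generic recipe, not how any particular library does it (production CKKS pipelines use carefully scaled minimax polynomials), but it shows the degree/accuracy tradeoff that drives the cost:

```python
import math

# Approximate GeLU by a degree-16 polynomial on [-4, 4], since FHE can
# only evaluate additions and multiplications. Higher degree = better
# accuracy but more (expensive) ciphertext multiplications.

def gelu(x):
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

A, DEG = 4.0, 16
# Chebyshev nodes avoid the Runge oscillation of equispaced interpolation
nodes = [A * math.cos((2*i + 1) * math.pi / (2 * (DEG + 1)))
         for i in range(DEG + 1)]

def interp(x):
    # Lagrange form evaluated directly (fine at this tiny degree)
    total = 0.0
    for i, xi in enumerate(nodes):
        li = 1.0
        for j, xj in enumerate(nodes):
            if i != j:
                li *= (x - xj) / (xi - xj)
        total += gelu(xi) * li
    return total

worst = max(abs(interp(t / 50.0 * A) - gelu(t / 50.0 * A))
            for t in range(-50, 51))
print(f"max |poly - gelu| on [-4, 4]: {worst:.4f}")
assert worst < 0.05   # smooth function, so moderate degree suffices here
```

Outside the fitting interval the polynomial diverges badly, which is why FHE inference pipelines also have to keep activations scaled into a known range.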
Safhire: The Hybrid Approach (Sep 2025)
EPFL researchers proposed splitting the work: linear layers run on the server under encryption, while non-linear activations are sent back to the client in plaintext. This eliminates bootstrapping entirely (the main cost) and supports exact activations (no approximations). Result: 1.5x-10.5x faster than ORION (the prior state of the art) with comparable accuracy.
The tradeoff: communication overhead. Every layer requires a round-trip. For deep networks, this compounds. But for cloud inference where the alternative is trusting the server with your data, it’s a compelling deal.
SIGMA: Secure GPT Inference via Secret Sharing
Microsoft Research’s approach uses function secret sharing (FSS) — a 2-party computation protocol. Not pure FHE but adjacent: the model and inputs are split between two non-colluding servers. Improved transformer inference latency by 11-19x over prior art. The first practical secure inference of GPT-2 scale models.
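SIGMA's function secret sharing is considerably more sophisticated, but the core two-party intuition is visible in plain additive sharing: split a value into random shares that sum to it, and notice that linear operations need no communication at all. A sketch of that simpler idea (illustrative, not SIGMA's actual protocol):

```python
import secrets

# 2-party additive secret sharing over Z_P: each share alone is uniformly
# random and reveals nothing; linear functions are computed share-by-share.

P = 2**61 - 1  # a Mersenne prime as the shared modulus

def share(x):
    s0 = secrets.randbelow(P)
    return s0, (x - s0) % P

def reconstruct(s0, s1):
    return (s0 + s1) % P

# Server 0 holds (a0, b0), server 1 holds (a1, b1); neither learns a or b.
a0, a1 = share(123)
b0, b1 = share(456)

# Each server locally computes its share of 3a + b: linear maps are "free".
y0 = (3 * a0 + b0) % P
y1 = (3 * a1 + b1) % P
assert reconstruct(y0, y1) == 3 * 123 + 456   # = 825

# Multiplying two shared values is NOT local (a0*b0 + a1*b1 != a*b), which
# is why non-linearities are the expensive part of every MPC protocol too.
```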
DCT-CryptoNets (2024)
Clever approach: transform images to frequency domain (DCT) before encryption, then operate on low-frequency components only. Perceptually similar results with dramatically reduced ciphertext size and computation.
The Three-Way Privacy Race
FHE competes with two other approaches to private computation:
| Approach | Trust Model | Performance | Post-Quantum | Maturity |
|---|---|---|---|---|
| FHE | Zero trust — math only | 10,000-100,000x overhead | ✅ Lattice-based | Early commercial |
| TEE (Intel SGX, AMD SEV, ARM TrustZone) | Trust hardware vendor | ~5-15% overhead | ❌ Classical attestation crypto; side-channel vulnerable | Production |
| MPC/Secret Sharing | Trust non-collusion | 100-1,000x overhead | Depends on primitives | Research/early |
TEEs are the pragmatic choice today. NVIDIA’s Confidential Computing (Hopper/Blackwell H100 CC mode) and AMD SEV-SNP provide hardware-isolated enclaves for LLM inference with single-digit percentage overhead. Red Hat, AWS, and Azure all offer confidential computing services.
But TEEs have a fundamental flaw: you trust the hardware. Intel SGX has been broken repeatedly (Plundervolt, SGAxe, ÆPIC). AMD SEV-SNP is more robust but still trusts AMD’s firmware. The Quantum Insider webinar (March 3, 2026) highlighted that 46.2% of enterprise leaders are not confident their AI systems meet 2026 security standards, with “harvest now, decrypt later” concerns overtaking model drift as the primary risk.
FHE’s promise: you trust math, not companies. The lattice-based hardness assumptions provide post-quantum security — no quantum computer can crack them (as far as we know). That’s a fundamentally different trust model.
MPC (multi-party computation) sits in the middle — better performance than FHE, weaker trust assumptions than TEEs. SIGMA’s 2-party FSS approach is the current frontier for practical private LLM inference.
The Sovereign AI Connection
This connects directly to the local AI inference thesis and the sovereign stack:
Today’s privacy options for AI inference:
- Run locally — 100% private, hardware-limited, no cloud capability
- Trust the cloud — full capability, zero privacy
- TEE — near-full capability, trust hardware vendor
- FHE — full privacy, 10,000x performance penalty
The missing option is FHE with reasonable overhead. When hardware acceleration closes the gap from 10,000x to 10x, the calculus changes fundamentally. You could use any cloud provider’s GPUs for inference without trusting them at all.
For the agentic economy, FHE + Cashu micropayments could enable truly private, pay-per-query AI inference: encrypted prompts sent to untrusted compute providers, paid with anonymous ecash, results returned encrypted. The provider can’t see your query, can’t see your payment identity, and provably computed correctly. This is the sovereign inference endgame.
My Opinion
FHE is following the exact trajectory of public-key cryptography in the 1970s-90s: mathematically proven, computationally impractical, then hardware caught up and it became invisible infrastructure. We’re in the “impractical but proven” phase.
The 10,000x overhead is real and will take years — not months — to close. DARPA’s DPRIVE goal of 10x parity is the right target but likely a 2027-2028 reality at earliest. The hybrid approaches (Safhire, SIGMA) are the pragmatic near-term path: use FHE for linear operations where it’s efficient, offload non-linearities to the client or a second party.
Zama is the most important company in this space and it’s not close. Their LLVM-based compiler approach (write Python, get FHE) is the right abstraction. The blockchain focus (fhEVM) is where they’ll make money first, but the AI inference story is where the real impact lies.
The FHETCH consortium forming is a very healthy sign — hardware standards prevent fragmentation. But with 6+ competing accelerator architectures and no shipping products, we’re in the “many flowers bloom” phase. Some of these companies won’t survive.
FHECore (embedding FHE units in existing GPUs) is probably the winning architecture long-term. Custom ASICs will always be faster for specific FHE operations, but GPUs are already everywhere. A 2x improvement with 2.4% die area overhead is the kind of pragmatic tradeoff that actually ships.
The biggest risk is that TEEs that are "good enough" for most use cases will prevent FHE from reaching the scale needed to justify hardware investment. Most enterprises will choose 5% overhead with hardware trust assumptions over 10,000x overhead with mathematical guarantees. FHE needs to reach 10-100x overhead to compete — and needs dedicated hardware to get there.
The biggest opportunity is that post-quantum concerns will force the issue. TEE deployments lean on classical public-key crypto for attestation and key exchange — breakable by a future quantum computer — on top of their existing side-channel weaknesses. FHE's lattice assumptions are inherently post-quantum. As the quantum timeline shortens (estimated 10-15 years for CRQCs), the "harvest now, decrypt later" threat gives FHE a structural advantage that TEEs can't match.
Key Metrics to Track
- Zama fhEVM TPS (target: 500-1,000 by end 2026)
- DARPA DPRIVE Phase 3 completion and commercial availability
- EncryptedLLM follow-ups: when does FHE LLM inference reach GPT-3.5 scale?
- FHETCH consortium membership growth
- FHECore adoption in NVIDIA or AMD GPU architectures
Researched 2026-03-24. This note maps the FHE landscape at an inflection point — proven math, emerging hardware, and an AI privacy crisis converging toward practical encrypted computation.
Related Notes
- Bitcoin Post-Quantum Cryptography - The Race Against Time — FHE uses the same lattice-based hardness assumptions as PQ signatures
- The ZK Proof Renaissance - From Theory to Production — ZK and FHE are complementary privacy tools (ZK proves, FHE computes)
- The Local AI Inflection - Sovereign Inference in 2026 — local inference is today’s privacy solution; FHE is tomorrow’s
- The Sovereign Stack - Self-Hosting in 2026 — FHE extends sovereignty to cloud computation
- The Inference Engine Wars - How LLMs Actually Run — understanding inference architecture is prerequisite for understanding FHE’s challenges
- The Agentic Protocol Crisis - Security at the Speed of Hype — FHE could solve the agent privacy problem
- The Cashu Convergence - Ecash Meets the Agentic Economy — anonymous payment + encrypted compute = sovereign agents