I spend most of my time working on two seemingly unrelated problems.
The first is Bitcoin forensics. Through TrailBit Labs, I research the heuristics that analysts use to trace transactions on Bitcoin's blockchain — methods like common-input-ownership, change address detection, and timing analysis. The question driving this work is simple: when we claim to know who sent bitcoin to whom, how confident should we be?
The second is health AI. Through PLA Health, I'm building an app that gives people AI-powered health recommendations based on their biomarker data. Every recommendation comes with an evidence grade — from L1 (backed by meta-analyses) down to L6 (preclinical research only) — because when the stakes are personal, people deserve to know how much to trust what they're being told.
For a long time, I thought of these as separate interests. One is about money. The other is about health. One deals with cryptographic protocols. The other deals with large language models. But the deeper I've gone into both, the more I've realized they share a common root — and that root is the same challenge now facing AI interpretability.
All three are attempts to make opaque systems transparent. And all three are learning, often painfully, that transparency is harder than it looks.
The Illusion of Transparency
Bitcoin is often described as transparent. Every transaction is recorded on a public ledger. Anyone can look up any address, trace any payment, follow any flow of funds. Compared to traditional banking, where transactions are hidden behind institutional walls, this feels radically open.
But transparency of data is not the same as transparency of meaning.
Yes, you can see that address 1A1zP1... sent 0.5 BTC to address 3J98t1.... But who controls those addresses? Are they the same person moving funds between wallets? Is one of them an exchange? Is the transaction a payment, a consolidation, or an attempt to obscure a trail?
The raw data tells you nothing about intent, identity, or structure. To extract meaning, analysts rely on heuristics — rules of thumb like "inputs to the same transaction probably belong to the same entity" (the Common-Input-Ownership Heuristic) or "the smaller output is probably the change going back to the sender." These heuristics work often enough to be useful. But they also fail in ways that are hard to detect and easy to exploit.
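The two heuristics above can be sketched in a few lines. This is a toy illustration, not code from any real forensics tool; the transaction structure and the truncated addresses are hypothetical placeholders.

```python
# Toy sketch of two common forensic heuristics. The transaction format
# and the example addresses are illustrative, not real chain data.

def common_input_ownership(tx):
    """CIOH: presume all input addresses of a transaction share one owner."""
    return set(tx["inputs"])

def likely_change_output(tx):
    """Change heuristic: guess that the smaller output returns to the sender."""
    if len(tx["outputs"]) != 2:
        return None  # this simple version only applies to two-output payments
    return min(tx["outputs"], key=lambda o: o["value"])

tx = {
    "inputs": ["1A1zP1...", "1BvBMS..."],
    "outputs": [
        {"addr": "3J98t1...", "value": 0.5},   # larger output: presumed payment
        {"addr": "1F1mAy...", "value": 0.03},  # smaller output: presumed change
    ],
}

cluster = common_input_ownership(tx)  # both inputs presumed one entity
change = likely_change_output(tx)     # the 0.03 BTC output
```

Notice that both functions encode assumptions, not facts: nothing in the data itself says the inputs share an owner or that the smaller output is change.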
This is the core tension in blockchain forensics: the system is transparent in the sense that all data is visible, but opaque in the sense that the data's meaning is not self-evident. Interpretation requires assumptions, and assumptions can be wrong.
AI has exactly the same problem.
Neural Networks Are Public Ledgers Too
A neural network's weights are, in principle, fully inspectable. Every parameter, every connection, every activation — it's all there in the model files. You can download an open-source LLM and examine every floating-point number that defines it.
But just as a list of Bitcoin transactions doesn't tell you who is paying whom, a matrix of neural network weights doesn't tell you how the model reasons, what it has learned, or why it produces a particular output.
The field of mechanistic interpretability — pioneered in large part by researchers at Anthropic, among others — is essentially trying to do for neural networks what blockchain analysts try to do for transaction graphs: reverse-engineer higher-level meaning from lower-level data.
In blockchain forensics, we cluster addresses into entities. In mechanistic interpretability, researchers identify circuits — groups of neurons that activate together to perform a recognizable function. In both cases, the goal is to move from raw, unintelligible data to a structured understanding of what the system is actually doing.
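The clustering step can be sketched with a union-find structure: applying CIOH to each transaction merges its input addresses into one presumed entity. The addresses and transactions below are illustrative.

```python
# Minimal union-find sketch of CIOH-style address clustering.
# Each transaction merges its input addresses into one presumed entity.

parent = {}

def find(addr):
    parent.setdefault(addr, addr)
    while parent[addr] != addr:
        parent[addr] = parent[parent[addr]]  # path halving keeps trees shallow
        addr = parent[addr]
    return addr

def union(a, b):
    parent[find(a)] = find(b)

transactions = [
    {"inputs": ["addr_1", "addr_2"]},  # presumed one entity
    {"inputs": ["addr_2", "addr_3"]},  # chains addr_3 into the same cluster
    {"inputs": ["addr_4"]},            # stays a singleton
]

for tx in transactions:
    find(tx["inputs"][0])  # register even single-input transactions
    for other in tx["inputs"][1:]:
        union(tx["inputs"][0], other)

clusters = {}
for addr in parent:
    clusters.setdefault(find(addr), set()).add(addr)
```

Real clustering pipelines layer many heuristics on top of this core merge step, and a single wrong merge (say, a CoinJoin treated as one owner) silently fuses two unrelated entities.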
And in both cases, the methods are heuristic. They work until they don't. The question is always: how do you know when they've stopped working?
Heuristics All the Way Down
This is where my blockchain research has most shaped how I think about AI.
In Bitcoin forensics, the Common-Input-Ownership Heuristic (CIOH) is treated as foundational. Most clustering tools and forensics platforms rely on it. Entire compliance regimes are built on top of it. But when you actually test it empirically — when you look at how often it holds true across different transaction types, time periods, and network conditions — you find it's far less reliable than the industry assumes.
CoinJoin transactions deliberately break CIOH. Multi-party protocols like PayJoin make it impossible to distinguish collaborative transactions from simple payments. Even ordinary transactions can violate the heuristic when users consolidate funds from multiple sources.
The heuristic isn't wrong. It's just not as right as people assume. And the gap between "usually works" and "always works" is where innocent people get flagged and guilty ones slip through.
AI interpretability faces the same risk. When researchers identify a "truthfulness direction" in a model's activation space, or locate a circuit that appears to handle a specific task, the temptation is to treat these findings as ground truth. But neural networks are vast, and the features we identify may be approximate, context-dependent, or entangled with other behaviors we haven't mapped yet.
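One common way such a "direction" is estimated is as a difference of class means in activation space. The sketch below uses synthetic random data rather than real model internals, which is exactly the caveat: a direction recovered this way can look clean while hiding entanglement the toy setup doesn't contain.

```python
import numpy as np

# Toy difference-of-means "direction" in activation space.
# Activations are synthetic; real features may be context-dependent
# or entangled in ways this clean setup cannot show.

rng = np.random.default_rng(0)
dim = 16
true_dir = rng.normal(size=dim)
true_dir /= np.linalg.norm(true_dir)

# Synthetic "truthful"/"untruthful" activations separated along true_dir
truthful = rng.normal(size=(200, dim)) + 2.0 * true_dir
untruthful = rng.normal(size=(200, dim)) - 2.0 * true_dir

# Estimate the direction as the difference of class means, then normalize
est_dir = truthful.mean(axis=0) - untruthful.mean(axis=0)
est_dir /= np.linalg.norm(est_dir)

cosine = float(est_dir @ true_dir)  # near 1.0 on this clean toy data
```

On fabricated data the estimate aligns almost perfectly with the planted direction; on a real model, nothing guarantees the recovered direction corresponds to one coherent behavior.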
The parallel isn't just conceptual. It's structural:
In blockchain: We identify patterns (heuristics) → treat them as reliable → build systems on top of them → discover edge cases where they fail → scramble to patch.
In AI: We identify features (circuits, directions) → treat them as reliable → build alignment tools on top of them → discover edge cases where they fail → scramble to patch.
The lesson from blockchain is that the scrambling phase is inevitable, and it's much less painful if you've been honest about your confidence levels from the start.
Adversarial Pressure Reveals Fragility
Both systems also share a critical dynamic: the presence of adversaries who are actively trying to break your interpretations.
In Bitcoin, privacy-preserving techniques like CoinJoin, atomic swaps, and chain-hopping exist specifically to defeat forensic heuristics. Every time analysts develop a new clustering method, privacy advocates develop countermeasures. The result is an arms race where the reliability of any given heuristic degrades over time as the network adapts.
In AI, adversarial attacks, jailbreaks, and prompt injections play a similar role. They probe the boundaries of what we think we understand about a model's behavior and reveal that our interpretations were more fragile than we assumed.
This adversarial pressure is actually valuable — it's the fastest way to find out where your understanding is incomplete. In blockchain forensics, the existence of CoinJoin forced analysts to develop better methods and be more honest about uncertainty. In AI, jailbreaks and adversarial examples are pushing interpretability researchers to develop more robust theories of how models actually work, rather than settling for surface-level explanations.
But it only works if you treat the failures as information rather than embarrassments.
What "Transparency" Actually Requires
Working across both domains has convinced me that real transparency isn't about making data visible. It's about three things:
1. Validated methods, not assumed ones.
In blockchain forensics, this means empirically testing heuristics before building compliance systems on top of them. In AI interpretability, it means rigorously validating that the features and circuits you've identified actually correspond to the model behaviors you think they do — not just in controlled experiments, but in deployment conditions.
2. Calibrated confidence.
Not every finding deserves equal trust. In PLA Health, I built a 6-tier evidence grading system (L1–L6) because a recommendation backed by a meta-analysis is fundamentally different from one based on a mouse study — even if both "suggest" the same thing. AI interpretability needs something similar: a way to communicate not just what we've found, but how confident we should be in that finding.
When a blockchain analyst says "these addresses belong to the same entity," they should be forced to say with what confidence and based on which heuristics. When an AI interpretability researcher says "this circuit handles deception detection," the same standard should apply.
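As a sketch of what enforcing that standard might look like, a finding could simply refuse to exist without its confidence metadata. The grade labels below echo the L1–L6 tiers mentioned earlier; everything else is hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical sketch: a finding that cannot be stated without a grade
# and without naming the methods it rests on. Grade labels echo the
# L1-L6 evidence tiers described above; the rest is illustrative.

EVIDENCE_GRADES = {
    "L1": "meta-analysis", "L2": "randomized trial", "L3": "cohort study",
    "L4": "observational", "L5": "case report", "L6": "preclinical only",
}

@dataclass
class Finding:
    claim: str
    grade: str                                   # one of L1..L6
    methods: list = field(default_factory=list)  # heuristics or evidence used

    def __post_init__(self):
        if self.grade not in EVIDENCE_GRADES:
            raise ValueError(f"unknown evidence grade: {self.grade!r}")
        if not self.methods:
            raise ValueError("a finding must name its supporting methods")

    def describe(self):
        return (f"{self.claim} [{self.grade}: {EVIDENCE_GRADES[self.grade]}; "
                f"via {', '.join(self.methods)}]")

f = Finding("addresses A and B share an owner", "L4", ["CIOH"])
```

The design choice is that the constructor rejects ungraded or unsourced claims outright, so "forced to say with what confidence" becomes a type-level property rather than a norm.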
3. Honest acknowledgment of what we don't know.
The most dangerous state in both fields isn't ignorance — it's false confidence. A blockchain analyst who says "we can trace any transaction" is more dangerous than one who says "we can trace most transactions, but here are the cases where we can't." An AI safety researcher who claims full understanding of a model's decision-making is more dangerous than one who maps what they understand and clearly marks the boundaries.
Building Responsibly in Opaque Systems
I didn't plan for my work to converge like this. I started building TrailBit because I was genuinely curious about Bitcoin's transaction graph. I started building PLA Health because I wanted to solve a personal problem — tracking biomarkers against longevity research rather than standard lab ranges. The connections between blockchain transparency, AI interpretability, and responsible health AI only became clear after I'd been building in all three areas for a while.
But the convergence isn't accidental. If you care about building tools people can trust, you end up asking the same questions regardless of the domain:
What assumptions am I making? How would I know if they were wrong? What happens to the people using my tool if my assumptions fail?
In Bitcoin forensics, wrong assumptions mean innocent people get flagged for money laundering. In health AI, wrong assumptions mean someone follows advice that isn't backed by evidence. In AI interpretability, wrong assumptions mean we think a model is safe when it isn't.
The stakes differ. The methodology shouldn't.
Where This Goes
I don't think blockchain forensics will solve AI interpretability or vice versa. They're different domains with different technical constraints. But I do think practitioners in both fields would benefit from talking to each other more — not about the specific techniques, but about the epistemological challenges they share.
How do you validate an interpretation of a system that's too complex to fully understand? How do you communicate uncertainty to stakeholders who want definitive answers? How do you build responsibly on top of methods you know are imperfect?
These are the questions I keep coming back to, whether I'm analyzing Bitcoin transactions or designing evidence grades for health recommendations. I suspect they're the questions that will define how well we navigate the next decade of AI development.
If you're working on any of these problems — blockchain analytics, AI interpretability, responsible AI, or just trying to build things that genuinely help people — I'd love to hear how you think about them.
Geo Nicolaidis is an independent researcher and builder. He runs TrailBit Labs, a Bitcoin forensics research lab, and PLA Health, a responsible health AI app. He writes about Bitcoin heuristics at Bitcoin Heuristics Field Notes.