Issue 6: Change Address Detection

Most Bitcoin transactions create a small puzzle. Bitcoin spends unspent outputs (UTXOs) whole, so when the inputs you spend are worth more than the amount you're sending, the excess has to go somewhere. That somewhere is a change address: an output that returns the leftover value to the sender.

The problem is that Bitcoin doesn't label which output is the payment and which is the change. Both outputs look the same on-chain. They're just amounts sent to addresses. If you're an analyst trying to trace funds, correctly identifying the change output is the difference between following the money forward and wandering off in the wrong direction entirely.

Change address detection is, after the Common-Input-Ownership Heuristic I covered in Issue 1, probably the second most relied-upon technique in Bitcoin forensics. CIOH tells you which addresses belong to the same entity. Change detection tells you which direction funds are moving. Get it wrong, and your entire trace goes sideways — as I discussed in Issue 2 when walking through flow tracing methodology.

What makes this heuristic interesting, and frustrating, is that it relies on wallet implementation details rather than cryptographic properties. CIOH exploits the fact that spending requires private keys. Change detection exploits the fact that wallet developers tend to make similar design choices. When those design choices change, or when someone deliberately breaks the pattern, the heuristic fails silently.

The Pattern Defined

When Alice wants to send 0.3 BTC to Bob, but her wallet holds a single UTXO worth 1.0 BTC, the transaction looks like this:

Input: 1.0 BTC (Alice's UTXO)

Output 1: 0.3 BTC (to Bob)

Output 2: ~0.6997 BTC (back to Alice, minus fees)

Output 2 is the change. The transaction fee accounts for the difference between inputs and outputs. But an analyst looking at this transaction only sees two outputs — 0.3 BTC and 0.6997 BTC — with no indication of which is the payment and which is the change.
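To make the ambiguity concrete, here is a minimal sketch of Alice's transaction as an analyst sees it. The dictionary layout and the function name are hypothetical, chosen only for illustration; amounts are in satoshis to avoid floating-point rounding.

```python
SATS_PER_BTC = 100_000_000

# Hypothetical representation of the transaction above. The chain records
# amounts and addresses only; nothing marks an output as payment or change.
tx = {
    "inputs": [100_000_000],              # Alice's 1.0 BTC UTXO
    "outputs": [30_000_000, 69_970_000],  # 0.3 BTC and 0.6997 BTC
}

def implied_fee(tx):
    """The fee is never stated in the transaction; it is inputs minus outputs."""
    return sum(tx["inputs"]) - sum(tx["outputs"])

fee = implied_fee(tx)
print(f"fee: {fee} sat ({fee / SATS_PER_BTC:.4f} BTC)")
```

Nothing in this structure distinguishes the 0.3 BTC output from the 0.6997 BTC output, which is exactly the gap the heuristics try to fill.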

Change address detection attempts to resolve this ambiguity using observable properties of the transaction and its outputs. The heuristics fall into several categories, each with different reliability.

Address type matching. Most wallets generate change addresses of the same type as the spending address. If the input comes from a P2SH (pay-to-script-hash) address, the change is likely the output that also goes to a P2SH address. When one output uses the same address format as the input and the other doesn't, that's a signal. This heuristic became less useful as the network transitioned through address types — Legacy (P2PKH), SegWit wrapped (P2SH-P2WPKH), native SegWit (P2WPKH), and now Taproot (P2TR). During transition periods, mixed-type transactions are common and the signal degrades.
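A rough sketch of address type matching, assuming mainnet addresses and classifying by prefix only. The function names are hypothetical, and the prefix check is not a full address validator.

```python
def address_type(addr: str) -> str:
    """Rough mainnet classification by prefix (illustrative, no checksum validation)."""
    lower = addr.lower()
    if lower.startswith("bc1p"):
        return "P2TR"
    if lower.startswith("bc1q"):
        # 42-character bc1q addresses are P2WPKH; longer ones are P2WSH.
        return "P2WPKH" if len(lower) == 42 else "P2WSH"
    if addr.startswith("1"):
        return "P2PKH"
    if addr.startswith("3"):
        return "P2SH"  # may wrap SegWit; that is not visible from the address alone
    return "unknown"

def type_match_change(input_types, output_types):
    """Return the index of the only output whose type matches the inputs, else None."""
    if len(set(input_types)) != 1 or len(output_types) < 2:
        return None  # mixed input types or a single output: no signal
    matches = [i for i, t in enumerate(output_types) if t == input_types[0]]
    return matches[0] if len(matches) == 1 else None
```

When every output shares the input type, or none does, the function returns None instead of guessing, which mirrors how the signal degrades during transition periods.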

Value heuristics. In simple payment scenarios, the change output tends to be the larger of the two. If you're paying for something, you're probably spending less than your full balance. This is the same asymmetry that defines peel chains (Issue 3). But it reverses whenever the payment is larger than the remainder, as when someone is emptying a wallet or making a large payment.

Round number analysis. Payments to other people tend to be round numbers — 0.1 BTC, 0.05 BTC, 50,000 satoshis. Change, being the mathematical remainder, tends to be non-round. If one output is exactly 0.5 BTC and the other is 0.49987612 BTC, the round output is probably the payment. This works surprisingly well for human-initiated payments but fails for automated systems, which don't care about round numbers.
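One way to sketch round number analysis is to count significant digits in the satoshi amount. The cutoff of two significant digits below is an arbitrary assumption for illustration, not an established threshold.

```python
def significant_digits(sats: int) -> int:
    """Digits remaining after stripping trailing zeros from a satoshi amount."""
    return len(str(sats).rstrip("0")) if sats > 0 else 0

def round_payment_guess(outputs_sats, max_sig_digits=2):
    """Return the index of the only 'round' output (the likely payment), else None."""
    round_idx = [
        i for i, v in enumerate(outputs_sats)
        if 0 < significant_digits(v) <= max_sig_digits
    ]
    return round_idx[0] if len(round_idx) == 1 else None

# The example from the text: 0.5 BTC versus 0.49987612 BTC.
print(round_payment_guess([50_000_000, 49_987_612]))
```

Working in satoshis covers both units at once: 0.1 BTC and 50,000 satoshis each have a single significant digit.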

Address reuse. If one output goes to an address that has received funds before, and the other goes to a fresh address, the fresh address is likely the change. Wallet software typically generates new addresses for change to improve privacy. But this heuristic inverts when the recipient provides a new address (as they should) and the sender reuses an address (as some older wallets still do).

Spending behavior. This is a post-hoc heuristic. After a transaction confirms, you can look at which output gets spent next. Change tends to be spent sooner — it goes back into the sender's wallet and becomes available for their next transaction. The payment output sits with the recipient, who may not spend it for a while. This heuristic can't help with real-time analysis, but it's useful for historical investigations.
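A sketch of the spending-timing check, assuming you already know the block height at which each output was later spent (None if still unspent). The 144-block gap, roughly one day, is an arbitrary illustrative cutoff.

```python
def earlier_spent_output(spend_heights, min_gap_blocks=144):
    """Return the index of the output spent clearly earlier than the rest, else None.

    spend_heights: block height at which each output was spent, or None if unspent.
    """
    if len(spend_heights) < 2:
        return None
    heights = [float("inf") if h is None else h for h in spend_heights]
    order = sorted(range(len(heights)), key=lambda i: heights[i])
    first, second = order[0], order[1]
    if heights[first] == float("inf"):
        return None  # nothing has been spent yet, so there is no signal
    if heights[second] - heights[first] >= min_gap_blocks:
        return first  # spent well before the other output: likely change
    return None
```

An unspent output is treated as "spent infinitely late", so a spent output paired with an unspent one still produces a signal.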

Optimal change heuristic. Proposed by Ergo and others, this heuristic states: if exactly one output is smaller than the smallest input, it is likely the change. The logic is that a wallet selects the minimum set of UTXOs needed to cover the payment. The change output should therefore be smaller than any individual input, because if it weren't, the wallet could have dropped an input and still covered the payment. This works well for multi-input transactions but says nothing about single-input spends, where every output is necessarily smaller than the sole input.
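The optimal change rule is simple enough to sketch directly. The function name is hypothetical, and amounts are in satoshis.

```python
def optimal_change_candidate(inputs_sats, outputs_sats):
    """Return the index of the single output smaller than the smallest input, else None."""
    if len(inputs_sats) < 2:
        # With one input, every output is smaller than it, so there is no signal.
        return None
    smallest_input = min(inputs_sats)
    candidates = [i for i, v in enumerate(outputs_sats) if v < smallest_input]
    return candidates[0] if len(candidates) == 1 else None

# Two inputs of 0.40 and 0.35 BTC paying 0.60 BTC: only the 0.1497 BTC output
# is smaller than the smallest input, so it is the change candidate.
print(optimal_change_candidate([40_000_000, 35_000_000], [60_000_000, 14_970_000]))
```

Returning None when zero or several outputs qualify keeps the rule from voting in cases where it carries no information.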

The academic formalization of these heuristics comes from Meiklejohn et al. (2013) and was refined by Nick (2015), Ergo/Möser & Narayanan (2017), and the BlockSci framework (Kalodner et al., 2020). Each researcher added constraints and conditions, but the core problem remains: all of these are probabilistic guesses about wallet behavior, not deterministic properties of the protocol.

How to Apply It

Here is how I approach change detection in practice, both in manual analysis and in how I've implemented it in TrailBit.

Step 1: Classify the transaction structure.

Before applying any change heuristic, I categorize the transaction.

A single-input, single-output transaction has no change: the entire input goes to the recipient (minus fees). These are unambiguous and don't need analysis.

A single-input, two-output transaction is the classic case for change detection. One output is the payment, the other is the change. Most heuristics are designed for this scenario.

Multi-input, two-output transactions follow the same logic, but CIOH should be applied to the inputs first.

Multi-output transactions with three or more outputs are harder. One might be change, but the others could be batch payments, or the transaction could be a distribution with no change at all. I treat these with lower confidence.
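The classification step can be sketched as a small function over input and output counts. The category labels are hypothetical names. Treating a multi-input, single-output transaction as having no change is my assumption, since that case is not listed among the categories above.

```python
def classify_structure(n_inputs: int, n_outputs: int) -> str:
    """Bucket a transaction by shape before any change heuristic runs."""
    if n_inputs < 1 or n_outputs < 1:
        raise ValueError("a transaction needs at least one input and one output")
    if n_outputs == 1:
        # Single output: nothing to disambiguate. The multi-input case
        # (a sweep or consolidation) is an assumption added here.
        return "no_change"
    if n_outputs == 2:
        # Multi-input spends get CIOH applied to the inputs first.
        return "simple_payment" if n_inputs == 1 else "multi_input_payment"
    return "multi_output"  # three or more outputs: lower confidence

for shape in [(1, 1), (1, 2), (3, 2), (1, 50)]:
    print(shape, classify_structure(*shape))
```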

Step 2: Apply heuristics in priority order.

Not all heuristics carry equal weight. I apply them in order of reliability, and I score them rather than treating any single one as definitive.

First, check the optimal change heuristic. If exactly one output is smaller than every input, score it as likely change. This has the strongest theoretical basis because it follows from wallet UTXO selection logic.

Second, check address type matching. If inputs are native SegWit (bc1q...) and one output is also native SegWit while the other is a different type, the matching output is likely change. Score accordingly.

Third, check for round numbers. If one output is a round value in BTC or satoshis, score it as likely the payment. Be careful with denomination — 100,000 satoshis (0.001 BTC) is round in both units.

Fourth, check address reuse. If one output address has appeared before in the blockchain and the other hasn't, score the new address as likely change.

Fifth, if available, check spending timing. If one output was spent within hours and the other sat for days, the quickly-spent output is likely change.

Step 3: Aggregate scores.

I combine heuristic signals rather than relying on any single one. When multiple heuristics agree — the smaller output goes to a new address of the same type as the input and is a non-round number — confidence is high. When they conflict — the larger output goes to a new address but the smaller output is a round number — I flag the transaction as ambiguous.

In TrailBit's analysis pipeline, each heuristic contributes a weighted confidence score, and the change designation is assigned to the output with the highest aggregate score above a minimum threshold. If neither output crosses the threshold, the transaction is marked as indeterminate. I would rather have an honest "I don't know" than a confident wrong answer.
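A minimal sketch of weighted aggregation with a threshold follows. The weights, the threshold, and the tie rule are illustrative assumptions; they are not TrailBit's actual values. Integer points are used so that comparisons are exact.

```python
# Hypothetical weights (points out of 100) and threshold, for illustration only.
WEIGHTS = {
    "optimal_change": 35,
    "address_type": 25,
    "round_number": 20,
    "address_reuse": 10,
    "spend_timing": 10,
}
THRESHOLD = 50

def aggregate_change(votes, n_outputs=2, weights=WEIGHTS, threshold=THRESHOLD):
    """votes maps heuristic name -> index of the output it believes is change.

    A value of None means the heuristic abstains. Returns (change_index, scores),
    where change_index is None when the transaction is indeterminate.
    """
    scores = [0] * n_outputs
    for name, idx in votes.items():
        if idx is not None:
            scores[idx] += weights[name]
    ranked = sorted(scores, reverse=True)
    tied = n_outputs > 1 and ranked[0] == ranked[1]
    if ranked[0] < threshold or tied:
        return None, scores  # an honest "I don't know"
    return scores.index(ranked[0]), scores

# Three heuristics agree on output 1; two abstain.
print(aggregate_change({
    "optimal_change": None, "address_type": 1, "round_number": 1,
    "address_reuse": 1, "spend_timing": None,
}))
```

A heuristic that identifies the payment, such as round number analysis, votes here for the other output as change, which only works cleanly for two-output transactions.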

Step 4: Validate against context.

Change detection doesn't happen in isolation. I cross-reference results with cluster data from CIOH. If CIOH already groups one output's address with the sender's cluster, that confirms it as change regardless of other heuristics. I also check whether either output goes to a known entity (exchange, service, payment processor). If so, that output is almost certainly the payment, not the change.
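The context check can be sketched as an override applied after the heuristics have voted. The function name, the reason strings, and the placeholder addresses are hypothetical; the cluster and entity sets are assumed to come from earlier analysis.

```python
def apply_context(change_idx, output_addrs, sender_cluster, known_entities):
    """Confirm or override a heuristic change guess using cluster and entity data.

    Returns (change_index, reason). change_idx is the heuristic result, or None.
    """
    in_cluster = [i for i, a in enumerate(output_addrs) if a in sender_cluster]
    if len(in_cluster) == 1:
        # CIOH already ties this output to the sender: treat it as change.
        return in_cluster[0], "confirmed_by_cluster"
    known = [i for i, a in enumerate(output_addrs) if a in known_entities]
    if len(known) == 1 and len(output_addrs) == 2:
        # A known service is almost certainly the payee, so the other output is change.
        return 1 - known[0], "inferred_from_known_entity"
    return change_idx, "heuristics_only"

# Placeholder addresses, not real ones.
print(apply_context(0, ["addr_a", "addr_b"], {"addr_b"}, set()))
```

Cluster evidence is checked first because the text treats it as confirming change regardless of the other heuristics.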

Step 5: Propagate through the trace.

Once change is identified, I follow the non-change output forward to continue the trace (as described in Issue 2). The change output gets folded back into the sender's cluster. Errors here compound: if I misidentify change at hop 3 of a 15-hop trace, every subsequent hop follows the wrong path. This is why I assign confidence scores per hop and degrade overall trace confidence as ambiguous change detections accumulate.
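One simple way to model this degradation is to multiply per-hop confidences, which assumes the hops are independent. Both the multiplication and the 0.5 floor below are illustrative assumptions, not a description of how any particular tool does it.

```python
def trace_confidence(hop_confidences, floor=0.5):
    """Return (cumulative confidence after each hop, first hop below the floor).

    Hops are numbered from 1. The second value is None if the trace never
    drops below the floor.
    """
    running = 1.0
    cumulative = []
    first_below = None
    for hop, confidence in enumerate(hop_confidences, start=1):
        running *= confidence
        cumulative.append(running)
        if first_below is None and running < floor:
            first_below = hop
    return cumulative, first_below

# One ambiguous change call at hop 3 drags down every hop after it.
cumulative, first_below = trace_confidence([0.95, 0.95, 0.6, 0.95, 0.95])
print([round(c, 3) for c in cumulative], first_below)
```

Reporting the first hop that falls below the floor gives a natural place to stop, or to fork the trace and follow both outputs.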

Where It Breaks

Change detection fails in ways that range from mildly annoying to seriously misleading. Here are the failure modes I've encountered most often.

Wallet heterogeneity.

The heuristics assume wallet software behaves predictably. But there are dozens of wallet implementations, and they handle change differently. Some generate change to the same address type as the input. Others have migrated to Taproot for change while still receiving funds at SegWit addresses. Some wallets let users configure change behavior. Hardware wallets, mobile wallets, and full-node wallets each have their own patterns. Assuming uniform behavior across the network is wrong — but every heuristic implicitly does it.

The Taproot convergence problem.

As more wallets adopt Taproot (P2TR) addresses for both sending and receiving, address type matching loses its signal entirely. When both outputs go to P2TR addresses, and the inputs also come from P2TR, the address type heuristic provides zero information. We're heading toward a world where this heuristic is effectively dead for new transactions. That's actually good for user privacy, but it shrinks the analyst's toolbox.

Intentional change manipulation.

Privacy-aware users and tools can deliberately break change heuristics. Making change outputs round numbers, using different address types for change, delaying change spending, or generating change to previously-used addresses — all of these invert the expected signals. Wasabi Wallet's WabiSabi protocol creates multiple outputs of varying sizes where none is identifiable as change. PayJoin (which I covered in Issue 1) adds recipient inputs, making it impossible to determine which outputs belong to which party.

Batch transactions.

Exchanges and payment processors batch multiple payments into a single transaction. A single transaction might have 50+ outputs, one of which is change. The change is typically the largest output, but not always — and distinguishing one change output from 49 payment outputs using value heuristics alone is unreliable. Some exchanges create change to a known internal address, which helps if you have that information, but most analysts don't.

The no-change transaction.

Sometimes there is no change output at all. If Alice has a UTXO of exactly 0.5 BTC and wants to send 0.4997 BTC (with 0.0003 BTC going to fees), she can construct a transaction with a single output and no change. The fee absorbs the remainder. An analyst or tool that assumes every transaction has a change output may label the sole payment output as change, folding the recipient's address into the sender's cluster instead of following the funds forward.

Self-transfers and consolidations.

When a user sends funds between their own wallets or consolidates UTXOs, every output might belong to the same entity. There is no "payment" — it's all change, in a sense. Change heuristics assume a two-party transaction model that doesn't apply to single-party operations. I discussed in Issue 2 how these can be mistaken for laundering flows; incorrect change detection amplifies the problem.

The 63% compounding.

Remember Gong et al.'s finding from Issue 1 that CIOH alone has a 63% error rate? Change detection heuristics add another layer of uncertainty. If your clustering is 80% accurate and your change detection is 85% accurate, and the two kinds of error are independent, the combined accuracy for a single hop is 0.80 × 0.85 = 68%. Over five hops, you're at 0.68^5, or roughly 15%, which is barely better than guessing. These error rates compound because each hop builds on the assumptions of the previous one.

Visual Example

Let me walk through a scenario I encounter regularly: a trace that forks at a change detection decision point.

I'm investigating a transaction where 2.0 BTC leaves a flagged address. The transaction has one input and two outputs:

Input: 2.0 BTC from flagged address (P2WPKH, native SegWit)

Output A: 0.15 BTC to a P2TR (Taproot) address, never seen before

Output B: 1.8497 BTC to a P2WPKH address, never seen before

Applying heuristics:

The optimal change heuristic says Output A (0.15 BTC) is smaller than the input, so it could be change. But Output B (1.8497 BTC) is also smaller than the input. Both pass.

Address type matching says Output B matches the input type (P2WPKH), so Output B is likely change. Output A is a different type (P2TR).

Round number analysis says 0.15 BTC is a round number, suggesting it's the payment. 1.8497 BTC is not round, suggesting change.

Value analysis says the larger output (B) is likely change.

Three heuristics agree: Output B is change, Output A is the payment. I'd assign moderate-to-high confidence here.
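The tally above can be reproduced with a short sketch. The data layout and function names are hypothetical, and the round-number rule here is a simplified relative comparison (the output with fewer significant digits is treated as the payment).

```python
tx = {
    "inputs": [{"sats": 200_000_000, "type": "P2WPKH"}],
    "outputs": [
        {"label": "A", "sats": 15_000_000, "type": "P2TR"},
        {"label": "B", "sats": 184_970_000, "type": "P2WPKH"},
    ],
}

def sig_digits(sats):
    """Digits remaining after stripping trailing zeros."""
    return len(str(sats).rstrip("0"))

def votes_for_change(tx):
    """Each heuristic names the output it believes is change, or None for no signal."""
    ins, (a, b) = tx["inputs"], tx["outputs"]
    outs = [a, b]
    votes = {}
    # Optimal change: only informative if exactly one output is below the smallest input.
    smallest = min(i["sats"] for i in ins)
    below = [o["label"] for o in outs if o["sats"] < smallest]
    votes["optimal_change"] = below[0] if len(below) == 1 else None
    # Address type: the single output matching the input type is likely change.
    in_types = {i["type"] for i in ins}
    same = [o["label"] for o in outs if o["type"] in in_types]
    votes["address_type"] = same[0] if len(same) == 1 else None
    # Round number: the rounder output is the payment, so the other one is change.
    da, db = sig_digits(a["sats"]), sig_digits(b["sats"])
    votes["round_number"] = None if da == db else (b["label"] if da < db else a["label"])
    # Value: the larger output is likely change.
    votes["value"] = None if a["sats"] == b["sats"] else max(outs, key=lambda o: o["sats"])["label"]
    return votes

print(votes_for_change(tx))
```

The sketch reports agreement only. It has no way to know whether the sender's wallet recently changed its change address type.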

But here's the scenario that keeps me honest. What if the sender recently upgraded their wallet to Taproot and set change to P2TR? Then Output A is the change (going to their new wallet) and Output B is the payment (going to a recipient who still uses SegWit). The round number? Just a coincidence — or the recipient asked for 0.15 BTC specifically. The larger amount? It's the payment, not the change.

In this case, every heuristic that gave a signal points in the wrong direction. The trace follows Output A forward, thinking it's tracking the payment, when actually it's following the sender's change back into their own wallet. The real money flow, the 1.8497 BTC to the recipient, gets ignored.

In TrailBit, I handle this by flagging transactions where heuristics agree but the wallet environment is ambiguous. When I see a P2WPKH-to-P2TR transition, I treat both paths as plausible and trace both, marking the fork with a confidence split. It costs more compute and analyst time, but it's more honest than pretending I know which output is which.

Open Questions

Taproot's impact on heuristic reliability. As Taproot adoption increases, address type matching degrades. What replaces it? Timing analysis and spending behavior become more important, but these require waiting for outputs to be spent — they can't help with real-time tracing. Research quantifying the actual decline in change detection accuracy as a function of Taproot adoption rate would be valuable.

Machine learning for change detection. Several teams have experimented with ML classifiers that combine all available features (value ratios, address types, transaction timing, fee rates, locktime values) into a single prediction. Möser & Narayanan (2022) showed this can outperform rule-based heuristics. But ML models are opaque in their own way, which is ironic for a field concerned with transparency. The question is not whether ML performs better on average, but whether analysts can understand and explain why a particular classification was made, especially if it ends up in court.

Interaction with CIOH. Change detection and CIOH are usually applied independently, then combined. But they're interdependent: change detection tells you which output belongs to the sender's cluster, and cluster data can confirm or contradict change detection. A formal framework for jointly optimizing these heuristics, rather than applying them sequentially, could reduce compounding errors. BlockSci allows chaining heuristics but doesn't optimize their interaction.

Ground truth for validation. How do you test change detection accuracy without ground truth? You need labeled data — transactions where you know which output was change — and that's hard to come by. Controlled experiments with known wallets work for small samples. Large-scale validation requires cooperation from wallet providers or exchanges, which raises privacy concerns. The field needs a standard benchmark dataset, and there isn't one.

Economic incentive effects. If change detection becomes unreliable, forensic analysis becomes harder. If forensic analysis becomes harder, the compliance value of blockchain transparency decreases. This could push regulators toward requiring identity disclosure at the protocol level (something like the EU's proposed Transfer of Funds Regulation) rather than relying on after-the-fact analysis. The accuracy of our heuristics has policy implications beyond the technical.

This last point connects to something I've been thinking about for the next issue: the idea that privacy in Bitcoin sometimes comes not from cryptographic techniques but from economic realities that make certain types of analysis impractical. When heuristics like change detection degrade, the cost of accurate tracing increases. At some point, the economics of the analysis matter more than the technology. But that's Issue 7.

Geo Nicolaidis

Builder, TrailBit.io

If you found this useful, subscribe to get the next issue in your inbox. Each issue breaks down a different heuristic used in Bitcoin forensics — what it assumes, where it breaks, and why it matters.

I build tools for Bitcoin forensics and responsible AI