Confidence Scores

Understanding probabilistic scoring and evidence strength

Why Confidence Scores Matter

Not all evidence is equal. Confidence scores help you understand:

  • How reliable a finding is
  • Whether additional verification is needed
  • How to prioritize investigative resources
  • What to present in court vs. what needs more evidence

Score Types

1. Binary Confirmed (100%)

Treated as certain because the finding comes from a verified database or from the blockchain itself.

Examples:

  • Address appears in a known exchange database
  • Transaction is confirmed on the blockchain (cryptographic proof)

Trust Level: Absolute - can be relied on fully

Court Admissibility: High - objective, verifiable

2. Point-Based Scores

Cumulative scoring from multiple indicators.

Structure:

Score = Sum of individual indicator points
Confidence = Score / Maximum Possible Score

Examples:

  • Mixer detection: 9/13 points = 69% confidence
  • Exchange behavior: 7/10 points = 70% confidence
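
The ratio above can be sketched in a few lines of Python. This is an illustrative helper only; the function name `confidence` and its validation rules are assumptions, not part of the tool.

```python
def confidence(score: int, max_score: int) -> float:
    """Return score / max_score as a fraction between 0 and 1."""
    if max_score <= 0:
        raise ValueError("max_score must be positive")
    if not 0 <= score <= max_score:
        raise ValueError("score must lie between 0 and max_score")
    return score / max_score


# The two examples from the text, formatted as whole percentages.
print(f"{confidence(9, 13):.0%}")   # mixer detection: 9 of 13 points
print(f"{confidence(7, 10):.0%}")   # exchange behavior: 7 of 10 points
```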

Trust Level: High if the score meets or exceeds the detection threshold, moderate if it sits close to the threshold

Court Admissibility: Moderate - requires expert explanation

3. Heuristic Likelihood

Qualitative assessment: low, medium, high, very high.

Based on:

  • Multiple behavioral indicators
  • Pattern matching
  • Domain expertise

Trust Level: Variable - requires context

Court Admissibility: Low to moderate - needs corroboration

Detailed Scoring Systems

Mixer Detection (CoinJoin)

Point Allocation

Indicator      Threshold  Points  Reasoning
Input Count    ≥100       +4      Extremely high - definitive mixer pattern
               ≥50        +3      Very high - strong mixer indicator
               ≥20        +2      High - possible mixer
Output Count   ≥100       +4      Extremely high participant count
               ≥50        +3      High participant count
               ≥5         +1      Multiple participants
Equal Outputs  ≥50        +4      Classic CoinJoin signature
               ≥10        +2      Possible equal-value mixing
TX Size        ≥20 KB     +2      Very large transaction
               ≥10 KB     +1      Large transaction

Maximum Score: 13 points

Confidence Thresholds

IF score >= 10:      "very_high" confidence, IS MIXER
ELSE IF score >= 7:  "high" confidence, IS MIXER
ELSE IF score >= 4:  "medium" confidence, NOT MIXER (below threshold)
ELSE:                "low" confidence, NOT MIXER
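
As a rough illustration, the point table and thresholds above could be combined as follows. This is a sketch under stated assumptions, not the tool's actual code: the names `score_mixer`, `MixerResult`, and `_tier_points` are hypothetical, only the highest matching tier per indicator is counted, and transaction size is passed in already converted to KB.

```python
from typing import List, NamedTuple, Tuple


class MixerResult(NamedTuple):
    score: int
    confidence: str
    is_mixer: bool


# (threshold, points) pairs copied from the point table, highest tier first.
INPUT_TIERS = [(100, 4), (50, 3), (20, 2)]
OUTPUT_TIERS = [(100, 4), (50, 3), (5, 1)]
EQUAL_OUTPUT_TIERS = [(50, 4), (10, 2)]
SIZE_KB_TIERS = [(20, 2), (10, 1)]


def _tier_points(value: float, tiers: List[Tuple[int, int]]) -> int:
    """Return the points of the highest tier whose threshold value meets."""
    for threshold, points in tiers:
        if value >= threshold:
            return points
    return 0


def score_mixer(input_count: int, output_count: int,
                equal_outputs: int, size_kb: float) -> MixerResult:
    """Hypothetical scorer: sum the indicator points, then apply the thresholds."""
    score = (_tier_points(input_count, INPUT_TIERS)
             + _tier_points(output_count, OUTPUT_TIERS)
             + _tier_points(equal_outputs, EQUAL_OUTPUT_TIERS)
             + _tier_points(size_kb, SIZE_KB_TIERS))
    if score >= 10:
        label = "very_high"
    elif score >= 7:
        label = "high"
    elif score >= 4:
        label = "medium"
    else:
        label = "low"
    # Only scores at or above the 7-point cutoff are flagged as mixers.
    return MixerResult(score, label, score >= 7)


# 60 inputs (+3), 60 outputs (+3), 12 equal outputs (+2), 5 KB (+0)
print(score_mixer(60, 60, 12, 5.0))
```

Keeping the tiers in data rather than in nested conditionals makes it easier to compare the code against the table when thresholds are retuned.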

Why These Thresholds?

  • ≥7 cutoff: Based on analysis of known mixers, which rarely score below 7
  • Multiple indicators: No single indicator can reach the cutoff on its own (the most any one indicator contributes is 4 points), which guards against false positives
  • Weighted scoring: The strongest indicators (such as equal outputs) are worth more points

Exchange Behavior Detection

Point Allocation

Indicator      Threshold  Points  Reasoning
TX Count       ≥1000      +3      Extremely active - typical of exchanges
               ≥100       +2      Very active address
Total Volume   ≥100 BTC   +2      High volume typical of exchanges
Balance Ratio  <0.01      +2      Hot wallet behavior (low balance)

Maximum Score: 10 points (3+3+2+2 if all max thresholds met)

Confidence Thresholds

IF score >= 7:       "high" likelihood of exchange
ELSE IF score >= 4:  "medium" likelihood
ELSE:                "low" likelihood
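
A minimal sketch of the exchange heuristic, assuming only the tiers shown in the table above are applied. The function name `score_exchange` and its parameter names are hypothetical, and the balance ratio is taken as a precomputed input because its exact definition is not given here.

```python
from typing import Tuple


def score_exchange(tx_count: int, total_volume_btc: float,
                   balance_ratio: float) -> Tuple[int, str]:
    """Hypothetical scorer covering only the tiers listed in the table."""
    score = 0
    if tx_count >= 1000:
        score += 3          # extremely active
    elif tx_count >= 100:
        score += 2          # very active
    if total_volume_btc >= 100:
        score += 2          # high volume
    if balance_ratio < 0.01:
        score += 2          # hot-wallet behavior (low balance)

    if score >= 7:
        likelihood = "high"
    elif score >= 4:
        likelihood = "medium"
    else:
        likelihood = "low"
    return score, likelihood


print(score_exchange(tx_count=1500, total_volume_btc=250.0, balance_ratio=0.005))
print(score_exchange(tx_count=150, total_volume_btc=50.0, balance_ratio=0.5))
```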

False Positive Considerations

Other entities that can score high:

  • Mining pools: Also have high TX count
  • Payment processors: High volume, many transactions
  • Gambling sites: Similar deposit/withdrawal patterns

Mitigation: Use confirmed database first, heuristics as backup
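
The "database first, heuristics as backup" order might look like the sketch below. Everything here is illustrative: `classify_address`, the shape of the result dictionary, and the placeholder address are assumptions, and the heuristic is passed in as a plain callable.

```python
from typing import Callable, Dict, Optional


def classify_address(address: str,
                     confirmed_db: Dict[str, str],
                     heuristic: Callable[[str], str]) -> Dict[str, Optional[str]]:
    """Prefer a confirmed database match; fall back to the heuristic otherwise."""
    entity = confirmed_db.get(address)
    if entity is not None:
        # A database hit is a confirmed finding; the heuristic is never consulted.
        return {"source": "database", "entity": entity, "confidence": "confirmed"}
    # No confirmed match: report the heuristic likelihood without naming an entity.
    return {"source": "heuristic", "entity": None, "confidence": heuristic(address)}


# Placeholder data for illustration only (not a real address).
confirmed = {"example-address-1": "Coinbase"}
always_medium = lambda addr: "medium"

print(classify_address("example-address-1", confirmed, always_medium))
print(classify_address("example-address-2", confirmed, always_medium))
```

Returning the source alongside the confidence keeps confirmed findings and heuristic findings distinguishable downstream, which matters because mining pools, payment processors, and gambling sites can all score high on the heuristic.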

Combining Multiple Scores

Agreement Increases Confidence

When multiple methods agree, confidence compounds:

Scenario 1: Confirmed Exchange
  Database: Coinbase (100% confidence)

Total Confidence: 100% (use confirmed source)

Scenario 2: Mixer + Behavioral
  CoinJoin Detection: 9/13 (high confidence)
  High input count: 127 inputs
  High output count: 128 outputs
  Large size: 28KB

Total Confidence: Very High (multiple indicators agree)

Disagreement Requires Investigation

When methods disagree:

Scenario: Conflicting Signals
  CoinJoin score: 5/13 (medium - below threshold)
  But: 87 equal-value outputs observed
  
Analysis: Investigate further
  - Why low score despite equal outputs?
  - Check if it's a smaller CoinJoin pool
  - May need manual review
  
Action: Mark as "possible mixer, verify"
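
One way to catch this kind of conflict automatically is to compare the overall score with a single strong indicator. The rule below is a hypothetical sketch, not the tool's logic: the name `review_status` and the choice of 50 equal outputs as the "strong signal" level (the top equal-output tier in the point table) are assumptions.

```python
MIXER_THRESHOLD = 7         # cutoff from the mixer confidence thresholds
STRONG_EQUAL_OUTPUTS = 50   # assumed: top equal-output tier in the point table


def review_status(score: int, equal_outputs: int) -> str:
    """Flag below-threshold scores that still show a strong equal-output signal."""
    if score >= MIXER_THRESHOLD:
        return "mixer"
    if equal_outputs >= STRONG_EQUAL_OUTPUTS:
        # Score and indicator disagree: route to manual review.
        return "possible mixer, verify"
    return "not mixer"


print(review_status(score=5, equal_outputs=87))
```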

Statistical Interpretation

What Confidence Means

Confidence        Probability Estimate  Interpretation
Confirmed (100%)  99.9%+                Virtually certain
Very High (≥77%)  85-95%                Almost certain
High (54-76%)     70-85%                Likely
Medium (31-53%)   40-70%                Possible, needs verification
Low (<31%)        <40%                  Unlikely

Note: These are heuristic estimates, not rigorous statistical probabilities.
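
The banding in the table can be expressed as a small lookup. This sketch assumes the percentage is rounded to the nearest whole number before banding, so that 10/13 (76.9%) lands in the "Very High" band; the function name `interpret` and the `confirmed` flag are hypothetical.

```python
from typing import Tuple


def interpret(score: int, max_score: int,
              confirmed: bool = False) -> Tuple[str, str, str]:
    """Return (band, probability estimate, interpretation) for a score."""
    if confirmed:
        # Confirmed findings come from a database or the chain, not from points.
        return ("Confirmed", "99.9%+", "Virtually certain")
    pct = round(100 * score / max_score)  # assumption: round to whole percent
    if pct >= 77:
        return ("Very High", "85-95%", "Almost certain")
    if pct >= 54:
        return ("High", "70-85%", "Likely")
    if pct >= 31:
        return ("Medium", "40-70%", "Possible, needs verification")
    return ("Low", "<40%", "Unlikely")


print(interpret(9, 13))                  # 69% falls in the High band
print(interpret(0, 13, confirmed=True))  # database match bypasses the score
```

With this rounding, the bands line up with the 13-point mixer thresholds: 10, 7, and 4 points map to Very High, High, and Medium respectively.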

Error Margins

All scoring systems have error rates:

  • False Positives: Something flagged as a mixer/exchange that is not one (Type I error)
  • False Negatives: An actual mixer/exchange that was missed (Type II error)

Blockchain Detective tuning: Conservative thresholds favor fewer false positives, at the cost of missing some real mixers and exchanges

Using Scores in Legal Proceedings

Strong Evidence (Can Stand Alone)

  • ✅ Confirmed exchange (database match)
  • ✅ Cryptographic transaction proof
  • ✅ Blockchain timestamps and amounts

Supporting Evidence (Needs Context)

  • ⚠️ High-confidence mixer detection
  • ⚠️ Exchange behavior heuristics
  • ⚠️ Pattern analysis findings

Investigative Leads (Not Evidence)

  • 🔍 Medium-confidence detections
  • 🔍 Low-confidence flags
  • 🔍 Behavioral patterns

Expert Testimony Requirements

For scores to be admissible, an expert witness should explain:

  1. What each indicator measures
  2. Why thresholds were chosen
  3. Historical accuracy of the method
  4. Potential error sources
  5. Why this specific score is reliable

Practical Decision Making

Resource Allocation Guide

Finding                    Confidence  Resource Level
Confirmed Exchange         100%        Immediate full investigation
Mixer (Very High)          ≥77%        Dedicated analyst time
Exchange (High Heuristic)  54-76%      Verify with commercial tools
Pattern (Medium)           31-53%      Monitor, gather more data
Any (Low)                  <31%        Document but don't prioritize

Improving Confidence

Additional Verification Methods

To increase confidence in borderline cases:

  • Commercial Tools: Cross-check with Chainalysis, Elliptic
  • Manual Review: Expert examination of transaction
  • Community Sources: Check blockchain explorer comments
  • Temporal Analysis: Look for repeated patterns over time
  • Cluster Analysis: Examine related addresses

Next Steps