Confidence Scores
Understanding probabilistic scoring and evidence strength
Why Confidence Scores Matter
Not all evidence is equal. Confidence scores help you understand:
- How reliable a finding is
- Whether additional verification is needed
- How to prioritize investigative resources
- What to present in court vs. what needs more evidence
Score Types
1. Binary Confirmed (100%)
Absolute certainty from a verified database match or from the blockchain itself.
Examples:
- Address is in known exchange database
- Transaction confirmed in blockchain (cryptographic proof)
Trust Level: Absolute - can be relied on fully
Court Admissibility: High - objective, verifiable
2. Point-Based Scores
Cumulative scoring from multiple indicators.
Structure:
Score = Sum of individual indicator points
Confidence = Score / Maximum Possible Score
Examples:
- Mixer detection: 9/13 points = 69% confidence
- Exchange behavior: 7/10 points = 70% confidence
Trust Level: High if the score meets or exceeds the detection threshold, moderate if it is borderline
Court Admissibility: Moderate - requires expert explanation
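The structure above can be sketched in a few lines of Python. The function name `point_confidence` is illustrative and not part of any tool's API; the two calls reproduce the examples listed above.

```python
def point_confidence(score: int, max_score: int) -> float:
    """Return score / max_score as a fraction between 0 and 1."""
    if max_score <= 0:
        raise ValueError("max_score must be positive")
    if not 0 <= score <= max_score:
        raise ValueError("score must be between 0 and max_score")
    return score / max_score


# Mixer detection example: 9 of 13 points.
print(f"{point_confidence(9, 13):.0%}")   # 69%
# Exchange behavior example: 7 of 10 points.
print(f"{point_confidence(7, 10):.0%}")   # 70%
```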
3. Heuristic Likelihood
Qualitative assessment: low, medium, high, very high.
Based on:
- Multiple behavioral indicators
- Pattern matching
- Domain expertise
Trust Level: Variable - requires context
Court Admissibility: Low to moderate - needs corroboration
Detailed Scoring Systems
Mixer Detection (CoinJoin)
Point Allocation
| Indicator | Threshold | Points | Reasoning |
|---|---|---|---|
| Input Count | ≥100 | +4 | Extremely high - definitive mixer pattern |
| | ≥50 | +3 | Very high - strong mixer indicator |
| | ≥20 | +2 | High - possible mixer |
| Output Count | ≥100 | +4 | Extremely high participant count |
| | ≥50 | +3 | High participant count |
| | ≥5 | +1 | Multiple participants |
| Equal Outputs | ≥50 | +4 | Classic CoinJoin signature |
| | ≥10 | +2 | Possible equal-value mixing |
| TX Size | ≥20 KB | +2 | Very large transaction |
| | ≥10 KB | +1 | Large transaction |
Maximum Score: 13 points
Confidence Thresholds
IF score >= 10: "very_high" confidence, IS MIXER
IF score >= 7: "high" confidence, IS MIXER
IF score >= 4: "medium" confidence, NOT MIXER (below threshold)
IF score < 4: "low" confidence, NOT MIXER
Why These Thresholds?
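- ≥7 cutoff: Based on analysis of known mixers, which rarely score below 7
- Multiple indicators: No single indicator can reach the cutoff, since each contributes at most 4 points (prevents false positives)
- Weighted scoring: The strongest indicator (equal outputs) earns its top points at a lower count (+4 at ≥50, versus ≥100 for input or output count)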
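The point table and thresholds for mixer detection can be sketched as follows. This is a minimal illustration, not the tool's actual implementation: the names `tier_points`, `score_mixer` and `classify_mixer` are assumptions, and the sketch returns raw points and a label without normalizing by the maximum.

```python
from typing import List, Tuple

# Tiers are (threshold, points), ordered from highest threshold to lowest,
# copied from the point allocation table for mixer detection.
INPUT_TIERS: List[Tuple[float, int]] = [(100, 4), (50, 3), (20, 2)]
OUTPUT_TIERS: List[Tuple[float, int]] = [(100, 4), (50, 3), (5, 1)]
EQUAL_OUTPUT_TIERS: List[Tuple[float, int]] = [(50, 4), (10, 2)]
SIZE_KB_TIERS: List[Tuple[float, int]] = [(20, 2), (10, 1)]


def tier_points(value: float, tiers: List[Tuple[float, int]]) -> int:
    """Return the points for the highest tier that value reaches, else 0."""
    for threshold, points in tiers:
        if value >= threshold:
            return points
    return 0


def score_mixer(inputs: int, outputs: int, equal_outputs: int, size_kb: float) -> int:
    """Sum the points earned by each of the four indicators."""
    return (
        tier_points(inputs, INPUT_TIERS)
        + tier_points(outputs, OUTPUT_TIERS)
        + tier_points(equal_outputs, EQUAL_OUTPUT_TIERS)
        + tier_points(size_kb, SIZE_KB_TIERS)
    )


def classify_mixer(score: int) -> Tuple[str, bool]:
    """Map a score to (confidence label, is_mixer) using the documented thresholds."""
    if score >= 10:
        return "very_high", True
    if score >= 7:
        return "high", True
    if score >= 4:
        return "medium", False
    return "low", False


# Hypothetical transaction: 60 inputs, 60 outputs, 12 equal outputs, 12 KB.
score = score_mixer(inputs=60, outputs=60, equal_outputs=12, size_kb=12)
print(score, classify_mixer(score))  # 3 + 3 + 2 + 1 = 9 points
```

Checking the tiers from highest to lowest means each indicator contributes only its best tier, which is why one indicator alone tops out at 4 points and cannot cross the cutoff of 7.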
Exchange Behavior Detection
Point Allocation
| Indicator | Threshold | Points | Reasoning |
|---|---|---|---|
| TX Count | ≥1000 | +3 | Extremely active - typical of exchanges |
| | ≥100 | +2 | Very active address |
| Total Volume | ≥100 BTC | +2 | High volume typical of exchanges |
| Balance Ratio | <0.01 | +2 | Hot wallet behavior (low balance) |
Maximum Score: 10 points (3+3+2+2 if all max thresholds met)
Confidence Thresholds
IF score >= 7: "high" likelihood of exchange
IF score >= 4: "medium" likelihood
IF score < 4: "low" likelihood
False Positive Considerations
Other entities that can score high:
- Mining pools: Also have high TX count
- Payment processors: High volume, many transactions
- Gambling sites: Similar deposit/withdrawal patterns
Mitigation: Check the confirmed database first and fall back to heuristics only when there is no match
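A database-first lookup with a heuristic fallback might look like the sketch below. The dictionary `KNOWN_EXCHANGES`, its placeholder address key, and both function names are hypothetical; only the likelihood thresholds come from this page.

```python
from typing import Dict, Optional, Tuple

# Hypothetical stand-in for a confirmed exchange database (address -> name).
KNOWN_EXCHANGES: Dict[str, str] = {"example-address-1": "Coinbase"}


def exchange_likelihood(score: int) -> str:
    """Map an exchange behavior score to the documented likelihood label."""
    if score >= 7:
        return "high"
    if score >= 4:
        return "medium"
    return "low"


def classify_address(
    address: str,
    heuristic_score: int,
    known: Dict[str, str] = KNOWN_EXCHANGES,
) -> Tuple[str, Optional[str]]:
    """Prefer a confirmed database match; use the heuristic score only as backup."""
    name = known.get(address)
    if name is not None:
        return "confirmed", name
    return exchange_likelihood(heuristic_score), None


print(classify_address("example-address-1", heuristic_score=2))  # database match wins
print(classify_address("unlisted-address", heuristic_score=8))   # heuristic fallback
```

Because the database check runs first, a mining pool or payment processor that scores high on behavior is still reported only as a likelihood, never as a confirmed exchange.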
Combining Multiple Scores
Agreement Increases Confidence
When multiple methods agree, confidence compounds:
Scenario 1: Confirmed Exchange
Database: Coinbase (100% confidence)
Total Confidence: 100% (use confirmed source)
Scenario 2: Mixer + Behavioral
CoinJoin Detection: 10/13 (very high confidence)
High input count: 127 inputs (+4)
High output count: 128 outputs (+4)
Large size: 28KB (+2)
Total Confidence: Very High (multiple indicators agree)
Disagreement Requires Investigation
When methods disagree:
Scenario: Conflicting Signals
CoinJoin score: 5/13 (medium - below threshold)
But: 87 equal-value outputs observed
Analysis: Investigate further
- Why low score despite equal outputs?
- Check if it's a smaller CoinJoin pool
- May need manual review
Action: Mark as "possible mixer, verify"
Statistical Interpretation
What Confidence Means
| Confidence | Probability Estimate | Interpretation |
|---|---|---|
| Confirmed (100%) | 99.9%+ | Virtually certain |
| Very High (≥77%) | 85-95% | Almost certain |
| High (54-76%) | 70-85% | Likely |
| Medium (31-53%) | 40-70% | Possible, needs verification |
| Low (<31%) | <40% | Unlikely |
Note: These are heuristic estimates, not rigorous statistical probabilities.
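The percentage bands in the table line up with the mixer thresholds of 10, 7 and 4 points out of 13 once the ratio is rounded to a whole percent (10/13 ≈ 76.9%, 7/13 ≈ 53.8%, 4/13 ≈ 30.8%). The sketch below assumes that rounding rule, which this page does not state explicitly, and uses the illustrative name `confidence_band`. The "Confirmed" row is not derived from a score, so it is left out.

```python
def confidence_band(score: int, max_score: int) -> str:
    """Map a point score to a band, rounding the ratio to a whole percent first."""
    if max_score <= 0:
        raise ValueError("max_score must be positive")
    pct = round(100 * score / max_score)  # assumption: round to nearest percent
    if pct >= 77:
        return "Very High"
    if pct >= 54:
        return "High"
    if pct >= 31:
        return "Medium"
    return "Low"


for points in (10, 7, 4, 3):
    print(points, confidence_band(points, 13))
```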
Error Margins
All scoring systems have error rates:
- False Positives: Something is flagged as a mixer or exchange but is not one (Type I error)
- False Negatives: An actual mixer or exchange is missed (Type II error)
Blockchain Detective tuning: Favors fewer false positives (conservative thresholds)
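Both error types can be measured against a labeled sample using the standard definitions: false positive rate = FP / (FP + TN) and false negative rate = FN / (FN + TP). The counts in this sketch are made up for illustration and are not measurements of Blockchain Detective.

```python
from typing import Tuple


def error_rates(tp: int, fp: int, tn: int, fn: int) -> Tuple[float, float]:
    """Return (false positive rate, false negative rate) from confusion counts."""
    negatives = fp + tn  # items that are truly not a mixer/exchange
    positives = fn + tp  # items that truly are a mixer/exchange
    fpr = fp / negatives if negatives else 0.0
    fnr = fn / positives if positives else 0.0
    return fpr, fnr


# Hypothetical labeled sample: 50 real mixers, 100 ordinary transactions.
fpr, fnr = error_rates(tp=45, fp=2, tn=98, fn=5)
print(f"false positive rate: {fpr:.1%}")  # 2 of 100 non-mixers flagged
print(f"false negative rate: {fnr:.1%}")  # 5 of 50 mixers missed
```

Raising a detection threshold usually lowers the first number and raises the second, which is the trade-off a conservative tuning accepts.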
Using Scores in Legal Proceedings
Strong Evidence (Can Stand Alone)
- ✅ Confirmed exchange (database match)
- ✅ Cryptographic transaction proof
- ✅ Blockchain timestamps and amounts
Supporting Evidence (Needs Context)
- ⚠️ High-confidence mixer detection
- ⚠️ Exchange behavior heuristics
- ⚠️ Pattern analysis findings
Investigative Leads (Not Evidence)
- 🔍 Medium-confidence detections
- 🔍 Low-confidence flags
- 🔍 Behavioral patterns
Expert Testimony Requirements
For scores to be admissible, an expert witness should explain:
- What each indicator measures
- Why thresholds were chosen
- Historical accuracy of the method
- Potential error sources
- Why this specific score is reliable
Practical Decision Making
Resource Allocation Guide
| Finding | Confidence | Resource Level |
|---|---|---|
| Confirmed Exchange | 100% | Immediate full investigation |
| Mixer (Very High) | ≥77% | Dedicated analyst time |
| Exchange (High Heuristic) | 54-76% | Verify with commercial tools |
| Pattern (Medium) | 31-53% | Monitor, gather more data |
| Any (Low) | <31% | Document but don't prioritize |
Improving Confidence
Additional Verification Methods
To increase confidence in borderline cases:
- Commercial Tools: Cross-check with Chainalysis, Elliptic
- Manual Review: Expert examination of transaction
- Community Sources: Check blockchain explorer comments
- Temporal Analysis: Look for repeated patterns over time
- Cluster Analysis: Examine related addresses
Next Steps
- Apply this knowledge: Reading Reports
- Understand what can't be known: Limitations
- Learn best practices: Best Practices