Draft
Precision
Measure how often positive predictions are actually correct.
Hook problem: trust a spam alarm or not
Your model scans 12 emails with fixed fixture subjects. You need to know whether an alarm (predicted = spam) can be trusted.
All 12 evaluated emails
Each email has an actual label and a predicted label. Later sections refer to the emails as e1 through e12, in the order listed here.
Prize claim now
Obvious prize bait caught by the filter.
actual: spam, predicted: spam, TP: True Positive
Project notes
A normal work message left in the inbox.
actual: not-spam, predicted: not-spam
Receipt attached
A real receipt incorrectly flagged as spam.
actual: not-spam, predicted: spam, FP: False Positive
Account alert
A fake alert slipped into the inbox.
actual: spam, predicted: not-spam
Limited offer
Promotional spam correctly blocked.
actual: spam, predicted: spam, TP: True Positive
Team lunch
A casual team email correctly kept.
actual: not-spam, predicted: not-spam
Password reset
A requested reset email reached the user.
actual: not-spam, predicted: not-spam
Urgent transfer
A scam message was missed by the filter.
actual: spam, predicted: not-spam
Flight update
A useful travel update became a false alarm.
actual: not-spam, predicted: spam, FP: False Positive
Crypto bonus
Suspicious bonus spam correctly caught.
actual: spam, predicted: spam, TP: True Positive
Invoice approved
A business invoice correctly accepted.
actual: not-spam, predicted: not-spam
Verify wallet
A phishing-style wallet email was missed.
actual: spam, predicted: not-spam
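One way to hold this fixture in code is a plain array of labelled records. The sketch below is an assumption about shape, not the page's actual data model: the `Email` type, the `EMAILS` constant, and the `cellOf` helper are hypothetical names, and the ids `e1` through `e12` are assumed to follow the listing order.

```typescript
type Label = "spam" | "not-spam";
type Cell = "TP" | "FP" | "TN" | "FN";

interface Email {
  id: string;
  subject: string;
  actual: Label;
  predicted: Label;
}

// The 12 fixture emails in listing order (ids assumed to follow that order).
const EMAILS: Email[] = [
  { id: "e1", subject: "Prize claim now", actual: "spam", predicted: "spam" },
  { id: "e2", subject: "Project notes", actual: "not-spam", predicted: "not-spam" },
  { id: "e3", subject: "Receipt attached", actual: "not-spam", predicted: "spam" },
  { id: "e4", subject: "Account alert", actual: "spam", predicted: "not-spam" },
  { id: "e5", subject: "Limited offer", actual: "spam", predicted: "spam" },
  { id: "e6", subject: "Team lunch", actual: "not-spam", predicted: "not-spam" },
  { id: "e7", subject: "Password reset", actual: "not-spam", predicted: "not-spam" },
  { id: "e8", subject: "Urgent transfer", actual: "spam", predicted: "not-spam" },
  { id: "e9", subject: "Flight update", actual: "not-spam", predicted: "spam" },
  { id: "e10", subject: "Crypto bonus", actual: "spam", predicted: "spam" },
  { id: "e11", subject: "Invoice approved", actual: "not-spam", predicted: "not-spam" },
  { id: "e12", subject: "Verify wallet", actual: "spam", predicted: "not-spam" },
];

// Map one email to its confusion-matrix cell, treating "spam" as the positive class.
function cellOf(email: Email): Cell {
  if (email.predicted === "spam") {
    return email.actual === "spam" ? "TP" : "FP";
  }
  return email.actual === "spam" ? "FN" : "TN";
}

// Tally all four cells in one pass.
const cellCounts: Record<Cell, number> = { TP: 0, FP: 0, TN: 0, FN: 0 };
for (const email of EMAILS) {
  cellCounts[cellOf(email)] += 1;
}

console.log(cellCounts); // { TP: 3, FP: 2, TN: 4, FN: 3 }
```

Only the TP and FP tallies feed precision; the TN and FN tallies are kept here because accuracy needs them.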
Predicted-spam alarms only (5)
These are the five messages that form the denominator of precision: e1, e3, e5, e9, e10.
Prize claim now
Obvious prize bait caught by the filter.
actual: spam, predicted: spam, TP: True Positive
Receipt attached
A real receipt incorrectly flagged as spam.
actual: not-spam, predicted: spam, FP: False Positive
Limited offer
Promotional spam correctly blocked.
actual: spam, predicted: spam, TP: True Positive
Flight update
A useful travel update became a false alarm.
actual: not-spam, predicted: spam, FP: False Positive
Crypto bonus
Suspicious bonus spam correctly caught.
actual: spam, predicted: spam, TP: True Positive
Accuracy says:
accuracy = (TP + TN) / (TP + TN + FP + FN) = (3 + 4) / 12 = 7 / 12
That gives 58.3%, but it mixes two question types:
- Did the model get the whole mail stream mostly correct?
- Did the model’s spam alarms stay correct?
First naive idea: keep using all examples
If we stay with the global fraction above, those five predicted alarms sit in the denominator along with the seven non-alarms.
Question: How many of all emails were correct?
7 / 12 = 58.3%
Question: How many spam alarms were correct?
3 / 5 = 0.6 (60%)
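The two questions differ only in their denominators, which a few lines of arithmetic make concrete. This sketch hard-codes the fixture's cell counts as tallied from the 12 emails; the variable names are illustrative, not taken from the page's code.

```typescript
// Cell counts tallied from the 12-email fixture.
const tp = 3; // e1, e5, e10
const fp = 2; // e3, e9
const tn = 4; // e2, e6, e7, e11
const fn = 3; // e4, e8, e12

// Accuracy: correct predictions over every email.
const accuracy = (tp + tn) / (tp + fp + tn + fn);

// Precision: correct alarms over predicted-spam alarms only.
const precision = tp / (tp + fp);

console.log(accuracy.toFixed(3)); // 0.583
console.log(precision.toFixed(3)); // 0.600
```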
Why this hurts
e3 and e4 are both wrong, but they do not affect a trust metric in the same way:
- e3: actual not-spam, predicted spam (FP) → hurts trust directly.
- e4: actual spam, predicted not-spam (FN) → hurts recall, not spam-alarm trust.
Core invention: read only predicted positives
For precision, set the denominator to the count of predicted positives:
precision = TP / (TP + FP)
This gives the fraction of alarms that were right:
precision = 3 / (3 + 2) = 3 / 5 = 0.6
Interactive visual demo: predicted-positive trace
Each step below processes only e1, e3, e5, e9, e10 in fixture order:
Precision over predicted spam alarms, step 1 of 5 (e1, Prize claim now): actual spam, predicted spam, cell TP (True Positive). Running precision: 1/1 = 1.0 (100.0%). Final value after all five steps: 3/5 = 0.6.
| Step | Email | Actual | Predicted | Cell | Trusted alarms | All alarms | Precision |
|---|---|---|---|---|---|---|---|
| 1 | e1 | spam | spam | TP | 1 | 1 | 1/1 = 1.0 |
| 2 | e3 | not-spam | spam | FP | 1 | 2 | 1/2 = 0.5 |
| 3 | e5 | spam | spam | TP | 2 | 3 | 2/3 = 0.667 |
| 4 | e9 | not-spam | spam | FP | 2 | 4 | 2/4 = 0.5 |
| 5 | e10 | spam | spam | TP | 3 | 5 | 3/5 = 0.6 |
The static ledger below is still useful when scripts are unavailable.
| Step | Email | Subject | Actual | Predicted | Cell | Trusted | All | Running value |
|---|---|---|---|---|---|---|---|---|
| 1 | e1 | Prize claim now | spam | spam | TP | 1 | 1 | 1/1 = 1 |
| 2 | e3 | Receipt attached | not-spam | spam | FP | 1 | 2 | 1/2 = 0.5 |
| 3 | e5 | Limited offer | spam | spam | TP | 2 | 3 | 2/3 = 0.667 |
| 4 | e9 | Flight update | not-spam | spam | FP | 2 | 4 | 2/4 = 0.5 |
| 5 | e10 | Crypto bonus | spam | spam | TP | 3 | 5 | 3/5 = 0.6 |
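The ledger can be reproduced with a short loop that keeps a running count of trusted alarms. This is a minimal sketch, assuming the five alarms are already filtered and labelled with their cell; `Alarm`, `ALARMS`, and `runningPrecision` are hypothetical names introduced for illustration.

```typescript
type AlarmCell = "TP" | "FP";

interface Alarm {
  id: string;
  cell: AlarmCell;
}

// The five predicted-spam alarms in fixture order.
const ALARMS: Alarm[] = [
  { id: "e1", cell: "TP" },
  { id: "e3", cell: "FP" },
  { id: "e5", cell: "TP" },
  { id: "e9", cell: "FP" },
  { id: "e10", cell: "TP" },
];

// Return the running precision after each alarm is processed.
function runningPrecision(alarms: Alarm[]): number[] {
  let trusted = 0; // TP seen so far
  const values: number[] = [];
  alarms.forEach((alarm, index) => {
    if (alarm.cell === "TP") trusted += 1;
    values.push(trusted / (index + 1)); // index + 1 = all alarms seen so far
  });
  return values;
}

const trace = runningPrecision(ALARMS).map((v) => v.toFixed(3));
console.log(trace); // [ '1.000', '0.500', '0.667', '0.500', '0.600' ]
```

The value moves up on each TP and down on each FP, and the last entry matches the final 3/5 = 0.6.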
Correctness intuition
Each processed alarm is either TP or FP, so the denominator TP + FP exactly counts all alarm predictions.
The non-alarm examples are outside the denominator:
- TN + FN outside: e2, e4, e6, e7, e8, e11, e12
- FN (missed positives) outside: e4, e8, e12
At the end of this fixture:
- TP = 3
- FP = 2
- precision = 3 / 5 = 0.6
Implementation sketch
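    // Confusion-matrix counts needed for precision.
    interface Counts {
      tp: number; // true positives: predicted spam, actually spam
      fp: number; // false positives: predicted spam, actually not-spam
    }

    // Returns numerator, denominator, and value; value is null when there are no alarms.
    function precisionFromCounts(
      counts: Counts,
    ): { numerator: number; denominator: number; value: number | null } {
      const denominator = counts.tp + counts.fp;
      if (denominator === 0) {
        // No predicted positives: precision is undefined, so report null.
        return { numerator: counts.tp, denominator, value: null };
      }
      return { numerator: counts.tp, denominator, value: counts.tp / denominator };
    }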
Complexity
- O(1) time to compute precision if confusion-matrix counts already exist.
- O(n) time and O(1) extra space if you scan one fixture and count tp/fp.
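The single-pass case can be sketched as one loop with two counters. This is an illustrative sketch, not the page's implementation: `Prediction`, `countAlarms`, and the three-item `sample` input are hypothetical.

```typescript
interface Prediction {
  actual: "spam" | "not-spam";
  predicted: "spam" | "not-spam";
}

interface Counts {
  tp: number;
  fp: number;
}

// One pass over the predictions: O(n) time, two counters of extra space.
function countAlarms(predictions: Prediction[]): Counts {
  const counts: Counts = { tp: 0, fp: 0 };
  for (const p of predictions) {
    if (p.predicted !== "spam") continue; // non-alarms never enter the denominator
    if (p.actual === "spam") counts.tp += 1;
    else counts.fp += 1;
  }
  return counts;
}

// Hypothetical mini-input: two alarms (one right, one wrong) and one missed spam.
const sample: Prediction[] = [
  { actual: "spam", predicted: "spam" },
  { actual: "not-spam", predicted: "spam" },
  { actual: "spam", predicted: "not-spam" },
];

const counts = countAlarms(sample);
console.log(counts); // { tp: 1, fp: 1 }
```

The missed spam in the third record is skipped by the `continue`, which is the code-level form of "FN is outside the denominator".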
Common confusions
- Precision is not accuracy. Accuracy uses all 12 emails. Precision uses only the 5 alarm predictions.
- Precision is not recall. Recall asks how much of the actual spam was caught, so its denominator is TP + FN, not TP + FP.
- High precision does not mean no missed spam. Precision can stay at 3/5 while e4, e8, e12 are still missed.
| Example | Why precision still reads 3/5 |
|---|---|
| e3 | False alarm (FP) — hurts trust. |
| e4 | Missed spam (FN) — ignored by precision denominator. |
| e8 | Missed spam (FN) — still outside precision denominator. |
| e12 | Missed spam (FN) — still outside precision denominator. |
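To see that missed spam leaves precision untouched, it helps to vary FN while holding the alarms fixed. The sketch below uses the standard recall definition TP / (TP + FN); the function names and the second scenario (FN = 9) are hypothetical and not part of the fixture.

```typescript
// Precision: correct alarms over all alarms. Null when there are no alarms.
function precision(tp: number, fp: number): number | null {
  const denominator = tp + fp;
  return denominator === 0 ? null : tp / denominator;
}

// Recall: caught spam over all actual spam. Null when there is no actual spam.
function recall(tp: number, fn: number): number | null {
  const denominator = tp + fn;
  return denominator === 0 ? null : tp / denominator;
}

// Fixture as given: TP = 3, FP = 2, FN = 3 (e4, e8, e12).
console.log(precision(3, 2), recall(3, 3)); // 0.6 0.5

// Hypothetical variant: same five alarms, but nine missed spam emails.
console.log(precision(3, 2), recall(3, 9)); // 0.6 0.25
```

Recall drops from 0.5 to 0.25 while precision stays at 0.6, because FN never appears in the precision formula.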
Zero-denominator convention
When there are no predicted = spam outputs:
- tp = 0
- fp = 0
- denominator tp + fp = 0
- precision = null (internally)
Rendered value should be:
- English: not available
- Chinese: 不可用
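A rendering helper for this convention might look like the sketch below. The `Locale` type, `NOT_AVAILABLE` map, and `renderPrecision` function are hypothetical names, and the three-decimal number format is an assumption; only the two placeholder strings come from the convention itself.

```typescript
type Locale = "en" | "zh";

// Placeholder strings taken from the convention above.
const NOT_AVAILABLE: Record<Locale, string> = {
  en: "not available",
  zh: "不可用",
};

// Render a precision value; null means the denominator tp + fp was 0.
function renderPrecision(value: number | null, locale: Locale): string {
  if (value === null) return NOT_AVAILABLE[locale];
  return value.toFixed(3); // assumed display format
}

console.log(renderPrecision(null, "en")); // not available
console.log(renderPrecision(null, "zh")); // 不可用
console.log(renderPrecision(0.6, "en")); // 0.600
```

Keeping `null` internally and translating it only at render time avoids showing a misleading 0 or NaN when no alarms were raised.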
Predicted spam and actually spam.
Prize claim now
Obvious prize bait caught by the filter.
actual: spam, predicted: spam, TP: True Positive
Limited offer
Promotional spam correctly blocked.
actual: spam, predicted: spam, TP: True Positive
Crypto bonus
Suspicious bonus spam correctly caught.
actual: spam, predicted: spam, TP: True Positive
Predicted spam and actually not-spam.
Receipt attached
A real receipt incorrectly flagged as spam.
actual: not-spam, predicted: spam, FP: False Positive
Flight update
A useful travel update became a false alarm.
actual: not-spam, predicted: spam, FP: False Positive
TN + FN outside denominator (7)
e2, e4, e6, e7, e8, e11, e12
Graph connection
implemented
Edges and edges only
From this node:
confusion-matrix -> precision (uses)
Exercises
- Why does accuracy use all 12 emails but precision use only TP + FP?
- Compute the first two running precision values by hand from e1 then e3.
- In a dataset with no positive predictions, what should precision render, and why?
- Which of e3, e4, and e8 changes the precision denominator?