Graph connections

Draft

Precision

Measure how often positive predictions are actually correct.

concept beginner machine-learning metrics classification

Hook problem: trust a spam alarm or not

Your model scans 12 emails with fixed fixture subjects. You need to know whether an alarm (predicted = spam) can be trusted.

Spam alarm funnel: 12 evaluated emails become 5 spam alarms before precision applies.

All 12 evaluated emails

Each email has an actual label and a predicted label.

e1

Prize claim now

Obvious prize bait caught by the filter.

actual: spam, predicted: spam, TP: True Positive

e2

Project notes

A normal work message left in the inbox.

actual: not-spam, predicted: not-spam

e3

Receipt attached

A real receipt incorrectly flagged as spam.

actual: not-spam, predicted: spam, FP: False Positive

e4

Account alert

A fake alert slipped into the inbox.

actual: spam, predicted: not-spam

e5

Limited offer

Promotional spam correctly blocked.

actual: spam, predicted: spam, TP: True Positive

e6

Team lunch

A casual team email correctly kept.

actual: not-spam, predicted: not-spam

e7

Password reset

A requested reset email reached the user.

actual: not-spam, predicted: not-spam

e8

Urgent transfer

A scam message was missed by the filter.

actual: spam, predicted: not-spam

e9

Flight update

A useful travel update became a false alarm.

actual: not-spam, predicted: spam, FP: False Positive

e10

Crypto bonus

Suspicious bonus spam correctly caught.

actual: spam, predicted: spam, TP: True Positive

e11

Invoice approved

A business invoice correctly accepted.

actual: not-spam, predicted: not-spam

e12

Verify wallet

A phishing-style wallet email was missed.

actual: spam, predicted: not-spam

Predicted-spam alarms only (5)

These are the five messages that form the precision denominator: e1, e3, e5, e9, e10.

e1

Prize claim now

Obvious prize bait caught by the filter.

actual: spam, predicted: spam, TP: True Positive

e3

Receipt attached

A real receipt incorrectly flagged as spam.

actual: not-spam, predicted: spam, FP: False Positive

e5

Limited offer

Promotional spam correctly blocked.

actual: spam, predicted: spam, TP: True Positive

e9

Flight update

A useful travel update became a false alarm.

actual: not-spam, predicted: spam, FP: False Positive

e10

Crypto bonus

Suspicious bonus spam correctly caught.

actual: spam, predicted: spam, TP: True Positive

Accuracy says:

$$\text{Accuracy} = \frac{TP + TN}{TP + FP + TN + FN} = \frac{3 + 4}{12} = \frac{7}{12}$$

That gives 58.3%, but it mixes two question types:

  • Did the model get the whole mail stream mostly correct?
  • Did the model’s spam alarms stay correct?

First naive idea: keep using all examples

If we stay with the global fraction above, the five predicted alarms sit in the denominator alongside the seven non-alarms, so the result no longer answers the question about alarms alone.

Accuracy vs precision contrast: Accuracy uses all predictions. Precision uses only predicted-positive examples.

Accuracy

Question: How many of all emails were correct?

(7 / 12 = 58.3%)

Precision

Question: How many spam alarms were correct?

(3 / 5 = 0.6)
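
The contrast above can be reproduced directly from the fixture. The following is a minimal sketch, assuming the 12 emails are encoded as `[id, actual, predicted]` tuples; the tuple layout and the names `fixture`, `alarms`, and `trustedAlarms` are illustrative choices, not part of the original lesson code.

```typescript
type Label = "spam" | "not-spam";

// Assumed encoding: [id, actual, predicted] for the 12 fixture emails.
const fixture: [string, Label, Label][] = [
  ["e1", "spam", "spam"],
  ["e2", "not-spam", "not-spam"],
  ["e3", "not-spam", "spam"],
  ["e4", "spam", "not-spam"],
  ["e5", "spam", "spam"],
  ["e6", "not-spam", "not-spam"],
  ["e7", "not-spam", "not-spam"],
  ["e8", "spam", "not-spam"],
  ["e9", "not-spam", "spam"],
  ["e10", "spam", "spam"],
  ["e11", "not-spam", "not-spam"],
  ["e12", "spam", "not-spam"],
];

// Accuracy: correct predictions over every evaluated email.
const correct = fixture.filter(([, actual, predicted]) => actual === predicted).length;
const accuracy = correct / fixture.length;

// Precision: correct alarms over predicted-spam emails only.
const alarms = fixture.filter(([, , predicted]) => predicted === "spam");
const trustedAlarms = alarms.filter(([, actual]) => actual === "spam").length;
const precision = alarms.length === 0 ? null : trustedAlarms / alarms.length;

console.log(`accuracy: ${correct}/${fixture.length}`); // accuracy: 7/12
console.log(`precision: ${trustedAlarms}/${alarms.length}`); // precision: 3/5
```

The only difference between the two metrics here is which rows reach the denominator: all 12 for accuracy, the 5 alarms for precision.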

Why this hurts

e3 and e4 are both wrong, but they do not affect a trust metric in the same way:

  • e3: actual not-spam, predicted spam (FP) → hurts trust directly.
  • e4: actual spam, predicted not-spam (FN) → hurts recall, not spam-alarm trust.
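
To make the e3 versus e4 distinction concrete, here is a small sketch that maps an (actual, predicted) pair to its confusion-matrix cell and reports whether that cell belongs to the precision denominator. The helper names `cellOf` and `countsTowardPrecision` are hypothetical, introduced only for this illustration.

```typescript
type Label = "spam" | "not-spam";
type Cell = "TP" | "FP" | "TN" | "FN";

// Map one (actual, predicted) pair to its cell, treating "spam" as positive.
function cellOf(actual: Label, predicted: Label): Cell {
  if (predicted === "spam") {
    return actual === "spam" ? "TP" : "FP";
  }
  return actual === "spam" ? "FN" : "TN";
}

// Only TP and FP sit in the precision denominator.
function countsTowardPrecision(cell: Cell): boolean {
  return cell === "TP" || cell === "FP";
}

const e3 = cellOf("not-spam", "spam"); // a false alarm
const e4 = cellOf("spam", "not-spam"); // a missed spam

console.log(e3, countsTowardPrecision(e3)); // FP true
console.log(e4, countsTowardPrecision(e4)); // FN false
```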

Core invention: read only predicted positives

For precision, set the denominator to the count of predicted positives:

$$\text{Precision} = \frac{TP}{TP + FP}$$

This gives the fraction of alarms that were right:

$$\frac{3}{3 + 2} = \frac{3}{5} = 0.6$$

Interactive visual demo: predicted-positive trace

Each step below processes only e1, e3, e5, e9, e10 in fixture order:

1,2,3,4,5 → e1, e3, e5, e9, e10

Precision over predicted spam alarms (interactive demo, shown at step 1 of 5)

Step 1/5: e1, "Prize claim now". Actual spam, predicted spam. Cell: TP (True Positive).
Trusted alarms: 1/1. Running precision: 1.0 (100.0%).
Final fixture value: 3/5 = 0.6.

Precision trace over predicted-spam emails

Step | Email | Actual   | Predicted | Cell | Trusted alarms | All alarms | Precision
1    | e1    | spam     | spam      | TP   | 1              | 1          | 1/1 = 1.0
2    | e3    | not-spam | spam      | FP   | 1              | 2          | 1/2 = 0.5
3    | e5    | spam     | spam      | TP   | 2              | 3          | 2/3 = 0.667
4    | e9    | not-spam | spam      | FP   | 2              | 4          | 2/4 = 0.5
5    | e10   | spam     | spam      | TP   | 3              | 5          | 3/5 = 0.6

The static ledger below is still useful when scripts are unavailable.

Precision trace over predicted alarms (no-JS fallback)

Step | Email | Subject          | Actual   | Predicted | Cell | Trusted | All | Running value
1    | e1    | Prize claim now  | spam     | spam      | TP   | 1       | 1   | 1/1 = 1.0
2    | e3    | Receipt attached | not-spam | spam      | FP   | 1       | 2   | 1/2 = 0.5
3    | e5    | Limited offer    | spam     | spam      | TP   | 2       | 3   | 2/3 = 0.667
4    | e9    | Flight update    | not-spam | spam      | FP   | 2       | 4   | 2/4 = 0.5
5    | e10   | Crypto bonus     | spam     | spam      | TP   | 3       | 5   | 3/5 = 0.6
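
The running values in the trace can be recomputed with a short loop. This is a sketch under the assumption that the five alarms arrive in fixture order as `[id, actual]` pairs; `TraceRow` and `tracePrecision` are hypothetical names used only here.

```typescript
type Label = "spam" | "not-spam";

interface TraceRow {
  step: number;
  id: string;
  trusted: number; // TP seen so far
  all: number; // alarms seen so far (TP + FP)
  value: number; // running precision
}

// The five predicted-spam emails in fixture order, with their actual labels.
const alarms: [string, Label][] = [
  ["e1", "spam"],
  ["e3", "not-spam"],
  ["e5", "spam"],
  ["e9", "not-spam"],
  ["e10", "spam"],
];

function tracePrecision(rows: [string, Label][]): TraceRow[] {
  let trusted = 0;
  return rows.map(([id, actual], index) => {
    if (actual === "spam") trusted += 1; // a TP raises the numerator
    const all = index + 1; // every alarm raises the denominator
    return { step: all, id, trusted, all, value: trusted / all };
  });
}

for (const row of tracePrecision(alarms)) {
  console.log(`${row.step} ${row.id} ${row.trusted}/${row.all} = ${row.value.toFixed(3)}`);
}
// Last line printed: 5 e10 3/5 = 0.600
```

Each FP step leaves the numerator unchanged while the denominator grows, which is why the value drops at steps 2 and 4.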

Correctness intuition

Each processed alarm is either TP or FP, so the denominator TP + FP exactly counts all alarm predictions.

The non-alarm examples are outside the denominator:

  • TN + FN outside: e2, e4, e6, e7, e8, e11, e12
  • FN missed positives outside: e4, e8, e12

At the end of this fixture:

  • TP = 3
  • FP = 2
  • precision = 3 / 5 = 0.6
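
The partition argument can be checked mechanically. The sketch below groups the fixture ids by cell and confirms that TP + FP equals the number of predicted-spam emails, while TN and FN stay outside. The tuple encoding and the names `byCell` and `outside` are assumptions made for this example.

```typescript
type Label = "spam" | "not-spam";
type Cell = "TP" | "FP" | "TN" | "FN";

// Assumed encoding: [id, actual, predicted] for the 12 fixture emails.
const fixture: [string, Label, Label][] = [
  ["e1", "spam", "spam"],
  ["e2", "not-spam", "not-spam"],
  ["e3", "not-spam", "spam"],
  ["e4", "spam", "not-spam"],
  ["e5", "spam", "spam"],
  ["e6", "not-spam", "not-spam"],
  ["e7", "not-spam", "not-spam"],
  ["e8", "spam", "not-spam"],
  ["e9", "not-spam", "spam"],
  ["e10", "spam", "spam"],
  ["e11", "not-spam", "not-spam"],
  ["e12", "spam", "not-spam"],
];

// Group email ids by cell; each email lands in exactly one cell.
const byCell: Record<Cell, string[]> = { TP: [], FP: [], TN: [], FN: [] };
for (const [id, actual, predicted] of fixture) {
  let cell: Cell;
  if (predicted === "spam") cell = actual === "spam" ? "TP" : "FP";
  else cell = actual === "spam" ? "FN" : "TN";
  byCell[cell].push(id);
}

const denominator = byCell.TP.length + byCell.FP.length;
const predictedSpam = fixture.filter(([, , predicted]) => predicted === "spam").length;
// Emails outside the denominator, in fixture order.
const outside = fixture.filter(([, , predicted]) => predicted !== "spam").map(([id]) => id);

console.log(denominator === predictedSpam); // true
console.log(outside.join(", ")); // e2, e4, e6, e7, e8, e11, e12
console.log(byCell.FN.join(", ")); // e4, e8, e12
```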

Implementation sketch

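// Only the predicted-positive cells are needed for precision.
interface Counts {
  tp: number; // true positives: predicted spam, actually spam
  fp: number; // false positives: predicted spam, actually not-spam
}

// Returns the fraction parts plus the value; value is null when there are no alarms.
function precisionFromCounts(counts: Counts): { numerator: number; denominator: number; value: number | null } {
  const denominator = counts.tp + counts.fp;
  if (denominator === 0) {
    // No predicted positives: report null instead of dividing by zero.
    return { numerator: counts.tp, denominator, value: null };
  }
  return { numerator: counts.tp, denominator, value: counts.tp / denominator };
}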

Complexity

If confusion-matrix counts already exist:

  • O(1) time to compute precision from the counts.

If you scan a fixture of n emails and count tp/fp yourself:

  • O(n) time and O(1) extra space.
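
The scanning case can be sketched as a single pass that keeps two counters. The `Prediction` shape and the function name `countAlarms` are assumptions for illustration; the three-email sample is made up to mirror one TP, one FP, and one FN.

```typescript
type Label = "spam" | "not-spam";

interface Prediction {
  actual: Label;
  predicted: Label;
}

// One pass over the predictions: O(n) time, two counters of extra space.
function countAlarms(predictions: Prediction[]): { tp: number; fp: number } {
  let tp = 0;
  let fp = 0;
  for (const { actual, predicted } of predictions) {
    if (predicted !== "spam") continue; // TN and FN never touch the counters
    if (actual === "spam") tp += 1;
    else fp += 1;
  }
  return { tp, fp };
}

// Hypothetical sample: one TP (like e1), one FP (like e3), one FN (like e4).
const sample: Prediction[] = [
  { actual: "spam", predicted: "spam" },
  { actual: "not-spam", predicted: "spam" },
  { actual: "spam", predicted: "not-spam" },
];

const counts = countAlarms(sample);
console.log(counts.tp, counts.fp); // 1 1
```

The resulting counts have the same `tp`/`fp` shape that a counts-based precision function expects, so the O(n) scan and the O(1) division stay separate steps.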

Common confusions

  • Precision is not accuracy. Accuracy uses all 12 emails. Precision uses only the 5 alarm predictions.
  • Precision is not recall. Recall asks how much of the actual spam was caught, TP / (TP + FN), which is a different denominator.
  • High precision does not mean no missed spam. Precision can stay at 3/5 while e4, e8, and e12 are still missed.
Common confusions: High precision does not mean no missed spam, and recall is a different question.

Boundary check

Example | Why precision still reads 3/5
e3      | False alarm (FP): hurts trust.
e4      | Missed spam (FN): ignored by the precision denominator.
e8      | Missed spam (FN): still outside the precision denominator.
e12     | Missed spam (FN): still outside the precision denominator.

Missed spam outside denominator

e4, e8, e12
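
A quick numeric check shows why missed spam leaves precision alone. The sketch below uses the fixture counts (TP = 3, FP = 2, TN = 4, FN = 3) and the standard recall definition TP / (TP + FN); the helper `ratio` and the extra-miss scenario are hypothetical additions for illustration.

```typescript
interface ConfusionCounts {
  tp: number;
  fp: number;
  tn: number;
  fn: number;
}

// Counts from the 12-email fixture.
const fixtureCounts: ConfusionCounts = { tp: 3, fp: 2, tn: 4, fn: 3 };

// Returns null when the denominator is zero.
function ratio(numerator: number, denominator: number): number | null {
  return denominator === 0 ? null : numerator / denominator;
}

const precision = ratio(fixtureCounts.tp, fixtureCounts.tp + fixtureCounts.fp);
const recall = ratio(fixtureCounts.tp, fixtureCounts.tp + fixtureCounts.fn);

// Hypothetical scenario: one more spam email is missed (FN + 1).
const moreMisses: ConfusionCounts = { ...fixtureCounts, fn: fixtureCounts.fn + 1 };
const precisionAfter = ratio(moreMisses.tp, moreMisses.tp + moreMisses.fp);
const recallAfter = ratio(moreMisses.tp, moreMisses.tp + moreMisses.fn);

console.log(precision, recall); // 0.6 0.5
// precisionAfter is still 3/5; recallAfter drops to 3/7.
console.log(precisionAfter, recallAfter);
```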

Zero-denominator display

not available

Zero-denominator convention

When no email is predicted as spam (no predicted = spam outputs):

  • tp = 0
  • fp = 0
  • denominator tp + fp = 0
  • precision = null (internally)

Rendered value should be:

  • English: not available
  • Chinese: 不可用
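
One way to apply this convention at display time is sketched below. The locale keys `"en"` and `"zh"`, the function name `renderPrecision`, and the three-decimal formatting of non-null values are all assumptions; only the two "not available" strings come from the convention above.

```typescript
type Locale = "en" | "zh";

// Display strings for the zero-denominator case, per the convention above.
const NOT_AVAILABLE: Record<Locale, string> = {
  en: "not available",
  zh: "不可用",
};

// Render a precision value; null means the denominator tp + fp was zero.
function renderPrecision(value: number | null, locale: Locale): string {
  if (value === null) {
    return NOT_AVAILABLE[locale];
  }
  // Assumed formatting: three decimals, matching the trace tables.
  return value.toFixed(3);
}

console.log(renderPrecision(null, "en")); // not available
console.log(renderPrecision(null, "zh")); // 不可用
console.log(renderPrecision(3 / 5, "en")); // 0.600
```

Keeping `null` internally and converting only when rendering avoids showing 0 or NaN for a case where no alarms exist to be judged.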
Predicted-positive column: This column has only TP and FP.

TP (3)

Predicted spam and actually spam.

e1

Prize claim now

Obvious prize bait caught by the filter.

actual: spam, predicted: spam, TP: True Positive

e5

Limited offer

Promotional spam correctly blocked.

actual: spam, predicted: spam, TP: True Positive

e10

Crypto bonus

Suspicious bonus spam correctly caught.

actual: spam, predicted: spam, TP: True Positive

FP (2)

Predicted spam and actually not-spam.

e3

Receipt attached

A real receipt incorrectly flagged as spam.

actual: not-spam, predicted: spam, FP: False Positive

e9

Flight update

A useful travel update became a false alarm.

actual: not-spam, predicted: spam, FP: False Positive

Ignored by precision denominator

TN + FN outside denominator (7)

e2, e4, e6, e7, e8, e11, e12

Graph connection

Graph strip: Precision is implemented as a follow-up from confusion-matrix and stays scoped to binary alarms.

confusion-matrix: implemented

precision: implemented

Edges and edges only

From this node:

  • confusion-matrix -> precision (uses)

Exercises

  1. Why does accuracy use all 12 emails but precision use only TP + FP?
  2. Compute the first two running precision values by hand from e1 then e3.
  3. In a dataset with no positive predictions, what should precision render, and why?
  4. Which of e3, e4, and e8 changes the precision denominator?
