F1 Score

Combine precision and recall into one balance score that is penalized by the weaker side.

concept, beginner, machine-learning, metrics, classification

Hook problem: one score for two questions

In the same 12-email fixture used by precision and recall:

  • TP = 3
  • FP = 2
  • FN = 3
  • TN = 4

We already trust:

  • precision = TP / (TP + FP) = 3 / 5 = 0.6
  • recall = TP / (TP + FN) = 3 / 6 = 0.5
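The two ratios above can be reproduced directly from the fixture counts. The following is a minimal sketch; the `FixtureCounts` interface and the `ratio` helper are hypothetical names introduced for illustration, not part of the lesson's code.

```typescript
// Confusion-matrix counts for the 12-email fixture.
interface FixtureCounts {
  tp: number;
  fp: number;
  fn: number;
  tn: number;
}

const fixture: FixtureCounts = { tp: 3, fp: 2, fn: 3, tn: 4 };

// Hypothetical helper: returns null when the denominator is zero,
// so "not available" stays distinct from a real 0.
function ratio(numerator: number, denominator: number): number | null {
  return denominator === 0 ? null : numerator / denominator;
}

const fixturePrecision = ratio(fixture.tp, fixture.tp + fixture.fp); // 3 / 5
const fixtureRecall = ratio(fixture.tp, fixture.tp + fixture.fn); // 3 / 6

console.log(fixturePrecision, fixtureRecall); // 0.6 0.5
```

Returning `null` for a zero denominator is a design choice of this sketch; it keeps a missing metric from being mistaken for a score of 0.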

But a reviewer asks for a single value that summarizes how well the classifier balances both.

Metric cards: from the fixed confusion-matrix counts, precision is 3/5 and recall is 3/6.

  • Precision (P): 60.0%
  • Recall (R): 50.0%
  • Arithmetic mean: 55.0% (not final)
  • F1: 54.5% (computed in the next step)
  • Count-form denominator (2TP + FP + FN): 11

First naive idea: take arithmetic mean

A natural first attempt is:

\frac{P + R}{2} = \frac{0.6 + 0.5}{2} = 0.55

This reads nicely as a single number, and it seems fair.

Arithmetic mean first: the same inputs can look close under the arithmetic mean, so the next step is a balance-specific comparison.

  • Pair used: P = 0.6, R = 0.5
  • Arithmetic mean: 55.0%
  • F1: 54.5%

Quick read: different pairs can share a similar average while their balance scores differ.

Where naïve averaging can mislead

The arithmetic mean can look high even when one side is weak.

Compare:

  • P = 1.0, R = 0.1
  • P = 0.1, R = 1.0

Both pairs have the same arithmetic mean, 0.55, even though one side is almost useless in each. A balance score should not rate either pair as highly as one where both sides are moderate.
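A short sketch makes the blind spot concrete: the arithmetic mean returns the same number for both lopsided pairs and for a balanced one. The function name `arithmeticMean` and the balanced pair (0.55, 0.55) are assumptions added here for contrast; they do not come from the fixture.

```typescript
// Arithmetic mean of precision and recall: the naive single-number summary.
function arithmeticMean(precision: number, recall: number): number {
  return (precision + recall) / 2;
}

const comparedPairs: Array<[number, number]> = [
  [1.0, 0.1], // high precision, low recall
  [0.1, 1.0], // low precision, high recall
  [0.55, 0.55], // hypothetical balanced pair, added for contrast
];

for (const [p, r] of comparedPairs) {
  // Every pair prints mean=0.55, although only the last one is balanced.
  console.log(`P=${p} R=${r} mean=${arithmeticMean(p, r).toFixed(2)}`);
}
```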

Extreme imbalance: a high score on one side can still hide weakness in the other.

  • High precision, low recall (P = 1.0, R = 0.1): arithmetic mean = 55.0%, F1 = 18.2%
  • Low precision, high recall (P = 0.1, R = 1.0): arithmetic mean = 55.0%, F1 = 18.2%

Core idea: harmonic balance

F1 is the harmonic mean of precision and recall:

F_1 = \frac{2PR}{P + R}

For this fixture:

F_1 = \frac{2\cdot 0.6\cdot 0.5}{0.6 + 0.5} = \frac{6}{11} \approx 0.545

This is lower than 0.55 because the weak recall pulls the score down.
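The harmonic formula can be checked numerically. This is a minimal sketch, assuming both inputs are already available as numbers; the name `harmonicF1` is hypothetical, and returning 0 when both inputs are 0 is a convention chosen for the sketch, since the formula itself is 0/0 there.

```typescript
// F1 as the harmonic mean of precision and recall: 2PR / (P + R).
function harmonicF1(precision: number, recall: number): number {
  const sum = precision + recall;
  // P = R = 0 would be 0/0; this sketch treats it as a score of 0.
  return sum === 0 ? 0 : (2 * precision * recall) / sum;
}

console.log(harmonicF1(0.6, 0.5).toFixed(3)); // fixture pair: 0.545
console.log(harmonicF1(1.0, 0.1).toFixed(3)); // lopsided pair: 0.182
```

The fixture pair lands just under its arithmetic mean of 0.55, while the lopsided pair drops far below it, which is the penalty for the weak side.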

Harmonic balance: the harmonic result moves toward the smaller metric. For the fixture, precision is 60.0%, recall is 50.0%, and F1 is 54.5%.

Balance read: the result is closer to the smaller bar, so one weak side limits the score.

Formal and count form

From counts you can avoid recomputing intermediate metrics:

F_1 = \frac{2TP}{2TP + FP + FN}

For this fixture:

\frac{2\cdot3}{2\cdot3 + 2 + 3} = \frac{6}{11} \approx 0.545

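Both formulas agree whenever precision and recall are defined and at least one of them is nonzero. When both are zero (TP = 0 with FP and FN present), the metric form is 0/0, while the count form gives a defined 0.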
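The equivalence follows from substituting the definitions P = TP / (TP + FP) and R = TP / (TP + FN) into the metric form. A short derivation, assuming TP > 0 so that every denominator is nonzero and the common factor TP can be cancelled:

```latex
\begin{aligned}
\frac{2PR}{P + R}
  &= \frac{2 \cdot \frac{TP}{TP + FP} \cdot \frac{TP}{TP + FN}}
          {\frac{TP}{TP + FP} + \frac{TP}{TP + FN}} \\
  &= \frac{2\,TP^2}{TP\,(TP + FN) + TP\,(TP + FP)} \\
  &= \frac{2\,TP}{2\,TP + FP + FN}
\end{aligned}
```

The second line multiplies numerator and denominator by (TP + FP)(TP + FN); the third divides both by TP.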

Formula and count form: both definitions should compute the same value for valid counts.

  • Metric form: 2PR / (P + R) = 2×0.6×0.5 / (0.6 + 0.5) ≈ 0.545
  • Count form: 2×TP / (2×TP + FP + FN) = 2×3 / (2×3 + 2 + 3) ≈ 0.545

Agreement check: both formulas land on 0.545.

Interactive preset lab

This demo lets you switch preset count cases and watch precision, recall, arithmetic mean, and F1 update in one view.

F1 preset balance lab (fixture preset: TP = 3, FP = 2, FN = 3, TN = 4)

Explanation: using the fixed confusion counts gives a balanced score close to 0.545.

  • Precision = 0.6 (numerator = TP, denominator = TP + FP)
  • Recall = 0.5 (numerator = TP, denominator = TP + FN)
  • Arithmetic mean = 0.55, computed as (P + R) / 2
  • F1 = 0.545, computed as 2TP / (2TP + FP + FN)

Static no-JS fallback:

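F1 preset comparison (no-JS fallback)

| Preset | TP | FP | FN | Precision | Recall | Arithmetic mean | F1 |
|---|---|---|---|---|---|---|---|
| Fixture | 3 | 2 | 3 | 0.6 | 0.5 | 0.55 | 0.545 |
| High precision, low recall | 1 | 0 | 9 | 1.0 | 0.1 | 0.55 | 0.182 |
| Low precision, high recall | 1 | 9 | 0 | 0.1 | 1.0 | 0.55 | 0.182 |
| Both strong | 9 | 1 | 1 | 0.9 | 0.9 | 0.9 | 0.9 |
| No positive-side evidence | 0 | 0 | 0 | not available | not available | not available | not available |
| Errors with no true positives | 0 | 2 | 3 | 0 | 0 | 0 | 0 |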

Arithmetic mean visibility in the demo

The preset panel shows:

  • precision = TP / (TP + FP)
  • recall = TP / (TP + FN)
  • arithmetic mean = (P + R) / 2
  • F1 = 2TP / (2TP + FP + FN)

Unavailability and zero branches: branching explicitly separates unavailable from zero.

F1 branch table

| Case | TP | FP | FN | 2TP + FP + FN | Precision value | Recall value | F1 |
|---|---|---|---|---|---|---|---|
| Fixture | 3 | 2 | 3 | 11 | 0.600 | 0.500 | 0.545 |
| No positive-side evidence | 0 | 0 | 0 | 0 | not available | not available | not available |
| Errors with no true positives | 0 | 2 | 3 | 5 | 0.000 | 0.000 | 0.000 |

Edge cases and branch behavior

| Branch | Condition | F1 |
|---|---|---|
| unavailable | required source metric missing (for derived metric path) | not available |
| unavailable | 2TP + FP + FN = 0 | not available |
| defined 0 | TP = 0 but FP or FN > 0 | 0 |
| normal | all other valid counts | 2TP / (2TP + FP + FN) |

The denominator-zero branch prevents 0/0 from rendering as NaN; the TP = 0 branch with FP or FN present deliberately renders a defined zero.

Correctness intuition

F1 depends only on precision and recall, so only three of the four counts (TP, FP, and FN) appear in it:

  • FP lowers precision by enlarging its denominator (TP + FP)
  • FN lowers recall by enlarging its denominator (TP + FN)
  • TN appears in neither metric's numerator nor denominator

FP/FN/TN effect panel: FP and FN affect precision and recall; TN stays outside the formula.

  • FP: increases the precision denominator only.
  • FN: increases the recall denominator only.
  • TN: not used in precision or recall, so not used in F1.

Fixture: TP = 3, FP = 2, FN = 3, TN = 4

Complexity

Once counts are known, this is constant-time:

  • precision, recall, and F1: O(1)

If you only have examples, scanning to build TP, FP, and FN is O(n) with O(1) extra space, then O(1) for score.
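The scan-then-score split can be sketched as follows. This is an illustration under stated assumptions: the `LabeledPrediction` shape, its field names, and the helper names are hypothetical and not part of the lesson's code.

```typescript
// One labeled prediction. Field names are hypothetical, chosen for this sketch.
interface LabeledPrediction {
  actual: boolean; // true = the email really is in the positive class
  predicted: boolean; // true = the classifier flagged it as positive
}

interface Tally {
  tp: number;
  fp: number;
  fn: number;
}

// Single pass over the examples: O(n) time, O(1) extra space.
function tallyOutcomes(examples: LabeledPrediction[]): Tally {
  const tally: Tally = { tp: 0, fp: 0, fn: 0 };
  for (const { actual, predicted } of examples) {
    if (predicted && actual) tally.tp += 1;
    else if (predicted && !actual) tally.fp += 1;
    else if (!predicted && actual) tally.fn += 1;
    // Remaining case (both false) is a TN, which F1 never uses.
  }
  return tally;
}

// O(1) once the counts exist; null marks the 0/0 "not available" case.
function f1FromTally({ tp, fp, fn }: Tally): number | null {
  const denominator = 2 * tp + fp + fn;
  return denominator === 0 ? null : (2 * tp) / denominator;
}

// Builds n identical examples for the demo data.
function repeatExample(n: number, actual: boolean, predicted: boolean): LabeledPrediction[] {
  const out: LabeledPrediction[] = [];
  for (let i = 0; i < n; i += 1) out.push({ actual, predicted });
  return out;
}

// The 12-email fixture: 3 TP, 2 FP, 3 FN, 4 TN.
const fixtureEmails: LabeledPrediction[] = [
  ...repeatExample(3, true, true),
  ...repeatExample(2, false, true),
  ...repeatExample(3, true, false),
  ...repeatExample(4, false, false),
];

const fixtureTally = tallyOutcomes(fixtureEmails);
console.log(fixtureTally, f1FromTally(fixtureTally));
```

Keeping the tally separate from the score means the O(n) pass runs once, and any number of derived metrics can then be computed from the same three counters.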

Complexity summary: count once, compute many; both steps are explicit and cheap.

  • Known counts: O(1) time, O(1) space.
  • From examples: scan once to get TP/FP/FN, then O(1).

Implementation sketch

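// Counts that F1 needs; TN is intentionally absent.
interface Counts {
  tp: number;
  fp: number;
  fn: number;
}

// value is null when F1 is not available and 0 when it is a defined zero.
type F1Result = {
  precision: number | null;
  recall: number | null;
  numerator: number;
  denominator: number;
  value: number | null;
};

// Derived-metric path: F1 = 2PR / (P + R).
function f1FromPrecisionRecall(precision: number | null, recall: number | null): F1Result {
  // A missing source metric makes F1 unavailable.
  if (precision === null || recall === null) {
    return { precision, recall, numerator: 0, denominator: 0, value: null };
  }
  const numerator = 2 * precision * recall;
  const denominator = precision + recall;
  return {
    precision,
    recall,
    numerator,
    denominator,
    // P = R = 0 would be 0/0; render it as a defined zero.
    value: denominator === 0 ? 0 : numerator / denominator
  };
}

// Count path: F1 = 2TP / (2TP + FP + FN).
// precisionFromCounts and recallFromCounts are not defined in this sketch;
// they are assumed to come from the precision and recall nodes.
function f1FromCounts(counts: Counts): F1Result {
  const precision = precisionFromCounts({ tp: counts.tp, fp: counts.fp }).value;
  const recall = recallFromCounts({ tp: counts.tp, fn: counts.fn }).value;
  const numerator = 2 * counts.tp;
  const denominator = 2 * counts.tp + counts.fp + counts.fn;

  // TP = FP = FN = 0: no positive-side evidence, so F1 is unavailable.
  if (denominator === 0) {
    return { precision, recall, numerator, denominator, value: null };
  }
  // TP = 0 with FP or FN present: an explicit, defined zero.
  if (counts.tp === 0) {
    return { precision, recall, numerator, denominator, value: 0 };
  }
  return { precision, recall, numerator, denominator, value: numerator / denominator };
}

// Renders null as a localized "not available" label, otherwise up to 3 decimals.
function formatMetric(value: number | null, locale: "en" | "zh"): string {
  return value === null ? (locale === "en" ? "not available" : "不可用") : Number(value.toFixed(3)).toString();
}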

Common confusions

Common confusions: F1 is neither raw accuracy nor raw recall.

  • Not accuracy: accuracy counts TN in both its numerator and denominator, while F1 never uses TN.
  • Not raw recall: F1 combines precision and recall, not recall alone.
  • Not one-sided: both precision and recall must be strong.

  • F1 is not accuracy. Accuracy uses all four cells.
  • F1 is not precision or recall alone. It needs both.
  • F1 does not use TN directly. TN does not appear in precision/recall.
  • 0 is different from unavailable. If there are no true positives but some FP or FN, F1 is exactly 0; if TP, FP, and FN are all 0, F1 is not available.
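The distinctions in this list can be checked side by side. The sketch below assumes a four-cell count object; the names `ConfusionCounts`, `accuracyOf`, and `f1Of` are hypothetical, and the TN = 400 variant is invented purely to show the effect of adding easy negatives.

```typescript
interface ConfusionCounts {
  tp: number;
  fp: number;
  fn: number;
  tn: number;
}

// Accuracy uses all four cells, including TN.
function accuracyOf(c: ConfusionCounts): number | null {
  const total = c.tp + c.fp + c.fn + c.tn;
  return total === 0 ? null : (c.tp + c.tn) / total;
}

// F1 uses only TP, FP, and FN; the tn field is never read.
function f1Of(c: ConfusionCounts): number | null {
  const denominator = 2 * c.tp + c.fp + c.fn;
  return denominator === 0 ? null : (2 * c.tp) / denominator;
}

const lessonFixture: ConfusionCounts = { tp: 3, fp: 2, fn: 3, tn: 4 };
// Hypothetical variant: same errors, but many more true negatives.
const manyNegatives: ConfusionCounts = { tp: 3, fp: 2, fn: 3, tn: 400 };

for (const c of [lessonFixture, manyNegatives]) {
  // Accuracy rises with TN; F1 stays at 6/11 in both rows.
  console.log(`TN=${c.tn} accuracy=${accuracyOf(c)} F1=${f1Of(c)}`);
}

// Defined zero versus unavailable:
console.log(f1Of({ tp: 0, fp: 2, fn: 3, tn: 4 })); // 0
console.log(f1Of({ tp: 0, fp: 0, fn: 0, tn: 4 })); // null
```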

Graph connection

Node connection: this node depends on precision and recall.

  • precision (implemented) + recall (implemented) -> f1-score (implemented)

In the graph, f1-score depends on two nodes:

  • precision -> f1-score
  • recall -> f1-score

Exercises

  1. Compute F1 from fixture counts both as 2PR/(P+R) and 2TP/(2TP + FP + FN).
  2. Why do P = 1.0, R = 0.1 and P = 0.1, R = 1.0 have the same arithmetic mean but a much lower F1?
  3. In the no-positive-evidence case (TP = FP = FN = 0), what should render, and why?
  4. Compare FP versus FN:
    • adding one FP: TP=3, FP=3, FN=3
    • adding one FN: TP=3, FP=2, FN=4
    • what is the new F1 in each case?