F1 Score | ReConcept Lab

Hook problem: one score for two questions

Take the same 12-email fixture used for precision and recall:

TP = 3
FP = 2
FN = 3
TN = 4

From those counts we already have:

precision = TP / (TP + FP) = 3 / 5 = 0.6
recall = TP / (TP + FN) = 3 / 6 = 0.5
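A quick check of the two ratios from the fixture counts, as a minimal TypeScript sketch (the language of the implementation sketch further down). The helper names `precision` and `recall` are illustrative only, and the sketch skips the zero-denominator case, which the edge-case branches treat separately.

```typescript
// Fixture counts from the 12-email example.
const TP = 3;
const FP = 2;
const FN = 3;

// Precision: share of predicted positives that are correct.
function precision(tp: number, fp: number): number {
  return tp / (tp + fp);
}

// Recall: share of actual positives that were found.
function recall(tp: number, fn: number): number {
  return tp / (tp + fn);
}

console.log(precision(TP, FP)); // 0.6
console.log(recall(TP, FN)); // 0.5
```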

But a reviewer wants a single number that balances both metrics and reflects the classifier's overall behavior.

Metric cards: from the fixed confusion-matrix counts, precision is 3/5 and recall is 3/6.

Precision (P): 60.0%
Recall (R): 50.0%
Arithmetic mean: 55.0% (not final)
F1: 54.5% (computed in the next step)
Count-form denominator (2TP + FP + FN): 11

First naive idea: take the arithmetic mean

A natural first attempt is:

\frac{P + R}{2} = \frac{0.6 + 0.5}{2} = 0.55

This reads nicely as a single number, and it seems fair.

Arithmetic mean first: on these inputs the two scores look close, so the next step compares them on imbalanced pairs.

Pair used: P = 0.6, R = 0.5
Arithmetic mean: 55.0%
F1: 54.5%

Quick read: different pairs can share the same average while their balance score differs.

Where naïve averaging can mislead

The arithmetic mean can look high even when one side is weak.

Compare:

P = 1.0, R = 0.1
P = 0.1, R = 1.0

Both pairs have the same mean, 0.55, but neither should score as well as a balanced pair such as P = R = 0.55, because in each one side is almost useless.

Extreme imbalance: a high score on one side can still hide weakness on the other.

High precision, low recall: P = 1.0, R = 0.1, arithmetic mean = 55.0%, F1 = 18.2%
Low precision, high recall: P = 0.1, R = 1.0, arithmetic mean = 55.0%, F1 = 18.2%
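The contrast can be reproduced in a few lines. This is a minimal sketch; `arithmeticMean` and `harmonicMean` are illustrative names, and returning 0 when P + R = 0 is an assumption chosen to match the defined-zero branch discussed later.

```typescript
// Arithmetic mean: treats precision and recall as interchangeable amounts.
function arithmeticMean(p: number, r: number): number {
  return (p + r) / 2;
}

// Harmonic mean (F1): pulled toward the smaller of the two values.
function harmonicMean(p: number, r: number): number {
  return p + r === 0 ? 0 : (2 * p * r) / (p + r);
}

const pairs: Array<[string, number, number]> = [
  ["fixture", 0.6, 0.5],
  ["high precision, low recall", 1.0, 0.1],
  ["low precision, high recall", 0.1, 1.0],
];

for (const [name, p, r] of pairs) {
  const am = arithmeticMean(p, r).toFixed(3);
  const f1 = harmonicMean(p, r).toFixed(3);
  console.log(`${name}: arithmetic=${am}, F1=${f1}`);
}
```

All three rows print arithmetic=0.550; F1 is 0.545 for the fixture and 0.182 for both imbalanced pairs.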

Core idea: harmonic balance

F1 is the harmonic mean of precision and recall:

F_1 = \frac{2PR}{P + R}

For this fixture:

F_1 = \frac{2\cdot 0.6\cdot 0.5}{0.6 + 0.5} = \frac{6}{11} \approx 0.545

This is lower than 0.55 because the weak recall pulls the score down.
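The size of that gap follows from a standard algebraic identity, shown here for illustration: subtracting the harmonic mean from the arithmetic mean leaves a term that grows with the squared difference between P and R.

```latex
\frac{P + R}{2} - \frac{2PR}{P + R}
  = \frac{(P + R)^2 - 4PR}{2(P + R)}
  = \frac{(P - R)^2}{2(P + R)} \ge 0
```

For the fixture the gap is 0.1^2 / (2·1.1) ≈ 0.0045, which is 0.55 − 0.545. For P = 1.0, R = 0.1 it is 0.9^2 / (2·1.1) ≈ 0.368, which is 0.55 − 0.182.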

Harmonic balance: the harmonic result moves toward the smaller metric.

Precision: 60.0%
Recall: 50.0%
F1: 54.5%

Balance read: the result sits closer to the smaller value, so one weak side limits the score.

Formal and count form

Working directly from counts avoids computing precision and recall first:

F_1 = \frac{2TP}{2TP + FP + FN}

For this fixture:

\frac{2\cdot3}{2\cdot3 + 2 + 3} = \frac{6}{11} \approx 0.545

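The two formulas agree whenever TP > 0, which makes precision and recall both defined and positive. When TP = 0 the metric form can reduce to 0/0 even though the count form still has a value, so those cases need explicit branches.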
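Substituting the count definitions of P and R into the metric form gives the count form directly. The middle step multiplies numerator and denominator by (TP + FP)(TP + FN); the final step cancels a factor of TP, so the algebra assumes TP > 0.

```latex
\frac{2PR}{P + R}
  = \frac{2\cdot\frac{TP}{TP+FP}\cdot\frac{TP}{TP+FN}}
         {\frac{TP}{TP+FP} + \frac{TP}{TP+FN}}
  = \frac{2\,TP^2}{TP\,(TP+FN) + TP\,(TP+FP)}
  = \frac{2\,TP}{2TP + FP + FN}
```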

Formula and count form: both definitions should compute the same value for valid counts.

Metric form: 2PR / (P + R) = 2×0.6×0.5 / (0.6 + 0.5) ≈ 0.545
Count form: 2×TP / (2×TP + FP + FN) = 2×3 / (2×3 + 2 + 3) ≈ 0.545

Agreement check: both formulas land on 0.545.

Interactive preset lab

This demo lets you switch preset count cases and watch precision, recall, arithmetic mean, and F1 update in one view.

F1 preset balance lab (fixture preset: TP=3, FP=2, FN=3, TN=4)

Precision: 0.6 (numerator TP, denominator TP + FP)
Recall: 0.5 (numerator TP, denominator TP + FN)
Arithmetic mean: 0.55, computed as (P + R) / 2
F1: 0.545, computed as 2TP / (2TP + FP + FN)

Explanation: the fixed confusion counts give a balanced score of about 0.545.

F1 preset comparison
Preset	TP	FP	FN	Precision	Recall	Arithmetic mean	F1
Fixture	3	2	3	0.6	0.5	0.55	0.545
High precision, low recall	1	0	9	1.0	0.1	0.55	0.182
Low precision, high recall	1	9	0	0.1	1.0	0.55	0.182
Both strong	9	1	1	0.9	0.9	0.9	0.9
No positive-side evidence	0	0	0	not available	not available	not available	not available
Errors with no true positives	0	2	3	0	0	0	0
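The preset table can be regenerated from its counts alone. A minimal sketch, assuming a hypothetical `Preset` shape and using `null` for "not available"; the helper names `ratio` and `show` are illustrative, not part of the lab's code.

```typescript
// Hypothetical preset list mirroring the comparison table.
interface Preset {
  name: string;
  tp: number;
  fp: number;
  fn: number;
}

const presets: Preset[] = [
  { name: "Fixture", tp: 3, fp: 2, fn: 3 },
  { name: "High precision, low recall", tp: 1, fp: 0, fn: 9 },
  { name: "Low precision, high recall", tp: 1, fp: 9, fn: 0 },
  { name: "Both strong", tp: 9, fp: 1, fn: 1 },
  { name: "No positive-side evidence", tp: 0, fp: 0, fn: 0 },
  { name: "Errors with no true positives", tp: 0, fp: 2, fn: 3 },
];

// Returns null instead of dividing by zero.
function ratio(num: number, den: number): number | null {
  return den === 0 ? null : num / den;
}

// Three decimals with trailing zeros trimmed; null renders as "not available".
function show(x: number | null): string {
  return x === null ? "not available" : Number(x.toFixed(3)).toString();
}

for (const { name, tp, fp, fn } of presets) {
  const p = ratio(tp, tp + fp);
  const r = ratio(tp, tp + fn);
  const mean = p === null || r === null ? null : (p + r) / 2;
  const f1 = ratio(2 * tp, 2 * tp + fp + fn);
  console.log([name, show(p), show(r), show(mean), show(f1)].join("\t"));
}
```

Because trailing zeros are trimmed, the table's 1.0 prints as 1; every other cell matches the table.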

Arithmetic mean visibility in the demo

The preset panel shows:

precision = TP / (TP + FP)
recall = TP / (TP + FN)
arithmetic mean = (P + R) / 2
F1 = 2TP / (2TP + FP + FN)

Unavailability and zero branches: branching explicitly separates unavailable from zero.

F1 branch table
Case	TP	FP	FN	2TP + FP + FN	Precision value	Recall value	F1
Fixture	3	2	3	11	0.600	0.500	0.545
No positive-side evidence	0	0	0	0	not available	not available	not available
Errors with no true positives	0	2	3	5	0.000	0.000	0.000

Edge cases and branch behavior

Branch	Condition	F1
unavailable	precision or recall unavailable (derived-metric path only)	not available
unavailable	`2TP + FP + FN = 0`	not available
defined 0	`TP = 0` but `FP` or `FN` > 0	0
normal	all other valid counts	`2TP / (2TP + FP + FN)`

The denominator-zero branch prevents 0/0 from rendering as NaN; the TP = 0 branch with FP or FN present deliberately renders a defined zero.
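The count-path rows of the branch table map onto a small discriminated union. This is a sketch under assumptions: `F1Branch` and `classifyF1` are hypothetical names, and the derived-metric row (a missing source metric) is not modeled because this function starts from counts.

```typescript
// Hypothetical branch labels matching the count-path rows of the edge-case table.
type F1Branch =
  | { kind: "unavailable" }
  | { kind: "defined-zero"; value: 0 }
  | { kind: "normal"; value: number };

function classifyF1(tp: number, fp: number, fn: number): F1Branch {
  const denominator = 2 * tp + fp + fn;
  if (denominator === 0) {
    // TP = FP = FN = 0: nothing to score, so report "not available", never NaN.
    return { kind: "unavailable" };
  }
  if (tp === 0) {
    // Only errors on the positive side: a real, defined zero.
    return { kind: "defined-zero", value: 0 };
  }
  return { kind: "normal", value: (2 * tp) / denominator };
}

console.log(classifyF1(3, 2, 3)); // kind "normal", value ≈ 0.545
console.log(classifyF1(0, 0, 0)); // kind "unavailable"
console.log(classifyF1(0, 2, 3)); // kind "defined-zero", value 0
```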


Correctness intuition

F1 depends only on precision and recall, so only three counts appear:

FP hurts precision through its denominator (TP + FP)
FN hurts recall through its denominator (TP + FN)
TN appears in neither metric's numerator nor denominator
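One way to see TN's absence concretely is to sweep TN while holding the fixture's TP, FP, and FN fixed, with accuracy alongside for contrast. A minimal sketch; both function names are illustrative, and the unused `_tn` parameter exists only to make the point visible.

```typescript
// F1 from counts; TN is accepted only to show that it is never read.
function f1FromCounts(tp: number, fp: number, fn: number, _tn: number): number {
  return (2 * tp) / (2 * tp + fp + fn);
}

// Accuracy, for contrast, does read TN.
function accuracy(tp: number, fp: number, fn: number, tn: number): number {
  return (tp + tn) / (tp + fp + fn + tn);
}

// Sweep TN while holding the fixture's TP, FP, FN fixed.
for (const tn of [0, 4, 40, 400]) {
  console.log(
    `TN=${tn}: F1=${f1FromCounts(3, 2, 3, tn).toFixed(3)}, ` +
      `accuracy=${accuracy(3, 2, 3, tn).toFixed(3)}`
  );
}
```

F1 stays at 0.545 on every row, while accuracy climbs from 0.375 toward 1 as true negatives pile up.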

FP/FN/TN effect panel: FP and FN affect precision and recall; TN stays outside the formula.

FP: increases the precision denominator only.
FN: increases the recall denominator only.
TN: not used in precision or recall, so not used in F1.

Fixture: TP=3, FP=2, FN=3, TN=4

Complexity

Once counts are known, this is constant-time:

precision, recall, and F1: O(1)

If you only have labeled examples, one scan builds TP, FP, and FN in O(n) time with O(1) extra space; the score then takes O(1).
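The count-once, score-once pattern looks like this in a minimal sketch. The `Example` shape, the helper names, and the 12 sample records are hypothetical; the records only reproduce the fixture's outcome mix, since the actual emails are not given.

```typescript
// One labeled example: the true label and the classifier's prediction.
interface Example {
  actual: boolean;
  predicted: boolean;
}

interface Counts {
  tp: number;
  fp: number;
  fn: number;
  tn: number;
}

// Single O(n) pass; the only extra state is four counters, so O(1) space.
function countOutcomes(examples: Example[]): Counts {
  let tp = 0;
  let fp = 0;
  let fn = 0;
  let tn = 0;
  for (const { actual, predicted } of examples) {
    if (predicted && actual) tp++;
    else if (predicted) fp++; // predicted positive, actually negative
    else if (actual) fn++; // predicted negative, actually positive
    else tn++;
  }
  return { tp, fp, fn, tn };
}

// O(1) once counts exist; null marks the unavailable branch (TP = FP = FN = 0).
function f1(c: Counts): number | null {
  const denominator = 2 * c.tp + c.fp + c.fn;
  return denominator === 0 ? null : (2 * c.tp) / denominator;
}

// Hypothetical 12 examples with the fixture's outcome mix; the order is made up.
const repeat = (n: number, e: Example): Example[] => Array.from({ length: n }, () => e);
const emails: Example[] = [
  ...repeat(3, { actual: true, predicted: true }), // TP
  ...repeat(2, { actual: false, predicted: true }), // FP
  ...repeat(3, { actual: true, predicted: false }), // FN
  ...repeat(4, { actual: false, predicted: false }), // TN
];

const counts = countOutcomes(emails);
console.log(counts, f1(counts));
```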

Complexity: count once, compute many; both steps are explicit and cheap.

Known counts: O(1) time, O(1) space.
From examples: one O(n) scan to get TP/FP/FN, then O(1).

Implementation sketch

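// Minimal stand-ins for the precision/recall helpers from the earlier nodes
// (assumed shape: each returns { value }, with null for a zero denominator).
function precisionFromCounts(counts: { tp: number; fp: number }): { value: number | null } {
  const denominator = counts.tp + counts.fp;
  return { value: denominator === 0 ? null : counts.tp / denominator };
}

function recallFromCounts(counts: { tp: number; fn: number }): { value: number | null } {
  const denominator = counts.tp + counts.fn;
  return { value: denominator === 0 ? null : counts.tp / denominator };
}

interface Counts {
  tp: number;
  fp: number;
  fn: number;
}

type F1Result = {
  precision: number | null;
  recall: number | null;
  numerator: number;
  denominator: number;
  value: number | null; // null means "not available", distinct from 0
};

// Derived path: F1 = 2PR / (P + R) from already-computed metrics.
function f1FromPrecisionRecall(precision: number | null, recall: number | null): F1Result {
  // A missing source metric makes F1 unavailable.
  if (precision === null || recall === null) {
    return { precision, recall, numerator: 0, denominator: 0, value: null };
  }
  const numerator = 2 * precision * recall;
  const denominator = precision + recall;
  return {
    precision,
    recall,
    numerator,
    denominator,
    // P = R = 0 (TP = 0 with FP > 0 and FN > 0): a defined zero, not 0/0.
    value: denominator === 0 ? 0 : numerator / denominator
  };
}

// Count path: F1 = 2TP / (2TP + FP + FN), with explicit branches.
function f1FromCounts(counts: Counts): F1Result {
  const precision = precisionFromCounts({ tp: counts.tp, fp: counts.fp }).value;
  const recall = recallFromCounts({ tp: counts.tp, fn: counts.fn }).value;
  const numerator = 2 * counts.tp;
  const denominator = 2 * counts.tp + counts.fp + counts.fn;

  if (denominator === 0) {
    // TP = FP = FN = 0: unavailable, never rendered as NaN.
    return { precision, recall, numerator, denominator, value: null };
  }
  if (counts.tp === 0) {
    // Errors but no true positives: a defined zero.
    return { precision, recall, numerator, denominator, value: 0 };
  }
  return { precision, recall, numerator, denominator, value: numerator / denominator };
}

// Render null as "not available" ("不可用" for the zh locale); otherwise trim to 3 decimals.
function formatMetric(value: number | null, locale: "en" | "zh"): string {
  return value === null ? (locale === "en" ? "not available" : "不可用") : Number(value.toFixed(3)).toString();
}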

Common confusions

Common confusions: F1 is neither raw accuracy nor raw recall.

Not accuracy: accuracy also counts TN, which F1 ignores.
Not raw recall: F1 combines precision and recall instead of reporting either alone.
Not one-sided: both precision and recall must be strong.

F1 is not accuracy. Accuracy uses all four cells.
F1 is not precision or recall alone. It needs both.
F1 does not use TN at all. TN appears in neither precision nor recall.
0 is different from unavailable. If there are no true positives but FP or FN is nonzero, F1 is exactly 0; if TP, FP, and FN are all 0, F1 is unavailable.

Graph connection

Node connection: this node depends on precision and recall.

precision (implemented), recall (implemented) → f1-score (implemented)

In the graph, f1-score is a dependent node from:

precision -> f1-score
recall -> f1-score

Exercises

Compute F1 from fixture counts both as 2PR/(P+R) and 2TP/(2TP + FP + FN).
Why do P = 1.0, R = 0.1 and P = 0.1, R = 1.0 have the same arithmetic mean (0.55) as the fixture but a much lower F1?
In the no-positive-evidence case (TP = FP = FN = 0), what should render, and why?
Compare FP versus FN:
- adding one FP: TP=3, FP=3, FN=3
- adding one FN: TP=3, FP=2, FN=4
- what is the new F1 in each case?

