F1 Score
Combine precision and recall into one balance score that is penalized by the weaker side.
Hook problem: one score for two questions
In the same 12-email fixture used by precision and recall:
TP = 3, FP = 2, FN = 3, TN = 4
We already trust:
precision = TP / (TP + FP) = 3 / 5 = 0.6
recall = TP / (TP + FN) = 3 / 6 = 0.5
But a reviewer asks for a single balance value that matches the overall behavior.
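The two trusted values can be reproduced from the fixture counts. This is a minimal sketch: the `ratio` helper and the variable names are hypothetical and do not come from the lesson's code.

```typescript
// Confusion counts for the 12-email fixture, as listed in the text.
const fixtureCells = { tp: 3, fp: 2, fn: 3, tn: 4 };

// Hypothetical helper: returns null instead of dividing by zero.
function ratio(numerator: number, denominator: number): number | null {
  return denominator === 0 ? null : numerator / denominator;
}

// precision = TP / (TP + FP), recall = TP / (TP + FN)
const fixturePrecision = ratio(fixtureCells.tp, fixtureCells.tp + fixtureCells.fp);
const fixtureRecall = ratio(fixtureCells.tp, fixtureCells.tp + fixtureCells.fn);

console.log(fixturePrecision, fixtureRecall); // 0.6 0.5
```

Note that TN = 4 is part of the fixture but is not read by either ratio.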
First naive idea: take arithmetic mean
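A natural first attempt is the arithmetic mean:
(P + R) / 2
With P = 0.6 and R = 0.5, this gives (0.6 + 0.5) / 2 = 0.55.
This reads nicely as a single number, and it seems fair.
However, very different precision/recall pairs can share the same average while their balance scores differ.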
Where naïve averaging can mislead
Arithmetic mean can look high even when one side is weak.
Compare:
- P = 1.0, R = 0.1
- P = 0.1, R = 1.0
Both have the same mean, 0.55, but neither should count as a balanced result when one side is almost useless.
| Pair | Arithmetic mean | F1 |
|---|---|---|
| P = 1.0, R = 0.1 | 55.0% | 18.2% |
| P = 0.1, R = 1.0 | 55.0% | 18.2% |
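The comparison can be reproduced with a short sketch. It assumes F1 is computed as 2PR / (P + R), the harmonic mean this lesson develops; the function names are illustrative only.

```typescript
// Plain average of the two metrics.
function arithmeticMean(p: number, r: number): number {
  return (p + r) / 2;
}

// Harmonic mean of p and r; returns 0 when both inputs are 0 to avoid 0/0.
function harmonicMean(p: number, r: number): number {
  return p + r === 0 ? 0 : (2 * p * r) / (p + r);
}

// The two mirrored pairs from the comparison.
const mirroredPairs: Array<[number, number]> = [
  [1.0, 0.1],
  [0.1, 1.0],
];

for (const [p, r] of mirroredPairs) {
  const mean = arithmeticMean(p, r).toFixed(3);
  const f1 = harmonicMean(p, r).toFixed(3);
  console.log(`arithmetic=${mean} harmonic=${f1}`);
}
```

Both pairs report an arithmetic mean of 0.550 and a harmonic mean of 0.182, so only the harmonic mean exposes the weak side.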
Core idea: harmonic balance
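F1 is the harmonic mean of precision and recall:
F1 = 2PR / (P + R)
For this fixture:
F1 = 2×0.6×0.5 / (0.6 + 0.5) = 0.6 / 1.1 ≈ 0.545
This is lower than the arithmetic mean of 0.55 because the weaker recall pulls the score down.
The result sits closer to the smaller of the two values, so one weak side limits the score.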
Formal and count form
From counts you can compute F1 directly, without the intermediate metrics:
F1 = 2TP / (2TP + FP + FN)
For this fixture:
2PR / (P + R) = 2×0.6×0.5 / (0.6 + 0.5) ≈ 0.545
2×TP / (2×TP + FP + FN) = 2×3 / (2×3 + 2 + 3) = 6 / 11 ≈ 0.545
Both formulas land on 0.545. They are equivalent whenever precision and recall are defined and not both zero (in that case the first form becomes 0/0).
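The equivalence follows from substituting P = TP / (TP + FP) and R = TP / (TP + FN) into 2PR / (P + R): the shared factor TP / ((TP + FP)(TP + FN)) cancels, leaving 2TP / (2TP + FP + FN). The sketch below checks this numerically on a few count triples taken from this lesson's presets. It assumes TP > 0 so both rates are defined, and the function names are illustrative.

```typescript
interface PositiveCounts {
  tp: number;
  fp: number;
  fn: number;
}

// Rate form: compute precision and recall first, then 2PR / (P + R).
// Assumes tp > 0 so neither division hits 0/0.
function f1ViaRates(c: PositiveCounts): number {
  const p = c.tp / (c.tp + c.fp);
  const r = c.tp / (c.tp + c.fn);
  return (2 * p * r) / (p + r);
}

// Count form: 2TP / (2TP + FP + FN), no intermediate metrics.
function f1ViaCounts(c: PositiveCounts): number {
  return (2 * c.tp) / (2 * c.tp + c.fp + c.fn);
}

const sampleCounts: PositiveCounts[] = [
  { tp: 3, fp: 2, fn: 3 }, // fixture
  { tp: 1, fp: 0, fn: 9 }, // high precision, low recall
  { tp: 9, fp: 1, fn: 1 }, // both strong
];

for (const c of sampleCounts) {
  const gap = Math.abs(f1ViaRates(c) - f1ViaCounts(c));
  // Compare with a tolerance because the two forms round differently.
  console.log(f1ViaCounts(c).toFixed(3), gap < 1e-12);
}
```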
Interactive preset lab
This demo lets you switch preset count cases and watch precision, recall, arithmetic mean, and F1 update in one view.
F1 preset balance lab
Explanation: Using the fixed confusion counts gives a balanced score close to 0.545. (TP=3, FP=2, FN=3, TN=4)
- Precision: numerator = TP, denominator = TP + FP
- Recall: numerator = TP, denominator = TP + FN
- Arithmetic mean: (P + R) / 2
- F1: 2TP / (2TP + FP + FN)
Static no-JS fallback:
| Preset | TP | FP | FN | Precision | Recall | Arithmetic mean | F1 |
|---|---|---|---|---|---|---|---|
| Fixture | 3 | 2 | 3 | 0.6 | 0.5 | 0.55 | 0.545 |
| High precision, low recall | 1 | 0 | 9 | 1.0 | 0.1 | 0.55 | 0.182 |
| Low precision, high recall | 1 | 9 | 0 | 0.1 | 1.0 | 0.55 | 0.182 |
| Both strong | 9 | 1 | 1 | 0.9 | 0.9 | 0.9 | 0.9 |
| No positive-side evidence | 0 | 0 | 0 | not available | not available | not available | not available |
| Errors with no true positives | 0 | 2 | 3 | 0 | 0 | 0 | 0 |
Arithmetic mean visibility in the demo
The preset panel shows:
- precision = TP / (TP + FP)
- recall = TP / (TP + FN)
- arithmetic mean = (P + R) / 2
- F1 = 2TP / (2TP + FP + FN)
| Case | TP | FP | FN | 2TP + FP + FN | Precision value | Recall value | F1 |
|---|---|---|---|---|---|---|---|
| Fixture | 3 | 2 | 3 | 11 | 0.600 | 0.500 | 0.545 |
| No positive-side evidence | 0 | 0 | 0 | 0 | not available | not available | not available |
| Errors with no true positives | 0 | 2 | 3 | 5 | 0.000 | 0.000 | 0.000 |
Edge cases and branch behavior
| Branch | Condition | F1 |
|---|---|---|
| unavailable | required source metric missing (for derived metric path) | not available |
| unavailable | 2TP + FP + FN = 0 | not available |
| defined 0 | TP = 0 but FP or FN > 0 | 0 |
| normal | all other valid counts | 2TP / (2TP + FP + FN) |
The denominator-zero branch prevents 0/0 from rendering as NaN; the TP = 0 branch with FP or FN present deliberately renders a defined zero.
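The count-based rows of the branch table can be sketched as a small classifier. The branch labels and the `classifyF1` name are hypothetical, and the "required source metric missing" row is not covered because it applies only to the derived-metric path.

```typescript
type F1Branch = "unavailable" | "defined-zero" | "normal";

interface F1Outcome {
  branch: F1Branch;
  value: number | null; // null stands for "not available"
}

function classifyF1(tp: number, fp: number, fn: number): F1Outcome {
  const denominator = 2 * tp + fp + fn;
  // No positive-side evidence at all: 0/0 must not surface as NaN.
  if (denominator === 0) return { branch: "unavailable", value: null };
  // Errors exist but nothing was caught: a defined zero.
  if (tp === 0) return { branch: "defined-zero", value: 0 };
  return { branch: "normal", value: (2 * tp) / denominator };
}

console.log(classifyF1(0, 0, 0).branch); // unavailable
console.log(classifyF1(0, 2, 3).branch); // defined-zero
console.log(classifyF1(3, 2, 3).branch); // normal
```

Checking the zero denominator first matters: TP = 0 is also true in the all-zero case, so reversing the order would report 0 where the table requires "not available".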
Correctness intuition
F1 depends only on precision and recall, so only three counts appear in it: TP, FP, and FN.
- FP hurts precision: it increases the precision denominator (TP + FP) only.
- FN hurts recall: it increases the recall denominator (TP + FN) only.
- TN is not in the numerator or denominator of either metric, so it is not used in F1.
For the fixture (TP=3, FP=2, FN=3, TN=4), the TN = 4 never enters the calculation.
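One way to see that TN plays no role is to inflate it and watch F1 stay put. The sketch below does that with a hypothetical TN of 400 (not a value from this lesson) and contrasts it with accuracy, computed in the standard way as (TP + TN) / (TP + FP + FN + TN). It assumes non-zero denominators.

```typescript
interface Cells {
  tp: number;
  fp: number;
  fn: number;
  tn: number;
}

// F1 from counts; tn is never read.
function f1Of(c: Cells): number {
  return (2 * c.tp) / (2 * c.tp + c.fp + c.fn);
}

// Accuracy uses all four cells.
function accuracyOf(c: Cells): number {
  return (c.tp + c.tn) / (c.tp + c.fp + c.fn + c.tn);
}

const baseCells: Cells = { tp: 3, fp: 2, fn: 3, tn: 4 };
// Hypothetical variant: same positive-side counts, many more true negatives.
const manyNegatives: Cells = { ...baseCells, tn: 400 };

console.log(f1Of(baseCells).toFixed(3), f1Of(manyNegatives).toFixed(3)); // 0.545 0.545
console.log(accuracyOf(baseCells).toFixed(3), accuracyOf(manyNegatives).toFixed(3)); // 0.583 0.988
```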
Complexity
Once counts are known, this is constant-time:
- precision, recall, and F1: O(1) time, O(1) space.
If you only have examples, scan them once to build TP, FP, and FN in O(n) time with O(1) extra space, then compute the score in O(1).
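The single-pass scan might look like the sketch below. The `LabeledExample` shape, the function names, and the four toy examples are assumptions made for illustration; they are not the lesson's 12-email fixture.

```typescript
// Hypothetical example shape: ground truth and model prediction as booleans.
interface LabeledExample {
  actual: boolean;
  predicted: boolean;
}

// One pass over the examples, three running counters: O(n) time, O(1) extra space.
function scanCounts(examples: LabeledExample[]): { tp: number; fp: number; fn: number } {
  let tp = 0;
  let fp = 0;
  let fn = 0;
  for (const { actual, predicted } of examples) {
    if (predicted && actual) tp += 1;
    else if (predicted && !actual) fp += 1;
    else if (!predicted && actual) fn += 1;
    // True negatives are skipped because F1 does not use them.
  }
  return { tp, fp, fn };
}

// O(1) once the counts exist; null stands for "not available".
function f1FromScan(examples: LabeledExample[]): number | null {
  const { tp, fp, fn } = scanCounts(examples);
  const denominator = 2 * tp + fp + fn;
  return denominator === 0 ? null : (2 * tp) / denominator;
}

const toyExamples: LabeledExample[] = [
  { actual: true, predicted: true },   // TP
  { actual: false, predicted: true },  // FP
  { actual: true, predicted: false },  // FN
  { actual: false, predicted: false }, // TN
];

console.log(f1FromScan(toyExamples)); // 0.5
```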
Implementation sketch
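```typescript
// Counts needed for F1; TN is intentionally absent.
interface Counts {
  tp: number;
  fp: number;
  fn: number;
}

type F1Result = {
  precision: number | null;
  recall: number | null;
  numerator: number;
  denominator: number;
  value: number | null; // null means "not available"
};

// Derived-metric path: F1 from already-computed precision and recall.
function f1FromPrecisionRecall(precision: number | null, recall: number | null): F1Result {
  // A missing source metric makes F1 unavailable.
  if (precision === null || recall === null) {
    return { precision, recall, numerator: 0, denominator: 0, value: null };
  }
  const numerator = 2 * precision * recall;
  const denominator = precision + recall;
  return {
    precision,
    recall,
    numerator,
    denominator,
    // P = R = 0 would be 0/0; render it as a defined zero.
    value: denominator === 0 ? 0 : numerator / denominator
  };
}

// Count path: F1 = 2TP / (2TP + FP + FN).
// precisionFromCounts and recallFromCounts are not defined in this sketch;
// they are assumed to be available from the precision and recall lessons
// and to return an object with a `value: number | null` field.
function f1FromCounts(counts: Counts): F1Result {
  const precision = precisionFromCounts({ tp: counts.tp, fp: counts.fp }).value;
  const recall = recallFromCounts({ tp: counts.tp, fn: counts.fn }).value;
  const numerator = 2 * counts.tp;
  const denominator = 2 * counts.tp + counts.fp + counts.fn;
  // No positive-side evidence: unavailable rather than NaN.
  if (denominator === 0) {
    return { precision, recall, numerator, denominator, value: null };
  }
  // Errors but no true positives: a defined zero.
  if (counts.tp === 0) {
    return { precision, recall, numerator, denominator, value: 0 };
  }
  return { precision, recall, numerator, denominator, value: numerator / denominator };
}

// Renders null as a localized "not available" label, otherwise rounds to at most 3 decimals.
function formatMetric(value: number | null, locale: "en" | "zh"): string {
  return value === null ? (locale === "en" ? "not available" : "不可用") : Number(value.toFixed(3)).toString();
}
```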
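The sketch calls `precisionFromCounts` and `recallFromCounts`, which are not shown in this lesson. To run it in isolation, stand-ins along the following lines would be enough. Their shape is an assumption inferred from the `.value` access in `f1FromCounts`; the real helpers from the precision and recall lessons may differ.

```typescript
// Hypothetical result shape; only `value` is required by f1FromCounts.
type RatioResult = {
  numerator: number;
  denominator: number;
  value: number | null; // null when the denominator is zero
};

// Stand-in for the precision lesson's helper: TP / (TP + FP).
function precisionFromCounts(c: { tp: number; fp: number }): RatioResult {
  const denominator = c.tp + c.fp;
  return { numerator: c.tp, denominator, value: denominator === 0 ? null : c.tp / denominator };
}

// Stand-in for the recall lesson's helper: TP / (TP + FN).
function recallFromCounts(c: { tp: number; fn: number }): RatioResult {
  const denominator = c.tp + c.fn;
  return { numerator: c.tp, denominator, value: denominator === 0 ? null : c.tp / denominator };
}

console.log(precisionFromCounts({ tp: 3, fp: 2 }).value); // 0.6
console.log(recallFromCounts({ tp: 3, fn: 3 }).value); // 0.5
```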
Common confusions
- F1 is not accuracy. Accuracy uses all four cells, so it counts TN; F1 does not.
- F1 is not precision or recall alone. It combines both, and both must be strong for F1 to be high.
- F1 does not use TN directly. TN does not appear in precision or recall.
- 0 is different from unavailable. If there are no true positives but there are FP or FN, F1 is exactly 0.
Graph connection
In the graph, f1-score is a node that depends on two others:
- precision -> f1-score
- recall -> f1-score
Exercises
- Compute F1 from the fixture counts both as 2PR / (P + R) and as 2TP / (2TP + FP + FN).
- Why do P = 1.0, R = 0.1 and P = 0.1, R = 1.0 have the same arithmetic mean but a lower balance score?
- In the no-positive-evidence case (TP=FP=FN=0), what should render and why?
- Compare FP versus FN:
  - adding one FP: TP=3, FP=3, FN=3
  - adding one FN: TP=3, FP=2, FN=4
  - what is the new F1 in each case?