Classification Model Performance Metrics

2 minute read


Using the metrics below, in addition to the typical error/accuracy, can help when optimizing model hyperparameters. This is particularly true when working with skewed classes. There are also product-specific reasons to optimize one of these metrics over another. For example, if a user story requires that nothing is missed and can tolerate some false positives, then recall is important. Conversely, if a user only needs the positive predictions to be extremely accurate, then precision is important.

From a product development perspective, it is important to understand these metrics ahead of time before model training so that the correct hyperparameters can be picked.

Confusion Matrix

|                        | Hypothesis class: 1              | Hypothesis class: 0              |
|------------------------|----------------------------------|----------------------------------|
| **Ground-truth class 1** | True Positives                 | False Negatives (Type II error)  |
| **Ground-truth class 0** | False Positives (Type I error) | True Negatives                   |
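The four cells of the matrix can be tallied directly from paired label lists. A minimal sketch, using illustrative labels (not data from this post), where 1 marks the positive class:

```python
# Tally confusion-matrix cells from binary labels (1 = positive class).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]  # ground truth (illustrative)
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]  # model output (illustrative)

tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)  # Type II error
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)  # Type I error
tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)

print(tp, fn, fp, tn)  # 3 1 1 3
```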



Accuracy

Accuracy is the fraction of all predictions that are correct. In the context of classification, it measures how well the model both identifies and excludes a condition. \[ ACC = \frac{TP + TN}{P + N} = \frac{TP + TN}{TP + TN + FP + FN} \]
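With confusion-matrix counts in hand, accuracy is one line. A quick sketch with illustrative counts:

```python
# Accuracy from confusion-matrix counts (illustrative numbers).
tp, tn, fp, fn = 3, 3, 1, 1
acc = (tp + tn) / (tp + tn + fp + fn)
print(acc)  # 0.75
```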


Precision

Aka: positive predictive value (PPV), the fraction of the returned results that are relevant to the query. Superfluous results go against this score, but missed results do not. \[ PPV = \frac{TP}{TP + FP} = 1 - FDR \]

Recall / Sensitivity

Aka: sensitivity, hit rate, true positive rate (TPR), the fraction of the relevant results that are successfully retrieved. Missed results go against this score, but superfluous results do not. \[ TPR = \frac{TP}{TP+FN} = 1 - FNR \]
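Both precision and recall fall out of the same counts; only the denominators differ. A minimal sketch with illustrative numbers:

```python
# Precision and recall from confusion-matrix counts (illustrative numbers).
tp, fp, fn = 3, 1, 1
ppv = tp / (tp + fp)  # precision: how trustworthy the positive calls are
tpr = tp / (tp + fn)  # recall: how many actual positives were found
print(ppv, tpr)  # 0.75 0.75
```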


Specificity

Aka: selectivity, true negative rate (TNR), the fraction of the actual negatives that are correctly identified as such. \[ TNR = \frac{TN}{TN + FP} = 1 - FPR \]


Fall-out

Aka: false positive rate (FPR), the fraction of the actual negatives that are incorrectly classified as positive. \[ FPR = \frac{FP}{FP + TN} = 1 - TNR \]
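Specificity and fall-out are complements of each other over the negative class. A short sketch with illustrative counts:

```python
# Specificity and fall-out from confusion-matrix counts (illustrative numbers).
tn, fp = 3, 1
tnr = tn / (tn + fp)  # specificity: actual negatives correctly rejected
fpr = fp / (fp + tn)  # fall-out: actual negatives falsely flagged = 1 - tnr
print(tnr, fpr)  # 0.75 0.25
```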

F1 Score

A hybrid metric, the harmonic mean of precision and recall, that is useful when classes are unbalanced and precision and recall are equally important. \[ F_1 = 2 \cdot \frac{PPV \cdot TPR}{PPV + TPR} \]
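The formula above can be evaluated directly from precision and recall. A minimal sketch, reusing the illustrative values from earlier:

```python
# F1 score: harmonic mean of precision (ppv) and recall (tpr).
ppv, tpr = 0.75, 0.75
f1 = 2 * ppv * tpr / (ppv + tpr)
print(f1)  # 0.75 -- equals both inputs, since they already agree
```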

F𝜷 Score

A generalization of the F1 score, where 𝜷 > 1 weights recall more heavily than precision and 𝜷 < 1 weights precision more heavily than recall. \[ F_\beta = (1 + \beta^2) \cdot \frac{PPV \cdot TPR}{\beta^2 \cdot PPV + TPR} \]
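A small sketch of this formula, showing that when precision is high but recall is low, 𝜷 > 1 produces a lower score than 𝜷 < 1 (the low recall is punished harder), and 𝜷 = 1 reduces to the ordinary F1. The numbers are illustrative:

```python
# F-beta score: weighted harmonic mean of precision (ppv) and recall (tpr).
def f_beta(ppv, tpr, beta):
    b2 = beta ** 2
    return (1 + b2) * ppv * tpr / (b2 * ppv + tpr)

# High precision, low recall (illustrative): beta = 2 favors recall,
# so the low recall drags the score down; beta = 0.5 favors precision.
print(f_beta(0.9, 0.5, 2.0))  # lower score
print(f_beta(0.9, 0.5, 0.5))  # higher score
```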

AUC / ROC

The receiver operating characteristic (ROC) curve is the plot of TPR vs. FPR as the decision threshold varies, with AUC (aka AUROC) being the area under the ROC curve. It describes how well the model is capable of distinguishing between the classes.
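One way to sketch AUC without plotting: it equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one (ties count half). The scores below are illustrative:

```python
# AUC as the probability that a random positive outranks a random negative
# (equivalent to the area under the ROC curve; ties count as half a win).
def auc(y_true, scores):
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

print(auc([1, 1, 0, 0], [0.9, 0.6, 0.4, 0.2]))  # 1.0 -- perfect separation
```

An AUC of 0.5 corresponds to random guessing, and 1.0 to a model that ranks every positive above every negative.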

(Figure: AUC/ROC curve, via Stanford class notes)