Classification Model Performance Metrics

2 minute read

Intro

Using the metrics below, in addition to the typical error/accuracy, can help when optimizing model hyperparameters. This is particularly true when working with skewed classes. There are also product-specific use cases for optimizing one set of these metrics over another. For example, if a user story requires that nothing be missed and can tolerate some false positives, then recall is important. Conversely, if a user only cares that positive predictions are extremely accurate, then precision is important.

From a product development perspective, it is important to understand these metrics ahead of time before model training so that the correct hyperparameters can be picked.

Confusion Matrix

| | Hypothesis class: 1 | Hypothesis class: 0 |
|---|---|---|
| Ground-truth class: 1 | True Positives | False Negatives (Type II error) |
| Ground-truth class: 0 | False Positives (Type I error) | True Negatives |
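As a minimal sketch, the four confusion-matrix cells can be tallied from paired labels (the toy labels below are assumed for illustration):

```python
# Hypothetical toy labels: 1 = positive class, 0 = negative class
y_true = [1, 1, 1, 0, 0, 0, 1, 0]  # ground-truth class
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]  # hypothesis class

tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)  # true positives
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)  # Type II errors
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)  # Type I errors
tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)  # true negatives

print(tp, fn, fp, tn)  # → 3 1 1 3
```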

Metrics

Accuracy

Accuracy is the fraction of all predictions that are correct. In the context of classification, it identifies how well the model identified or excluded a condition. \[ ACC = \frac{TP + TN}{P + N} = \frac{TP + TN}{TP + TN + FP + FN} \]
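A sketch of the formula above, using assumed toy confusion-matrix counts:

```python
# Toy counts (assumed): 3 TP, 3 TN, 1 FP, 1 FN
tp, tn, fp, fn = 3, 3, 1, 1

# Accuracy: correct predictions over all predictions
acc = (tp + tn) / (tp + tn + fp + fn)
print(acc)  # → 0.75
```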

Precision

Aka: positive predictive value (PPV) is the fraction of returned results that are relevant to the query. Superfluous results count against this score, but missed results do not. \[ PPV = \frac{TP}{TP + FP} = 1 - FDR \]

Recall / Sensitivity

Aka: sensitivity, hit rate, true positive rate (TPR) is the fraction of relevant results that are successfully retrieved. Missed results count against this score, but superfluous results do not. \[ TPR = \frac{TP}{TP+FN} = 1 - FNR \]
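Precision and recall only differ in their denominator, which is easy to see in code (toy counts assumed):

```python
# Toy counts (assumed): 3 TP, 1 FP, 1 FN
tp, fp, fn = 3, 1, 1

ppv = tp / (tp + fp)  # precision: penalized by superfluous results (FP)
tpr = tp / (tp + fn)  # recall: penalized by missed results (FN)
print(ppv, tpr)  # → 0.75 0.75
```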

Specificity

Aka: selectivity, true negative rate (TNR) \[ TNR = \frac{TN}{TN + FP} = 1 - FPR \]

1-Specificity

Aka: False positive rate (FPR)
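Specificity and the false positive rate are complements, as the formula above shows (toy counts assumed):

```python
# Toy counts (assumed): 3 TN, 1 FP
tn, fp = 3, 1

tnr = tn / (tn + fp)  # specificity: fraction of negatives correctly rejected
fpr = 1 - tnr         # false positive rate (1 - specificity)
print(tnr, fpr)  # → 0.75 0.25
```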

F1 Score

A hybrid metric (the harmonic mean of precision and recall) that is useful when classes are imbalanced and equal importance is given to precision and recall. \[ F_1 = 2 \cdot \frac{PPV \cdot TPR}{PPV + TPR} \]
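A sketch of the F1 computation, continuing the assumed toy values from the precision/recall formulas:

```python
# Assumed toy values: precision and recall both 0.75
ppv, tpr = 0.75, 0.75

# F1: harmonic mean of precision and recall
f1 = 2 * ppv * tpr / (ppv + tpr)
print(f1)  # → 0.75
```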

F𝜷 Score

More generalized F1 Score where 𝜷 > 1 emphasizes recall over precision and 𝜷 < 1 weights precision as being more important than recall. \[ F_\beta = (1 + \beta^2) \cdot \frac{PPV \cdot TPR}{\beta^2 \cdot PPV + TPR} \]
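A small sketch showing how 𝜷 shifts the score toward recall or precision (the precision/recall values are assumed for illustration):

```python
def f_beta(ppv: float, tpr: float, beta: float) -> float:
    """Weighted harmonic mean of precision (ppv) and recall (tpr)."""
    b2 = beta ** 2
    return (1 + b2) * ppv * tpr / (b2 * ppv + tpr)

# Assumed toy values: recall (0.9) is higher than precision (0.6),
# so beta > 1 (recall-weighted) yields a higher score than beta < 1.
print(f_beta(0.6, 0.9, 2.0))  # recall-weighted
print(f_beta(0.6, 0.9, 0.5))  # precision-weighted
```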

ROC / AUC

The receiver operating characteristic (ROC) curve is the plot of TPR vs. FPR across classification thresholds, with AUC (aka AUROC) being the area under the ROC curve. This describes how well the model is capable of distinguishing between classes.
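As a sketch, AUC can be computed without plotting: it equals the probability that a randomly chosen positive scores higher than a randomly chosen negative (ties count half). The toy scores below are assumed for illustration:

```python
def auc(y_true: list[int], scores: list[float]) -> float:
    """AUC as the probability a random positive outscores a random negative."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical labels and model scores
y = [1, 1, 0, 0]
s = [0.9, 0.4, 0.6, 0.2]
print(auc(y, s))  # → 0.75
```

An AUC of 1.0 means the classes are perfectly separable by threshold; 0.5 means the scores carry no class information.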

(Figure: ROC curve and AUC, via Stanford class notes)