What are precision, recall, and F1, and what do the scores tell us? Is the F1 score the most important?


For classification machine learning models, you will notice that weighted precision, recall, and F1 test scores for the model are displayed in the Results Summary report for each run. Precision and recall measure how well the model's predictions matched the actual label values in the holdout data set (more about that in a sec). The F1 score is the weighted harmonic mean of precision and recall, which is why it is often used when referring to the overall strength of a model. The holdout data set is the “testing” portion of the data set created for training and testing the model. Specifically, precision is the proportion of positive identifications that were actually correct, and recall is the proportion of actual positives that were identified correctly.

  • Precision = (True Positives) / (True Positives + False Positives)
  • Recall = (True Positives) / (True Positives + False Negatives)
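The two formulas above can be sketched in a few lines of Python. The counts here are made-up example values for illustration, not numbers from the article:

```python
# Toy counts from a hypothetical confusion matrix (assumed values).
tp = 80  # true positives: predicted positive, actually positive
fp = 20  # false positives: predicted positive, actually negative
fn = 40  # false negatives: predicted negative, actually positive

precision = tp / (tp + fp)  # 80 / 100 = 0.8
recall = tp / (tp + fn)     # 80 / 120 ≈ 0.667

print(round(precision, 3))  # 0.8
print(round(recall, 3))     # 0.667
```

Note that the two denominators differ: precision divides by everything the model *predicted* positive, while recall divides by everything that *actually is* positive.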

Precision and recall are often in tension: improving precision typically reduces recall, and vice versa. Because of this trade-off, it can be difficult to determine whether one model is better or worse than another; this is where F1 can be useful. F1 helps quantify the value of the trade-off between the two metrics: was giving up 5 points of precision worth gaining 10 points of recall? Please see this external article about precision and recall to learn more.
