What is upsampling? When should it be leveraged in a machine learning model?


Upsampling is a technique used to address class imbalance in the dataset, particularly in classification tasks. Class imbalance occurs when one class of the target variable is significantly more prevalent than the other(s), leading to biased model performance. In upsampling, the minority class(es) are artificially increased in size by generating synthetic data points or replicating existing ones. This process aims to balance the distribution of classes in the dataset, ensuring that the model is trained on a more representative sample of each class.

Many models commonly created in One AI have class imbalance. For example, in a model predicting voluntary attrition, ~90-95% of employees in most organizations choose to stay employed, so only 5-10% voluntarily leave. This means the model has far fewer instances of voluntary attrition to learn from (i.e., an imbalanced dataset). In this case, the model could simply guess that no one voluntarily terminated and still be quite accurate statistically. We want the model to predict both voluntary terminations and non-terminations to provide meaningful analysis, so upsampling creates synthetic voluntary termination records that the model can learn from to make these predictions.

If 50% of your employees terminated and 50% stayed employed, this would be an example of a balanced dataset and upsampling would likely not be necessary or recommended.
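To make the idea concrete, here is a minimal sketch of one common upsampling approach, random oversampling, where minority-class records are replicated until the classes are balanced. This is only an illustration of the general technique; the field names and data here are hypothetical, and One AI's actual upsampling (which is configured in the product and may also generate synthetic data points) is not reproduced by this code.

```python
import random
from collections import Counter

def upsample_minority(records, label_key):
    """Randomly replicate minority-class records until every class
    matches the size of the largest class (random oversampling)."""
    # Group records by their class label.
    by_class = {}
    for record in records:
        by_class.setdefault(record[label_key], []).append(record)

    majority_size = max(len(rows) for rows in by_class.values())

    balanced = []
    for rows in by_class.values():
        balanced.extend(rows)
        # Sample minority rows with replacement to close the gap.
        balanced.extend(random.choices(rows, k=majority_size - len(rows)))
    return balanced

# Hypothetical imbalanced dataset: 95 employees stayed, 5 left voluntarily.
data = [{"terminated": False}] * 95 + [{"terminated": True}] * 5
balanced = upsample_minority(data, label_key="terminated")
print(Counter(r["terminated"] for r in balanced))  # both classes now size 95
```

After upsampling, both classes contribute equally during training, so the model can no longer score well by always predicting the majority outcome.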

Unless you manually select None for upsampling, One AI may apply upsampling automatically. You can configure the model's upsampling from the augmentation page in the Upsampling section of the One AI configuration (Augmentations > Edit > One AI Configuration > Upsampling > Override).

Upsampling Configuration

For more information, please see the advanced configuration settings help article and scroll to the Upsampling section.
