What is upsampling? When is it appropriate to use?

Answer: Upsampling, or oversampling, is a technique that creates synthetic or duplicate data points for the minority class so that the class labels are more balanced. This is useful when your classifier target is severely imbalanced, which may affect model fitting. As with feature selection, each upsampling configuration, that is, each combination of method and ratio, is applied and evaluated individually. For example, if your organization has an annual promotion rate of 5%, promotions are heavily outnumbered by non-promotions. A model could simply predict that no one gets promoted and still be about 95% accurate, yet it would never identify a single promotion. We want the model to predict both promotions and non-promotions to provide meaningful analysis, so upsampling creates synthetic promotion records so the model can learn to make these predictions. Please see this advanced configuration settings help article and scroll to the upsampling section for more information and instructions on how to turn upsampling off or select a specific method.
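
To make the 5% promotion example concrete, here is a minimal Python sketch of one simple upsampling method, random duplication of minority-class rows. It is an illustration only and not necessarily the method the product applies. The function name `random_oversample`, the `promoted` field, and the `target_ratio` parameter are hypothetical names chosen for this example.

```python
import random
from collections import Counter


def random_oversample(rows, label_key, minority_label, target_ratio=1.0, seed=0):
    """Return a new list with minority rows duplicated at random until
    minority count is about target_ratio * majority count.

    Hypothetical helper for illustration; names are assumptions.
    """
    rng = random.Random(seed)
    minority = [r for r in rows if r[label_key] == minority_label]
    majority = [r for r in rows if r[label_key] != minority_label]
    if not minority:
        raise ValueError("no minority rows to sample from")
    # Number of duplicate rows needed to reach the requested ratio.
    needed = int(len(majority) * target_ratio) - len(minority)
    extra = [dict(rng.choice(minority)) for _ in range(max(0, needed))]
    return rows + extra


# Made-up data: 5 promotions among 100 employees (a 5% promotion rate).
employees = [{"id": i, "promoted": i < 5} for i in range(100)]

# Accuracy of a "model" that always predicts "not promoted".
baseline_accuracy = sum(not e["promoted"] for e in employees) / len(employees)

balanced = random_oversample(employees, "promoted", True)
counts = Counter(e["promoted"] for e in balanced)

print(baseline_accuracy)            # 0.95
print(counts[True], counts[False])  # 95 95
```

In this sketch, `target_ratio` plays the role of the upsampling ratio: a value of 1.0 fully balances the two classes, while a smaller value such as 0.5 stops once promotions reach roughly half the number of non-promotions.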
