What does One AI mean when it marks a column as a “suspicious” variable? What is the default threshold?

Answer: One AI runs a “suspicious column check” that computes the correlation between each column and the target column. If a column correlates too strongly with the target, it is likely a recoded version of the target, and the estimator would effectively “cheat” by using that column to predict the answer. The default threshold for a “definitely cheating” column is 0.85: any column that is more than 85% correlated with the target column is dropped automatically. You can override this value on the augmentation page, in the Global section of the One AI configuration. The field accepts a float between 0 and 1 that controls how aggressively One AI labels a column as cheating. The cheating value mean threshold limits how much the estimator target can vary within a column before that column is discarded as too predictive. A smaller value increases the likelihood that a column will be dropped.

Note: Columns that are 70-85% correlated with the target column are marked as suspicious in the EDA report but are not automatically dropped.
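
The following is a minimal sketch of how the two thresholds described above could interact. It is not One AI's actual implementation. It assumes Pearson correlation, compares the absolute value of the coefficient, and treats the 70% bound as inclusive; the article does not specify any of these details. The names `pearson`, `classify_columns`, `DROP_THRESHOLD`, and `SUSPICIOUS_THRESHOLD` are hypothetical and exist only for illustration.

```python
from math import sqrt

# Values taken from the article; the constant names are hypothetical.
DROP_THRESHOLD = 0.85        # default "definitely cheating" threshold
SUSPICIOUS_THRESHOLD = 0.70  # lower bound of the "suspicious" band


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column has no linear relationship to measure
    return cov / sqrt(var_x * var_y)


def classify_columns(columns, target,
                     drop_threshold=DROP_THRESHOLD,
                     suspicious_threshold=SUSPICIOUS_THRESHOLD):
    """Label each column as 'dropped', 'suspicious', or 'kept'."""
    labels = {}
    for name, values in columns.items():
        # Assumption: a strong negative correlation is as suspect as a positive one.
        strength = abs(pearson(values, target))
        if strength > drop_threshold:
            labels[name] = "dropped"
        elif strength >= suspicious_threshold:
            labels[name] = "suspicious"
        else:
            labels[name] = "kept"
    return labels


target = [0, 1, 0, 1, 1, 0, 1, 0]
columns = {
    "recoded_target": [0, 2, 0, 2, 2, 0, 2, 0],  # the target multiplied by 2
    "near_copy": [0, 1, 0, 1, 1, 0, 1, 1],       # differs from the target in one row
    "unrelated": [1, 1, 0, 0, 1, 1, 0, 0],
}

default_labels = classify_columns(columns, target)
stricter_labels = classify_columns(columns, target, drop_threshold=0.75)
```

With the default thresholds, `recoded_target` is dropped, `near_copy` falls in the suspicious band (its correlation is roughly 0.77), and `unrelated` is kept. Lowering the drop threshold to 0.75 also drops `near_copy`, which illustrates why a smaller value makes a column more likely to be dropped.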
