Maximize insights from One AI by refining your model.
Jump to the Video - Refining ML Models for Beginners
While One AI is designed to take the guesswork out of creating a solid model, you’re unlikely to achieve perfection without some refinement and iteration. By following How to Create a Voluntary Attrition Risk Model in One AI or creating your own model, it’s not too difficult to achieve a good baseline model.
Let’s pick up from How to Create a Voluntary Attrition Risk Model in One AI and go through a few strategies for maximizing the value derived from One AI.
Suggested strategies to try include:
Expand the train/test data to include more history
If you’re using a Recipe to create your model, the default amount of history used to train and test the model is one year. You can change this setting to see whether it significantly boosts model performance, but the tradeoff is that the further back you go, the more your organization differs from its current state. Factor in the age of your organization and how much it has changed over the past few years. For an organization that has existed for fewer than, say, four years, or one that has recently experienced significant change, less history may be better. In our experience at One Model, one or two years is the sweet spot, but for many companies it’s useful to try both and see which yields the best results.
Steps to expand or reduce the train/test data:
- Select One AI from the top navigation bar
- Find your model in the list and select Edit
- Select Configure One AI Recipe
- Expand the How much history do you want to use to train your predictive model? question
- Change the Number of Training Intervals
- Select the Save icon at the top right
- This will bring you back to Augmentation Configuration
- Scroll to the bottom of the Augmentation Configuration screen and select Save
- Re-run the Augmentation
- Review the results in the Results Summary report.
Create separate models for different parts of the organization
Since behaviors can vary in different parts of an organization, it can be useful to create individual models for those different areas. While knowing attrition risk factors for an entire organization is useful, it’s not the whole story. For example, in a retail company, the attributes that drive attrition for employees who work in the retail outlets might differ significantly from those that drive attrition in an office environment like headquarters. Creating multiple models is made easy by copying completed recipes to new augmentations in One AI.
Steps to create a separate model from an existing augmentation:
- Go to the Data menu in the navigation bar
- Select Augmentations
- Select the + Add Machine Learning Model button at the upper right
- Assign a descriptive name, for example “Attrition Risk - Finance” or “Attrition Risk - Retail Associates”. You can name it whatever you like, and you can change the name after the augmentation has been created.
- For Using Data From options, select One AI Recipe
- From the Select an Existing Augmentation dropdown, select the original Augmentation and Copy Existing One AI Recipe
- Select Configure One AI Recipe
- Expand the What Headcount do you wish to make predictions for? question (or Define Population in Advanced view)
- Use the Filters control to filter the population
- Select the Save icon
- If you applied setting overrides to the original Augmentation, and you also want those setting overrides applied here, you will need to change those settings now
- Scroll to the bottom of the Augmentation Configuration screen
- Select Save
- Re-run the Augmentation
- Review the results in the Results Summary report
Fill NULL values
Sometimes attributes that contain predictive value are dropped from a model due to too many NULL values. By default, One AI is very conservative about NULL values in that if more than 5% of the population does not have a value for an attribute, it’s dropped from a model. This is a good thing when information is genuinely missing. However, in some situations NULL values are introduced in the data framing steps. This is especially common with Generative Attributes. There are three different ways to mitigate this behavior and potentially improve the performance of your model - 1) apply NULL filling in Generative Attributes, 2) change the NULL drop threshold for all features, or 3) apply NULL filling to individual features.
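The drop rule described above can be pictured with a small sketch. This is only an illustration of the idea, not One AI's implementation: the column names and data are hypothetical, and the only value taken from this article is the default threshold of 5%.

```python
# Illustrative only: this is not One AI's implementation.
NULL_DROP_THRESHOLD = 0.05  # default described above: 5% of the population


def null_fraction(values):
    """Return the share of entries that are missing (None)."""
    if not values:
        return 0.0
    return sum(v is None for v in values) / len(values)


def split_by_null_threshold(columns, threshold=NULL_DROP_THRESHOLD):
    """Separate column names into (kept, dropped) using the NULL share."""
    kept, dropped = [], []
    for name, values in columns.items():
        # A column is dropped only when MORE than `threshold` is missing.
        if null_fraction(values) > threshold:
            dropped.append(name)
        else:
            kept.append(name)
    return kept, dropped


# Hypothetical population of 20 employees.
columns = {
    "tenure_years": [1, 2, 3, 4, 5] * 4,            # no NULLs
    "engagement_score": [None] + [3.5] * 19,        # 1 of 20 is NULL
    "promotions_past_year": [None] * 12 + [1] * 8,  # 12 of 20 are NULL
}

kept, dropped = split_by_null_threshold(columns)
print("kept:", kept)
print("dropped:", dropped)
```

In this made-up data, the promotions attribute is dropped because most employees have no value for it, which is the situation the three options below are meant to address.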
Apply NULL filling in Generative Attributes
The primary reason Generative Attributes are dropped as features is missing data, or NULL values. Using a Generative Attribute of Promotions as an example, those employees who have never received a Promotion receive a NULL value for that attribute by default. Selecting the Fill NULLs with 0 option replaces those missing values with 0s.
Please note that this option should only be used for Generative Attributes that are sums or counts. It would not be appropriate to fill NULLs with 0s for averages.
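To see why zero filling suits counts but not averages, consider this minimal sketch. The attribute names and values are hypothetical and exist only to show the effect on the numbers.

```python
from statistics import mean


def fill_nulls_with_zero(values):
    """Replace missing entries (None) with 0."""
    return [0 if v is None else v for v in values]


# Hypothetical count attribute: promotions received.
# None here means "no promotion records", so 0 is a truthful replacement.
promotion_counts = [2, None, 1, None]

# Hypothetical average attribute: average rating on a 1-5 scale.
# None here means "no ratings recorded", not "a rating of 0".
avg_ratings = [4.0, None, 3.0, None]

filled_counts = fill_nulls_with_zero(promotion_counts)
filled_ratings = fill_nulls_with_zero(avg_ratings)

print(filled_counts)

known = [v for v in avg_ratings if v is not None]
print(mean(known))           # average over employees who have a rating
print(mean(filled_ratings))  # pulled down by zeros that were never real ratings
```

The filled counts remain accurate, while the filled averages suggest that half of the employees were rated 0, a value that is not even on the scale.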
Steps to Apply NULL filling in Generative Attributes:
- Go to the Data menu in the navigation bar
- Select Augmentations
- Find your Augmentation in the list and select Edit
- Select Configure One AI Recipe
- Expand the Which generative attributes do you want to use in your prediction? question
- Locate the Generative Attribute that you would like to apply NULL filling to in the list
- Click the Edit icon for that Generative Attribute
- Check the checkbox for Fill NULLs with 0
- Save the Generative Attribute by clicking Save
- Check the box next to the Generative Attribute name to include it if it is not already checked
- Scroll to the Would you like to verify that all of the selections you have made are valid? question
- Click Generate Data Statistics
- Find the name of the Generative Attribute in the list
- Verify that the values displayed in the Count and Non-null Count columns match
- Select the Save icon
- Scroll to the bottom of the Augmentation Configuration screen and select Save
- Re-run the Augmentation
- Review the EDA report to verify that the Generative Attribute was not dropped due to NULL values
Change the NULL drop threshold for all features
Changing the NULL drop threshold for all attributes has the advantage of being an easy modification as it’s only one setting and applies to all features.
Steps to change the NULL drop threshold for all features:
- Go to the Data menu in the navigation bar
- Select Augmentations
- Select Edit for an existing Augmentation
- Expand the Global section in the right pane
- Select the Override box for the Null Drop Threshold setting
- Enter a decimal value representing the percentage of NULL values you want to allow. The default is 0.05, which represents 5%
- Scroll to the bottom of the right pane and select Save
- Re-run the Augmentation
- Review the EDA report to see which attributes were selected as features
Apply NULL filling to individual features
Applying NULL filling to individual features gives you the flexibility to target specific attributes and handle them in different ways. For example, an attribute like “Promotions Past Year” might require NULL values to be replaced by 0, whereas for something like “Engagement Score” you might want to fill with the mean value for the entire population. The disadvantage of this approach is that each attribute must be configured separately.
Steps to apply NULL filling to individual features:
- Go to the Data menu in the navigation bar
- Select Augmentations
- Find the Augmentation you want to adjust
- Select Edit for an existing Augmentation
- For the Per Column Interventions section - select the Override box and expand it
- Expand Column Interventions
- From the Add Column Intervention dropdown select a column for which you would like to apply NULL filling
- Select the + button
- Expand the newly created section for that column
- From the Droppable selector - select not droppable
- For the Null Fill section - select the Override box and expand it
- For the Strategy section - select the Override box
- From the Strategy dropdown select the desired treatment (mean and custom are the most common)
- If you selected custom as a strategy, for the Custom Fill Value section select the Override box
- Enter the value you would like to replace NULLs with (commonly 0)
- If you would like to add NULL filling for another column, repeat steps 5-13 for each
- Scroll to the bottom of the right pane and select Save
- Re-run the Augmentation
- Review the EDA report to see how the features performed
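The effect of the mean and custom strategies can be sketched in a few lines. This is a conceptual illustration, not One AI's configuration format: the `plan` dictionary, its keys, and the column names are assumptions made for the example.

```python
from statistics import mean


def fill_column(values, strategy, custom_value=None):
    """Fill missing entries (None) using a 'mean' or 'custom' strategy."""
    if strategy == "mean":
        known = [v for v in values if v is not None]
        fill = mean(known)
    elif strategy == "custom":
        fill = custom_value
    else:
        raise ValueError(f"unsupported strategy: {strategy}")
    return [fill if v is None else v for v in values]


# Hypothetical per-column plan; the keys and layout are illustrative only.
plan = {
    "promotions_past_year": {"strategy": "custom", "custom_value": 0},
    "engagement_score": {"strategy": "mean"},
}

data = {
    "promotions_past_year": [1, None, None, 2],
    "engagement_score": [4.0, None, 3.0, 5.0],
}

filled = {name: fill_column(data[name], **options) for name, options in plan.items()}
print(filled)
```

Each column gets its own treatment here, which mirrors the flexibility of this approach as well as its cost: every column needs its own entry.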
Include more attributes
The One AI pipeline automatically applies dimensionality reduction strategies to keep the number of features appropriate for machine learning models. This allows you to present One AI with a lot of attributes to choose from. In fact, the more attributes you give One AI to select as potential features, the better. In the best case, something you hadn’t considered useful for making predictions will be chosen. You can always exclude features later in the process if necessary. One AI makes selecting more attributes easy with Recipes. Attributes can be either 1) core attributes, or 2) generative attributes.
How to include more core attributes
- Go to the Data menu in the navigation bar
- Select Augmentations
- Find your Augmentation in the list and select Edit
- Select Configure One AI Recipe
- Expand the Which core attributes do you want to use in your prediction? question
- From the Scope selector, select either Balanced or Broad
- Balanced will include all tables joined directly to the table that the population metric is created from
- Broad will include all tables joined directly or indirectly to the table that the population metric is created from
- If there are specific attributes you do not want to include, select the X icon for the attribute to remove it from the Included Columns section
- Select the Save icon
- Scroll to the bottom of the Augmentation Configuration screen and select Save
- Re-run the Augmentation
- Review the results in the Results Summary report
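The difference between tables joined directly and tables joined directly or indirectly can be pictured as a graph traversal. The sketch below is only a way to reason about the two scopes: the join graph and table names are hypothetical, and it makes no claim about how One AI resolves joins internally.

```python
from collections import deque

# Hypothetical join graph: each table maps to the tables it joins to directly.
joins = {
    "employee": ["department", "location"],
    "department": ["employee", "cost_center"],
    "location": ["employee", "region"],
    "cost_center": ["department"],
    "region": ["location"],
}


def directly_joined(graph, start):
    """Tables one join away from the starting table."""
    return set(graph.get(start, []))


def directly_or_indirectly_joined(graph, start):
    """Tables reachable through any chain of joins (breadth-first search)."""
    seen = {start}
    queue = deque([start])
    while queue:
        table = queue.popleft()
        for neighbor in graph.get(table, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    seen.discard(start)
    return seen


print(sorted(directly_joined(joins, "employee")))
print(sorted(directly_or_indirectly_joined(joins, "employee")))
```

With "employee" standing in for the table behind the population metric, the first call corresponds to the idea behind Balanced and the second to the idea behind Broad.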
How to include more generative attributes
- Go to the Data menu in the navigation bar
- Select Augmentations
- Find your Augmentation in the list and select Edit
- Select Configure One AI Recipe
- Expand the Which generative attributes do you want to use in your prediction? question
- Check the box next to the name of each Generative Attribute you want to include
- To create new Generative Attributes or edit existing, please see the One AI Generative Attributes help article
- Select the Save icon
- Scroll to the bottom of the Augmentation Configuration screen and select Save
- Re-run the Augmentation
- Review the results in the Results Summary report
Perform feature selection interventions
When creating a predictive model, there should be certain attributes you suspect are predictive of the selected outcome; for example, tenure is often predictive of attrition risk. If you create a model and an expected attribute is not selected as a feature, that might be a reason for alarm. Alternatively, there might be attributes in your data that you don’t want to include in your model, such as ethnicity or salary. By “grooming” the features your model uses, you can optimize performance while also producing clear insights.
The Exploratory Data Analysis (EDA) report is an effective tool for feature grooming. It provides a lot of useful information about the attributes you presented to your model, whether data cleaning was performed, which ones were selected as features, and why.
The following sections will cover how to leverage the EDA report to understand your attributes and features, and how to address common scenarios. To better understand the EDA report in depth, please see the EDA Report Introduction.
Review feature selections in the EDA report
Steps to review feature selections in the EDA report:
- Go to the Data menu in the navigation bar
- Select Augmentations
- Select Runs for the pertinent augmentation
- Then select the most recent run
- You’re now looking at the EDA Report
- Scroll down to the Variable Status section and note the ones with a label
If there are attributes you expected to be selected that were not, find them in the list and note the description of how they were handled and why they were not selected.
- Click on the name of any attribute to navigate to detailed information about that attribute
- Selecting Toggle details in the lower right of this subsection will display more data
- In this section, selecting any of the tab headers will provide the option to filter the data to the individual labels
- Before making any changes and re-running the model, select the Results Summary tab, scroll down to the Classification Report section, and make note of the f1 score for the positive label to track the effect that any changes you make have on model performance
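For reference, the f1 score for the positive label combines precision and recall for that label. The sketch below uses the standard definitions with hypothetical labels (1 = terminated, 0 = stayed); it is not taken from One AI and is meant only to show what the number summarizes.

```python
def f1_for_positive_label(y_true, y_pred, positive=1):
    """Compute precision, recall, and F1 for one label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical labels: 1 = terminated, 0 = stayed.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]

precision, recall, f1 = f1_for_positive_label(y_true, y_pred)
print(round(precision, 3), round(recall, 3), round(f1, 3))
```

Recording this value before and after each change gives you a single number to compare between runs.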
Increase likelihood of attributes being selected as features
In steps 7-9 listed in the previous section, you learned how to research details about attributes. There are a wide range of things to look for, including:
- Were there many NULL values?
- Are there anomalies in the data?
- How about unexpected values?
- Was the attribute scaled or one hot encoded?
- Do the values differ a lot between the labels?
There are various things you can do to manipulate the data, including requesting data modeling changes from your Customer Success Lead at One Model and changing whether an attribute is treated as continuous (scaled) or categorical (one hot encoded). The most common intervention, however, is NULL filling.
Learn how to fill NULL values in the Fill NULL values section above.
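As a reminder of what scaled and one hot encoded mean, here is a minimal sketch. Min-max scaling is used as one common example of scaling; this article does not say which scaling method One AI applies, and the attribute names and values are hypothetical.

```python
def min_max_scale(values):
    """Rescale numeric values to the 0-1 range."""
    low, high = min(values), max(values)
    if high == low:
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]


def one_hot_encode(values):
    """Turn each distinct category into its own 0/1 column."""
    categories = sorted(set(values))
    return {
        category: [1 if v == category else 0 for v in values]
        for category in categories
    }


tenure_years = [1, 3, 5]                # hypothetical continuous attribute
work_location = ["HQ", "Store", "HQ"]   # hypothetical categorical attribute

print(min_max_scale(tenure_years))
print(one_hot_encode(work_location))
```

A numeric identifier treated as continuous would be scaled as if its magnitude mattered, which is one reason to check how each attribute was handled.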
Force attributes to be included as features
By default, One AI performs automated feature selection. If desired, you can override this behavior and manually select the features you want included. You can either select some features and let One AI select the rest, or you can make all of the selections manually.
Follow these steps to force attributes to be included as features:
- Close the EDA Report
- Select Edit for the Augmentation
- For the Per Column Interventions section, select the Override box and expand it
- If you want to completely disable One AI’s automated feature selection, check the Only Use Specified Columns box. Not checking this box will still allow for automatic selection for all features other than the ones you force to be included.
- Expand Column Interventions
- From the Add Column Intervention dropdown, select the column you would like to include
- Select the + button
- Expand the newly created section for that column
- Note that not droppable in this section only forces an attribute to be considered as a potential feature. It does not guarantee selection. If this is the desired behavior, select that option and move on to step 13.
- To force selection of features, you will need to know the data type of the attribute. Is it categorical (text) or continuous (numeric or date)?
- For continuous attributes:
- As the Type-Specific Intervention, select Continuous
- Expand Continuous Interventions
- For Force Select, check the Override box and also the checkbox under that
- For categorical attributes:
- As the Type-Specific Intervention, select Categorical
- Expand Categorical Interventions
- In this section you can define how you want each value for the attribute to be treated. For example, if the attribute is gender, you could Exclude Male or you could Select Female. If you want to select or exclude multiple values, enter the values separated by commas.
- Repeat steps 8-11 for any additional features you would like to include
- Scroll to the bottom of the right pane
- Select Save
- Re-run the Augmentation
- Review the EDA report to review feature selection
Exclude particular features if leveraging a Recipe
There are times when you may not want to include certain features. An example of such an attribute is a numeric identifier for locations when you are also including the name of the locations. Fortunately, excluding attributes is easy if you’re using a Recipe.
Steps to exclude particular features if leveraging a Recipe:
- Close the EDA Report
- Select Edit for the Augmentation
- Select Configure One AI Recipe
- Expand the Which core attributes do you want to use in your prediction? question (or Which generative attributes do you want to use in your prediction? if the feature in question is a Generative Attribute)
- Select the X icon for the attribute to remove it from the Included Columns section (or uncheck the selection box next to it in the case of a Generative Attribute)
- Repeat for any additional attributes if necessary
- Select the Save icon
- Scroll to the bottom of the Augmentation Configuration screen and select Save
- Re-run the Augmentation
- Review the results in the Results Summary report
Alternative method to exclude particular features
If you are not leveraging a One AI Recipe for your augmentation, the method for disregarding attributes is different than if you are. The end result is the same.
Steps to exclude particular features if you are not leveraging a Recipe:
- Close the EDA Report
- Select Edit for the Augmentation
- Select the Override box for the Per Column Interventions section and expand it
- Expand Column Interventions
- From the Add Column Intervention dropdown select the column you would like to exclude
- Select the + button
- Expand the newly created section for that column
- From the Droppable selector select always
- Repeat steps 5-8 for any additional features you would like to exclude
- Scroll to the bottom of the right pane and select Save
- Re-run the Augmentation
- Review the EDA report to review feature selection
Try different upsampling options
When predicting outcomes like attrition risk, the classes are often vastly imbalanced. For example, far more people remain employed than terminate in an attrition risk model. When data of this nature is fed to a classification algorithm, the model becomes biased towards the majority class prediction (those who remain employed in our example).
To produce accurate predictions on imbalanced data, upsampling is employed. Upsampling involves creating artificial data for the minority class to “balance” the classes. There are various techniques that can be used to create this data. One AI primarily employs the Synthetic Minority Oversampling TEchnique (SMOTE) to perform upsampling. This external article explains upsampling in more detail: https://towardsdatascience.com/5-techniques-to-work-with-imbalanced-data-in-machine-learning-80836d45d30c
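The core idea behind SMOTE is to create each artificial record by interpolating between a real minority-class record and one of its nearest minority-class neighbors. The sketch below is a simplified illustration of that idea under stated assumptions (two numeric attributes, hypothetical values, a fixed random seed); it is not One AI's implementation and omits details that production libraries handle.

```python
import math
import random


def nearest_minority_neighbors(point, others, k):
    """Return the k points in `others` closest to `point` (Euclidean)."""
    return sorted(others, key=lambda other: math.dist(point, other))[:k]


def synthesize(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between neighbors."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        others = [p for p in minority if p is not base]
        neighbor = rng.choice(nearest_minority_neighbors(base, others, k))
        gap = rng.random()  # position along the segment, in [0, 1)
        new_point = tuple(b + gap * (n - b) for b, n in zip(base, neighbor))
        synthetic.append(new_point)
    return synthetic


# Hypothetical minority-class records: (tenure in years, engagement score).
minority = [(1.0, 2.0), (1.5, 2.5), (2.0, 2.0), (6.0, 4.0)]
new_points = synthesize(minority, n_new=3)
for point in new_points:
    print(point)
```

Every synthetic point lies on a line segment between two real minority records, so the artificial data stays within the range of values already observed for that class.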
By default, One AI employs the appropriate SMOTE method for the data type with a ratio of 50/50. What this means is that it uses SMOTE to generate enough termination records to exactly match the number of non-termination records. This is often the best approach, but not always. For example, if only 10% of the people in the data terminate, then after 50/50 upsampling roughly eight out of every nine termination records sent to the model (about 44% of all records) are artificial data. This also biases the algorithm. In the case of heavily imbalanced classes, you may want to lower the upsampling ratio.
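The arithmetic behind an upsampling ratio can be worked through with a short sketch. It assumes the ratio means minority records divided by majority records after upsampling, with 1.0 corresponding to 50/50; the population size is hypothetical.

```python
def upsampling_summary(n_minority, n_majority, ratio=1.0):
    """Count synthetic minority records needed so minority/majority = ratio."""
    target_minority = round(n_majority * ratio)
    synthetic = max(target_minority - n_minority, 0)
    total_minority = n_minority + synthetic
    total = total_minority + n_majority
    return {
        "synthetic": synthetic,
        "minority_share": total_minority / total,
        "synthetic_share_of_minority": synthetic / total_minority,
    }


# Hypothetical population: 1,000 people, 10% of whom terminated.
for ratio in (1.0, 0.5):
    print(ratio, upsampling_summary(n_minority=100, n_majority=900, ratio=ratio))
```

Under these assumptions, a ratio of 1.0 adds 800 artificial termination records to the 100 real ones, while a ratio of 0.5 adds 350, so the model sees far less synthetic data.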
As with most settings in One AI, upsampling can be manually configured. A few options are described below.
- Go to the Data menu in the navigation bar
- Select Augmentations
- Select Edit for an existing Augmentation
- For the Upsampling section, select the Override box and expand it
- Options:
- To disable upsampling, check the disable box
- To change the ratio, enter a decimal value for the ratio of minority class to majority class in the ratio box. The default here is 1.0, which is a 50/50 ratio (equal number). You might want to try 0.5 here, which upsamples the minority class to half the size of the majority class. Note that you can enter multiple values, separated by commas, and One AI will try each.
- In addition to SMOTE, the upsampling method of ADASYN is available as an option. To change the upsampling method, from the method selector select ADASYN. Note that you can select multiple methods and One AI will try each method.
- Scroll to the bottom of the right pane and select Save
- Re-run the Augmentation
- Review the results in the Results Summary report