Refining a Machine Learning Model in One AI

Maximize insights from One AI by refining your model.

Jump to the Video - Refining ML Models for Beginners

While One AI is designed to take the guesswork out of creating a solid model, you’re unlikely to achieve perfection without some refinement and iteration. Following How to Create a Voluntary Attrition Risk Model in One AI, or creating your own model, makes it fairly easy to achieve a good baseline model.

Let’s pick up from How to Create a Voluntary Attrition Risk Model in One AI and go through a few strategies for maximizing the value derived from One AI.

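Suggested strategies to try include:

Expand the train/test data to include more history
Create separate models for different parts of the organization
Fill NULL values
Include more attributes
Perform feature selection interventions
Try different upsampling options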

Expand the train/test data to include more history

If you’re using a Recipe to create your model, the default amount of history used to train and test the model is one year. You can change this setting to see whether it significantly boosts the model’s performance, but the tradeoff is that the further back in history you go, the more your organization differs from its current state. Factor in the age of your organization and how much it has changed over the past few years. For an organization that has existed for less than, say, four years, or one that has recently experienced significant change, less history may be better. In our experience at One Model, one or two years is the sweet spot, but for many companies it’s useful to try both and see which yields the best results.
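To make the tradeoff concrete, here is a minimal Python sketch of what widening the training window does to the data. It assumes each training interval is one year counted back from a cutoff date; the record layout and the `training_window` and `years_before` helpers are hypothetical and are not part of One AI.

```python
from datetime import date


def years_before(d: date, years: int) -> date:
    """Return the same calendar day `years` years earlier (Feb 29 falls back to Feb 28)."""
    try:
        return d.replace(year=d.year - years)
    except ValueError:
        return d.replace(year=d.year - years, day=28)


def training_window(records, cutoff: date, intervals: int):
    """Keep records dated within `intervals` one-year intervals before `cutoff` (hypothetical)."""
    start = years_before(cutoff, intervals)
    return [r for r in records if start <= r["date"] < cutoff]


# Hypothetical employee snapshots.
records = [
    {"id": 1, "date": date(2021, 3, 1)},
    {"id": 2, "date": date(2022, 6, 15)},
    {"id": 3, "date": date(2023, 9, 30)},
]
cutoff = date(2024, 1, 1)

for intervals in (1, 2, 3):
    kept = training_window(records, cutoff, intervals)
    print(intervals, [r["id"] for r in kept])
```

Each extra interval pulls in older records, which is where the data starts to describe an organization that may look quite different from today’s.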

Steps to expand or reduce the train/test data:

Select One AI from the top navigation bar
Find your model in the list and select Edit
Select Configure One AI Recipe
Expand the How much history do you want to use to train your predictive model? question
Change the Number of Training Intervals
Select the Save icon at the top right
This will bring you back to Augmentation Configuration
Scroll to the bottom of the Augmentation Configuration screen and select Save
Re-run the Augmentation
Review the results in the Results Summary report.

Create separate models for different parts of the organization

Since behaviors can vary in different parts of an organization, it can be useful to create individual models for those different areas. While knowing attrition risk factors for an entire organization is useful, it’s not the whole story. For example, in a retail company, the attributes that drive attrition for employees who work in the retail outlets might differ significantly from those that drive attrition in an office environment like headquarters. Creating multiple models is made easy by copying completed recipes to new augmentations in One AI.
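Conceptually, filtering the population for each new augmentation partitions the data before training. The minimal sketch below, using hypothetical records and field names, groups employees by work location and shows how a single organization-wide figure can hide very different behavior in each segment.

```python
from collections import defaultdict

# Hypothetical employee records; "work_location" stands in for whatever
# attribute you would filter the population on.
employees = [
    {"id": 1, "work_location": "Retail", "terminated": True},
    {"id": 2, "work_location": "Retail", "terminated": True},
    {"id": 3, "work_location": "Retail", "terminated": False},
    {"id": 4, "work_location": "Retail", "terminated": False},
    {"id": 5, "work_location": "Headquarters", "terminated": True},
    {"id": 6, "work_location": "Headquarters", "terminated": False},
    {"id": 7, "work_location": "Headquarters", "terminated": False},
    {"id": 8, "work_location": "Headquarters", "terminated": False},
]


def split_population(rows, key):
    """Group rows by `key`; each group could be the population for its own model."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    return dict(groups)


def attrition_rate(rows):
    """Share of rows that terminated."""
    return sum(row["terminated"] for row in rows) / len(rows)


print("All employees:", attrition_rate(employees))  # 0.375
for segment, rows in split_population(employees, "work_location").items():
    print(segment, attrition_rate(rows))             # Retail 0.5, Headquarters 0.25
```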

Steps to create a separate model from an existing augmentation:

Go to the Data menu in the navigation bar
Select Augmentations
Select the + Add Machine Learning Model button at the upper right
Assign a descriptive name, for example “Attrition Risk - Finance” or “Attrition Risk - Retail Associates”. You can name it whatever you like, and you can change the name after the augmentation has been created.
For Using Data From options, select One AI Recipe
From the Select an Existing Augmentation dropdown, select the original Augmentation, then select Copy Existing One AI Recipe
Select Configure One AI Recipe
Expand the What Headcount do you wish to make predictions for? question (or Define Population in Advanced view)
Use the Filters control to filter the population
Select the Save icon
If you applied setting overrides to the original Augmentation, and you also want those setting overrides applied here, you will need to change those settings now
Scroll to the bottom of the Augmentation Configuration screen
Select Save
Re-run the Augmentation
Review the results in the Results Summary report

Fill NULL values

Sometimes attributes that contain predictive value are dropped from a model due to too many NULL values. By default, One AI is very conservative about NULL values: if more than 5% of the population does not have a value for an attribute, that attribute is dropped from the model. This is a good thing when information is genuinely missing. However, in some situations NULL values are introduced in the data framing steps. This is especially common with Generative Attributes. There are three different ways to mitigate this behavior and potentially improve the performance of your model: 1) apply NULL filling in Generative Attributes, 2) change the NULL drop threshold for all features, or 3) apply NULL filling to individual features.

Apply NULL filling in Generative Attributes

The primary reason Generative Attributes are dropped as features is missing data, or NULL values. Using a Generative Attribute of Promotions as an example, those employees who have never received a Promotion receive a NULL value for that attribute by default. Selecting the Fill NULLs with 0 option replaces those missing values with 0s.

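Please note that this option should only be used for Generative Attributes that are sums or counts. It would not be appropriate to fill NULLs with 0s for averages: a NULL average means there was nothing to average, and replacing it with 0 would wrongly imply an average value of zero.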
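The sketch below, with hypothetical data and helper names, shows why filling with 0 suits a count but not an average. It also mimics the Count versus Non-null Count check used in the steps that follow: once NULLs in a count are filled with 0, the two numbers match.

```python
# Hypothetical data: E3 and E4 were never promoted, and E4 never answered a survey.
employees = ["E1", "E2", "E3", "E4"]
promotion_events = ["E1", "E1", "E2"]
survey_scores = {"E1": [4, 5], "E2": [3], "E3": [2, 4]}


def promotion_count(ids, events, fill_nulls_with_0):
    """Count events per employee; employees with no events get NULL (None) or 0."""
    counts = {}
    for employee in events:
        counts[employee] = counts.get(employee, 0) + 1
    missing = 0 if fill_nulls_with_0 else None
    return {i: counts.get(i, missing) for i in ids}


def average_score(ids, scores):
    """Average per employee; with nothing to average, the value stays NULL (None)."""
    return {i: (sum(scores[i]) / len(scores[i]) if i in scores else None) for i in ids}


def count_and_non_null(values):
    """Mirror the Count and Non-null Count columns of the data statistics."""
    return len(values), sum(v is not None for v in values.values())


print(count_and_non_null(promotion_count(employees, promotion_events, False)))  # (4, 2)
print(count_and_non_null(promotion_count(employees, promotion_events, True)))   # (4, 4)

# Filling the missing average with 0 would drag the population mean down.
averages = [v for v in average_score(employees, survey_scores).values() if v is not None]
print(sum(averages) / len(averages))        # 3.5 with the NULL left out
print(sum(averages) / (len(averages) + 1))  # 2.625 if the NULL were filled with 0
```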

Steps to Apply NULL filling in Generative Attributes:

Go to the Data menu in the navigation bar
Select Augmentations
Find your Augmentation in the list and select Edit
Select Configure One AI Recipe
Expand the Which generative attributes do you want to use in your prediction? question
Locate the Generative Attribute that you would like to apply NULL filling to in the list
Click the Edit icon for that Generative Attribute
Check the checkbox for Fill NULLs with 0
Save the Generative Attribute by clicking Save
Check the box next to the Generative Attribute name to include it if it is not already checked
Scroll to the Would you like to verify that all of the selections you have made are valid? question
Click Generate Data Statistics
Find the name of the Generative Attribute in the list
Verify that the values displayed in the Count and Non-null Count columns match
Select the Save icon
Scroll to the bottom of the Augmentation Configuration screen and select Save
Re-run the Augmentation
Review the EDA report to verify that the Generative Attribute was not dropped due to NULL values

Change the NULL drop threshold for all features

Changing the NULL drop threshold for all attributes has the advantage of being an easy modification as it’s only one setting and applies to all features.

Steps to change the NULL drop threshold for all features:

Go to the Data menu in the navigation bar
Select Augmentations
Select Edit for an existing Augmentation
Expand the Global section in the right pane
Select the Override box for the Null Drop Threshold setting
Enter a decimal value representing the maximum proportion of NULL values you want to allow. The default is 0.05, which represents 5%
Scroll to the bottom of the right pane and select Save
Re-run the Augmentation
Review the EDA report to see which attributes were selected as features
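As a minimal sketch of how the threshold behaves, the function below drops any column in which more than the given share of values is NULL, matching the "more than 5%" rule described earlier. The column names, data, and helper names are hypothetical. Raising the threshold from 0.05 to 0.10 lets one more attribute through.

```python
def null_fraction(column):
    """Share of values in the column that are NULL (None)."""
    return sum(v is None for v in column) / len(column)


def columns_kept(table, threshold=0.05):
    """Keep columns whose NULL share is not more than `threshold`."""
    return [name for name, column in table.items() if null_fraction(column) <= threshold]


n = 20  # hypothetical population size
table = {
    "tenure": [3.0] * n,                                 # 0% NULL
    "engagement_score": [None] + [4.0] * (n - 1),        # 5% NULL
    "promotions_past_year": [None] * 2 + [1] * (n - 2),  # 10% NULL
    "bonus": [None] * 6 + [500] * (n - 6),               # 30% NULL
}

print(columns_kept(table))        # ['tenure', 'engagement_score']
print(columns_kept(table, 0.10))  # ['tenure', 'engagement_score', 'promotions_past_year']
```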

Apply NULL filling to individual features

Applying NULL filling to individual features gives you the flexibility to target specific attributes and treat each one differently. For example, an attribute like “Promotions Past Year” might require NULL values to be replaced by 0, whereas for something like “Engagement Score” you might want to fill with the mean value for the entire population. The disadvantage of this approach is that each attribute must be configured separately.
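Here is a minimal sketch of the two most common strategies applied to the attributes from that example. The `fill_nulls` helper and the data are hypothetical; the helper only mirrors the mean and custom options in the Strategy dropdown.

```python
from statistics import mean


def fill_nulls(values, strategy, custom_fill_value=None):
    """Replace None with the mean of the non-NULL values or with a custom value."""
    if strategy == "mean":
        fill = mean(v for v in values if v is not None)
    elif strategy == "custom":
        fill = custom_fill_value
    else:
        raise ValueError(f"unsupported strategy: {strategy}")
    return [fill if v is None else v for v in values]


promotions_past_year = [1, None, 0, None]
engagement_score = [3.0, None, 5.0, 4.0]

print(fill_nulls(promotions_past_year, "custom", custom_fill_value=0))  # [1, 0, 0, 0]
print(fill_nulls(engagement_score, "mean"))                             # [3.0, 4.0, 5.0, 4.0]
```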

Steps to apply NULL filling to individual features:

Go to the Data menu in the navigation bar
Select Augmentations
Find the Augmentation you want to adjust
Select Edit for an existing Augmentation
For the Per Column Interventions section, select the Override box and expand it
Expand Column Interventions
From the Add Column Intervention dropdown, select a column for which you would like to apply NULL filling
Select the + button
Expand the newly created section for that column
From the Droppable selector, select not droppable
For the Null Fill section, select the Override box and expand it
For the Strategy section, select the Override box
From the Strategy dropdown, select the desired treatment (mean and custom are the most common)
If you selected custom as a strategy, select the Override box for the Custom Fill Value section
Enter the value you would like to replace NULLs with (commonly 0)
To add NULL filling for another column, repeat the steps from the Add Column Intervention dropdown through entering the fill value
Scroll to the bottom of the right pane and select Save
Re-run the Augmentation
Review the EDA report to see how the features performed

Include more attributes

The One AI pipeline automatically employs dimensionality reduction strategies to optimize dimensionality for machine learning models. This allows you to present One AI with a lot of attributes to choose from; in fact, the more attributes you provide One AI as potential features, the better. The best-case scenario is that something you hadn’t considered useful for making predictions will in fact be chosen, and you can always exclude features later in the process if necessary. One AI makes selecting more attributes easy with Recipes. Attributes can be either 1) core attributes or 2) generative attributes.

How to include more core attributes

Go to the Data menu in the navigation bar
Select Augmentations
Find your Augmentation in the list and select Edit
Select Configure One AI Recipe
Expand the Which core attributes do you want to use in your prediction? question
From the Scope selector, select either Balanced or Broad
1. Balanced will include all tables joined directly to the table that the population metric is created from
2. Broad will include all tables joined directly or indirectly to the table that the population metric is created from
If there are specific attributes you do not want to include, select the X icon for the attribute to remove it from the Included Columns section
Select the Save icon
Scroll to the bottom of the Augmentation Configuration screen and select Save
Re-run the Augmentation
Review the results in the Results Summary report
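The difference between the Balanced and Broad scopes is a question of graph reachability. This minimal sketch models a hypothetical data model as a graph of table joins: Balanced takes only the direct neighbors of the population table, while Broad uses a breadth-first search to collect every table reachable through any chain of joins. The table names and helper are hypothetical, and in this sketch the population table itself is left out of the returned list.

```python
from collections import deque

# Hypothetical data model: each table lists the tables it is joined to directly.
JOINS = {
    "employee": ["position", "location"],
    "position": ["employee", "job_family"],
    "location": ["employee", "region"],
    "job_family": ["position"],
    "region": ["location"],
    "survey_response": [],  # not joined to the employee table at all
}


def tables_in_scope(population_table, scope):
    """Tables whose columns would be offered as core attributes (population table excluded)."""
    if scope == "Balanced":
        # Only tables joined directly to the population table.
        return sorted(JOINS[population_table])
    if scope == "Broad":
        # Breadth-first search: every table reachable through any chain of joins.
        seen = {population_table}
        queue = deque([population_table])
        while queue:
            table = queue.popleft()
            for neighbor in JOINS[table]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        return sorted(seen - {population_table})
    raise ValueError(f"unknown scope: {scope}")


print(tables_in_scope("employee", "Balanced"))  # ['location', 'position']
print(tables_in_scope("employee", "Broad"))     # ['job_family', 'location', 'position', 'region']
```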

How to include more generative attributes

Go to the Data menu in the navigation bar
Select Augmentations
Find your Augmentation in the list and select Edit
Select Configure One AI Recipe
Expand the Which generative attributes do you want to use in your prediction? question
Check the box next to the name of each Generative Attribute you want to include
To create new Generative Attributes or edit existing, please see the One AI Generative Attributes help article
Select the Save icon
Scroll to the bottom of the Augmentation Configuration screen and select Save
Re-run the Augmentation
Review the results in the Results Summary report

Perform feature selection interventions

When creating a predictive model, there are usually certain attributes you expect to be predictive of the selected outcome; for example, tenure is often predictive of attrition risk. If you create a model and an expected attribute is not selected as a feature, that may be cause for concern. Alternatively, there might be attributes in your data that you don’t want to include in your model, such as ethnicity or salary. By “grooming” the features your model uses, you can optimize performance while also producing clear insights.

The Exploratory Data Analysis (EDA) report is an effective tool for feature grooming. It provides a lot of useful information about the attributes you presented to your model, whether data cleaning was performed, which ones were selected as features, and why.

The following sections will cover how to leverage the EDA report to understand your attributes and features, and how to address common scenarios. To better understand the EDA report in depth, please see the EDA Report Introduction.

Review feature selections in the EDA report

Steps to review feature selections in the EDA report:

Go to the Data menu in the navigation bar
Select Augmentations
Select Runs for the pertinent augmentation
Then select the most recent run
You’re now looking at the EDA Report
Scroll down to the Variable Status section and note the ones with a label

If there are attributes you expected to be selected that were not, find them in the list and note the description of how they were handled and why they were not selected.

Click on the name of any attribute to navigate to detailed information about that attribute
Selecting Toggle details in the lower right of this subsection will display more data
In this section, selecting any of the tab headers will provide the option to filter the data to the individual labels
Before making any changes and re-running the model, select the Results Summary tab, scroll down to the Classification Report section, and make note of the f1 score for the positive label to track the effect that any changes you make have on model performance
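If you want to sanity-check the number you record, the F1 score for a single label is the harmonic mean of that label’s precision and recall. This minimal sketch, with hypothetical labels, computes it and shows why tracking it for the positive label is more informative than accuracy when terminations are rare: a model that predicts nobody leaves still reaches 60% accuracy but has an F1 score of 0.

```python
def f1_for_label(y_true, y_pred, positive):
    """F1 for one label: the harmonic mean of its precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def accuracy(y_true, y_pred):
    """Share of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical labels: 1 = terminated (positive label), 0 = stayed.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
model_a = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
always_stays = [0] * 10

print(accuracy(y_true, model_a), f1_for_label(y_true, model_a, 1))            # 0.8 0.75
print(accuracy(y_true, always_stays), f1_for_label(y_true, always_stays, 1))  # 0.6 0.0
```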

Increase likelihood of attributes being selected as features

In the previous section, you learned how to research details about attributes. There is a wide range of things to look for, including:

Were there many NULL values?
Are there anomalies in the data?
How about unexpected values?
Was the attribute scaled or one hot encoded?
Do the values differ a lot between the labels?

There are various ways to adjust the data, including requesting data modeling changes from your Customer Success Lead at One Model and changing whether an attribute is treated as continuous (scaled) or categorical (one hot encoded). The most common intervention, however, is NULL filling.
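To see what those two treatments do to an attribute, here is a minimal sketch. The exact scaling method One AI applies is not described in this article, so the sketch assumes standardization (z-scores) as one common example; the helpers and data are hypothetical. One hot encoding gives every distinct value its own 0/1 column, which helps explain why a categorical attribute can be handled value by value.

```python
from statistics import mean, pstdev


def standardize(values):
    """Scale a continuous attribute to mean 0 and standard deviation 1 (z-scores).

    Assumes the values are not all identical (otherwise the deviation is 0).
    """
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def one_hot(values):
    """Turn a categorical attribute into one 0/1 column per distinct value."""
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows


print(standardize([1.0, 3.0, 5.0]))
print(one_hot(["Retail", "Headquarters", "Retail"]))
# (['Headquarters', 'Retail'], [[0, 1], [1, 0], [0, 1]])
```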

To learn how, see the Fill NULL values section above.

Force attributes to be included as features

By default, One AI performs automated feature selection. If desired, you can override this behavior and manually select the features you want included. You can either select some features and let One AI select the rest, or you can make all of the selections manually.

Follow these steps to force attributes to be included as features:

Close the EDA Report
Select Edit for the Augmentation
For the Per Column Interventions section, select the Override box and expand it
If you want to completely disable One AI’s automated feature selection, check the Only Use Specified Columns box. Not checking this box will still allow for automatic selection for all features other than the ones you force to be included.
Expand Column Interventions
From the Add Column Intervention dropdown, select the column you would like to include
Select the + button
Expand the newly created section for that column
Note that selecting not droppable from the Droppable selector only forces an attribute to be considered as a potential feature; it does not guarantee selection. If that is the behavior you want, select not droppable and skip the type-specific steps that follow.
To force selection of features, you will need to know the data type of the attribute. Is it categorical (text) or continuous (numeric or date)?
For continuous attributes:
1. As the Type-Specific Intervention, select Continuous
2. Expand Continuous Interventions
3. For Force Select, check the Override box and also the checkbox under that
For categorical attributes:
1. As the Type-Specific Intervention, select Categorical
2. Expand Categorical Interventions
3. In this section you can define how you want each value for the attribute to be treated. For example, if the attribute is gender, you could Exclude Male or you could Select Female. If you want to select or exclude multiple values, enter the values separated by commas.
Repeat the steps from the Add Column Intervention dropdown onward for any additional features you would like to include
Scroll to the bottom of the right pane
Select Save
Re-run the Augmentation
Review the EDA report to confirm which features were selected
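The options above interact in a predictable way, which this minimal sketch tries to capture. It is a hypothetical model, not One AI’s selection logic: the attribute scores and the `top_k` cutoff stand in for whatever the automated step actually does. Forced attributes are always kept, attributes set to always droppable are always removed, and checking Only Use Specified Columns turns off automated selection entirely.

```python
def select_features(scores, forced=(), excluded=(), only_specified=False, top_k=3):
    """Hypothetical model of the feature selection options.

    scores: attribute -> score from an automated selection step (higher is better)
    forced: attributes set to Force Select
    excluded: attributes whose Droppable setting is always
    only_specified: mimics the Only Use Specified Columns box
    top_k: assumed stand-in for however the automated step decides how many to keep
    """
    forced = {a for a in forced if a not in excluded}
    if only_specified:
        return sorted(forced)
    candidates = [a for a in scores if a not in excluded and a not in forced]
    automated = sorted(candidates, key=lambda a: scores[a], reverse=True)[:top_k]
    return sorted(forced | set(automated))


scores = {"tenure": 0.9, "salary": 0.8, "location_id": 0.7, "location_name": 0.6, "age": 0.2}

print(select_features(scores))
# ['location_id', 'salary', 'tenure']
print(select_features(scores, forced=["age"], excluded=["salary", "location_id"]))
# ['age', 'location_name', 'tenure']
print(select_features(scores, forced=["age", "tenure"], only_specified=True))
# ['age', 'tenure']
```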

Exclude particular features if leveraging a Recipe

There are times when you may not want to include certain features. An example of such an attribute is a numeric identifier for locations when you are also including the name of the locations. Fortunately, excluding attributes is easy if you’re using a Recipe.

Steps to exclude particular features if leveraging a Recipe:

Close the EDA Report
Select Edit for the Augmentation
Select Configure One AI Recipe
Expand the Which core attributes do you want to use in your prediction? question (or Which generative attributes do you want to use in your prediction? if the feature in question is a Generative Attribute)
Select the X icon for the attribute to remove it from the Included Columns section (or uncheck the selection box next to it in the case of a Generative Attribute)
Repeat for any additional attributes if necessary
Select the Save icon
Scroll to the bottom of the Augmentation Configuration screen and select Save
Re-run the Augmentation
Review the results in the Results Summary report

Alternative method to exclude particular features

If you are not leveraging a One AI Recipe for your augmentation, the method for excluding attributes is different, but the end result is the same.

Steps to exclude particular features if you are not leveraging a Recipe:

Close the EDA Report
Select Edit for the Augmentation
Select the Override box for the Per Column Interventions section and expand it
Expand Column Interventions
From the Add Column Intervention dropdown select the column you would like to exclude
Select the + button
Expand the newly created section for that column
From the Droppable selector select always
Repeat the steps from the Add Column Intervention dropdown through the Droppable selector for any additional features you would like to exclude
Scroll to the bottom of the right pane and select Save
Re-run the Augmentation
Review the EDA report to confirm which features were selected

Try different upsampling options

When predicting outcomes like attrition risk, the classes are often vastly imbalanced. For example, far more people remain employed than terminate in an attrition risk model. When data of this nature is fed to a classification algorithm, the model becomes biased towards the majority class prediction (those who remain employed in our example).

To produce accurate predictions on imbalanced data, upsampling is employed. Upsampling involves creating artificial data for the minority class to “balance” the classes. There are various techniques that can be used to create this data. One AI primarily employs the Synthetic Minority Oversampling TEchnique (SMOTE) to perform upsampling. This external article explains upsampling in more detail: https://towardsdatascience.com/5-techniques-to-work-with-imbalanced-data-in-machine-learning-80836d45d30c
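The core idea of SMOTE fits in a few lines: each synthetic record is placed at a random point on the line segment between a real minority-class record and one of its nearest minority-class neighbors. The sketch below is a simplified, numeric-only illustration with hypothetical feature values, not One AI’s implementation; production libraries (for example, imbalanced-learn’s `SMOTE`, with variants such as `SMOTENC` for mixed categorical data) add considerably more handling.

```python
import math
import random


def smote(minority, n_synthetic, k=2, seed=0):
    """Simplified SMOTE for numeric features.

    Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority-class neighbors.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        neighbors = sorted(
            (p for j, p in enumerate(minority) if j != i),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbor = rng.choice(neighbors)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbor)))
    return synthetic


# Hypothetical minority-class (terminated) records with two scaled numeric features.
terminated = [(1.0, 1.0), (2.0, 1.5), (1.5, 2.0), (3.0, 3.0)]
for point in smote(terminated, n_synthetic=3):
    print(point)
```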

By default, One AI employs the appropriate SMOTE method for the data type with a ratio of 50/50. This means it uses SMOTE to generate enough termination records to exactly match the number of non-termination records. This is often the best approach, but not always. For example, if only 10% of the people in the data terminate, there are nine non-terminations for every termination, so reaching a 50/50 balance means roughly eight out of every nine termination records sent to the model are artificial. This can also bias the algorithm. In the case of heavily imbalanced classes, you may want to lower the upsampling ratio.
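The arithmetic behind that example is easy to reproduce. In this minimal sketch the ratio means minority records divided by majority records, as in the ratio setting described below; the helper name is hypothetical, and rounding the target down is an assumption of the sketch.

```python
def synthetic_needed(n_minority, n_majority, ratio):
    """Synthetic minority records needed so that minority / majority reaches `ratio`.

    Rounding the target down is an assumption made for this sketch.
    """
    target = int(ratio * n_majority)
    return max(0, target - n_minority)


# Hypothetical population: 100 employees, 10 of whom terminated.
n_terminated, n_stayed = 10, 90

for ratio_text in "1.0,0.5".split(","):  # several ratios, comma separated
    ratio = float(ratio_text)
    synthetic = synthetic_needed(n_terminated, n_stayed, ratio)
    share = synthetic / (n_terminated + synthetic)
    print(f"ratio {ratio}: {synthetic} synthetic records, {share:.0%} of the termination class")
```

With the default ratio of 1.0, 80 of the 90 termination records are synthetic; a ratio of 0.5 needs only 35 synthetic records.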

As with most settings in One AI, upsampling can be manually configured. A few options are described below.

Go to the Data menu in the navigation bar
Select Augmentations
Select Edit for an existing Augmentation
For the Upsampling section, select the Override box and expand it
Options:
1. To disable upsampling, check the disable box
2. To change the ratio, enter a decimal value for the ratio of minority class to majority class in the ratio box. The default here is 1.0, which is a 50/50 ratio (an equal number of records in each class). You might want to try 0.5 here, which upsamples the minority class only until it is half the size of the majority class. Note that you can enter multiple values, separated by commas, and One AI will try each.
3. In addition to SMOTE, the upsampling method of ADASYN is available as an option. To change the upsampling method, from the method selector select ADASYN. Note that you can select multiple methods and One AI will try each method.
Scroll to the bottom of the right pane and select Save
Re-run the Augmentation
Review the results in the Results Summary report
