Introduction to Machine Learning in One AI

One AI Machine Learning is an HR-specific no-code machine learning platform. It's a series of processes that perform data cleaning and preparation, algorithm selection, and parameter optimization for machine learning. The result is predictive models that leverage your company's people data without requiring a Data Scientist.

Let's begin by understanding how machine learning in One AI works at a high level. 

An algorithm is trained on a set of data where outcomes are known. Examples of outcomes include whether a person terminated employment or remained employed by the company. In addition to outcomes, the data must contain at least 5 (but usually many more) attributes for every row (person, in this case). Examples of attributes are tenure, location, and job function. The algorithm looks at the outcome for each row and then considers that row's attributes. By repeating this analysis of outcomes and attributes for many rows, the algorithm is able to "learn" which attributes drive each outcome; the result is a trained algorithm.

The trained algorithm is referred to as a predictive model, and the attributes it selects as predictive are called features. This model can make predictions on a second set of data containing the same attributes where outcomes are not known. Predicted outcomes (predictions) are generated along with information about how the different features influenced those outcomes (drivers).
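
To make this concrete, here is a minimal sketch of the train-then-predict cycle using scikit-learn. The column names, data, and algorithm choice are all hypothetical; One AI selects and tunes its algorithms automatically, so this illustrates the concept rather than what the platform runs.

```python
# Minimal train-then-predict sketch on hypothetical data.
# One AI automates all of this; the code only illustrates the concept.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Historical rows where the outcome (terminated: 1 = left, 0 = stayed) is known.
train = pd.DataFrame({
    "tenure_years": [0.5, 3.0, 7.0, 1.2, 4.5, 0.8],
    "job_function": [2, 0, 1, 2, 0, 1],   # categories already encoded as integers
    "terminated":   [1, 0, 0, 1, 0, 1],
})

# Current rows with the same attributes, but no outcome yet.
current = pd.DataFrame({
    "tenure_years": [2.0, 0.3],
    "job_function": [0, 2],
})

features = ["tenure_years", "job_function"]
model = RandomForestClassifier(random_state=0)
model.fit(train[features], train["terminated"])   # "learn" from known outcomes

# Predicted probability of terminating for each current employee.
print(model.predict_proba(current[features])[:, 1])
```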

The Machine Learning Process:

Machine Learning pipelines are complex, and One AI is no exception. However, complex processes can also be intuitive, and One AI aims to guide you through the process from beginning to end. Let’s explore in more detail how One AI does this. 

1. Define the Problem

The first step to machine learning is defining the problem you would like to solve. One AI is a platform designed specifically for People Analytics, and a number of the key problems come predefined within the tool. This key-problem component of One AI is called Recipes. Recipes are a list of HR subjects that can be predicted, such as Voluntary Attrition Risk. For more information about Recipes, see the One AI Recipes Overview article. HR subjects for which there is no Recipe can also be predicted, but framing the data will be more difficult in those cases.

2. Frame the Data

You will need two data sets to create a model and make predictions: one in which outcomes are known and one on which to make predictions. The structure of the two sets should match. This sounds straightforward, but in practice, it can be difficult to structure data this way. Fortunately, One AI Recipes make creating a predictive model as easy as choosing the outcome you want to predict and answering a series of questions. For more information about Recipes, see the One AI Recipes Overview article.
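
As a concrete (and hypothetical) illustration of what "matching structure" means: the training set carries an extra outcome column, while the attribute columns line up exactly between the two sets. Recipes build these sets for you; this only shows the required shape.

```python
# Hypothetical illustration of the two-data-set framing.
import pandas as pd

# Set 1: outcomes are known (the extra "terminated" column).
training = pd.DataFrame(
    columns=["employee_id", "tenure_years", "location", "job_function", "terminated"]
)

# Set 2: the rows to predict on, with no outcome column yet.
scoring = pd.DataFrame(
    columns=["employee_id", "tenure_years", "location", "job_function"]
)

# The attribute columns must match between the two sets.
attributes = [c for c in training.columns if c != "terminated"]
assert list(scoring.columns) == attributes, "data sets are framed differently"
```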

3. Clean the Data

Data cleaning is an automated part of the One AI pipeline. One AI detects the type of each attribute column and applies treatments to ignore bad features and enhance the predictive power of good ones. These treatments can be overridden for each column, if desired. The methods One AI uses to clean data are as follows (a rough code sketch of these treatments appears after the list):

  • Null Threshold - Attributes containing more than 5% null values are ignored. This threshold can be adjusted.
  • Correlated Features - If multiple attributes contain data that is more or less the same, all but one will be ignored. An example of correlated features is "date of birth" and "age."
  • Scaling - Scaling is the normalization of numeric data to a common scale (0 to 1 or -1 to 1). It ensures that no numeric feature is weighted more heavily than another simply because its values are larger.
  • One Hot Encoding - One hot encoding is a treatment for text (categorical) columns. It splits the unique values contained in a column into their own binary columns. For example, if a column called temperature contains the values cold, warm, and hot, one hot encoding will create three columns: temperature_cold, temperature_warm, and temperature_hot, with a 0 or a 1 in each cell.
  • Dates - Date-formatted columns are converted to the difference from the sample date and are then scaled.
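
Here is the rough sketch of the treatments above, on made-up data. One AI applies its own automated versions of each; the thresholds, dates, and column names below are illustrative assumptions.

```python
# Rough sketch of the listed treatments on hypothetical data.
# One AI applies its own versions of these automatically.
import pandas as pd

df = pd.DataFrame({
    "age":           [34, 29, 52, 41],
    "date_of_birth": [1990, 1995, 1972, 1983],          # duplicates the age signal
    "fax_number":    [None, None, None, "555-0100"],    # mostly null
    "temperature":   ["cold", "warm", "hot", "warm"],
    "hire_date":     pd.to_datetime(["2018-01-02", "2021-06-15",
                                     "2020-03-01", "2016-11-20"]),
})

# Null threshold: ignore columns whose share of nulls exceeds the threshold.
df = df.loc[:, df.isna().mean() <= 0.05].copy()   # drops fax_number

# Correlated features: keep only one of each near-duplicate pair.
df = df.drop(columns=["date_of_birth"])

# Dates: convert to the difference from a sample date (here, in days).
sample_date = pd.Timestamp("2024-01-01")
df["hire_date"] = (sample_date - df["hire_date"]).dt.days

# Scaling: normalize numeric columns to a 0-to-1 range.
num_cols = ["age", "hire_date"]
df[num_cols] = (df[num_cols] - df[num_cols].min()) / (df[num_cols].max() - df[num_cols].min())

# One hot encoding: split categorical values into binary columns.
df = pd.get_dummies(df, columns=["temperature"])
print(df.columns.tolist())
# ['age', 'hire_date', 'temperature_cold', 'temperature_hot', 'temperature_warm']
```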

4. Train and Select the Best Model

By default, One AI takes the clean data from the previous step and trains a selection of models using different configuration options. It then scores the performance of the resulting models and chooses the best-performing combination to use as the final model. Like data cleaning, this is an automated task performed in the One AI machine learning pipeline. The algorithm and settings combinations that are attempted differ by problem type (classification, regression, and time to event). Since training many models takes a significant amount of time, a specific algorithm can be selected in the settings overrides; doing so runs only that algorithm.

To learn more about the algorithms and settings, see the One AI Machine Learning Algorithms help article.
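
Purely to illustrate the train-many-and-keep-the-best idea, a generic selection loop might look like the following. The candidate list and scoring metric here are assumptions, not the actual algorithms or scoring One AI uses.

```python
# Illustrative only: train several candidate models and keep the best scorer.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest":       RandomForestClassifier(random_state=0),
    "gradient_boosting":   GradientBoostingClassifier(random_state=0),
}

# Score each candidate with 5-fold cross-validation and keep the best.
scores = {name: cross_val_score(m, X, y, cv=5).mean() for name, m in candidates.items()}
best_name = max(scores, key=scores.get)
print(f"best model: {best_name} (mean CV accuracy {scores[best_name]:.3f})")
```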

5. Review the Results and Refine the Model

One AI provides data and visualizations that allow you to see how your predictive model performed. It sometimes takes multiple cycles of configuration, machine learning, and validation to get to a satisfactory result. For each run of a model, two reports are generated. These reports provide insight into the data cleaning, the model itself, and the predictions generated by the model.

EDA Report

The Exploratory Data Analysis (EDA) report tells you about the data cleaning and feature selection One AI performed. It provides detailed information about each of the features as they relate to the outcomes being predicted. The key thing to look at initially is which attributes were selected as features.

The following are examples of refinements that you might want to make to features (sketched in code after the list):

  • Remove features that contain sensitive data, such as ethnicity
  • Remove features that are not easily interpretable, such as numeric identifiers
  • Remove suspicious columns that might be "cheating" by leaking information about the outcome
  • Include features that were dropped due to too many null values
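
In One AI these refinements are per-column overrides rather than code you write, but conceptually they amount to something like the following sketch. The column names and threshold value are hypothetical.

```python
# Conceptual sketch of the refinements above; in One AI these are
# column-level overrides in the UI, not code. Names are hypothetical.
import pandas as pd

df = pd.DataFrame(columns=["employee_id", "ethnicity", "tenure_years", "last_day_worked"])

# Remove sensitive attributes, opaque identifiers, and "cheating" columns
# (e.g., last_day_worked would trivially give away a termination outcome).
df = df.drop(columns=["ethnicity", "employee_id", "last_day_worked"])

# Relax the null threshold so a mostly-complete column is kept (hypothetical value).
null_threshold = 0.25
```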

For more information about the EDA Report, please see the EDA Report Introduction article.

Results Summary Report (Modeling Report)

The Results Summary report provides a large amount of detail about the model itself and how it performed. At a high level, these are the things the report contains:

  • Model configuration settings
    • General configuration
    • Feature selection settings
    • Upsampling
  • Prediction details
    • Which labels were predicted, and how many predictions were classified to each?
    • Which label is the positive label?
  • Feature analysis
    • Feature importances
    • SHAP (a method to explain individual predictions) charts such as the beeswarm. More on this in the next section.
  • Model performance statistics
    • Precision, Recall, and F1 scores for each label (in the case of classification problems)
    • Other performance measures depending on problem type and algorithm

Even at a high level, that is a lot of information. The key things to look at first in this report are the prediction details and the model performance statistics. Low performance and/or extremely imbalanced label predictions are things that may require model refinement to address.
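
If Precision, Recall, and F1 are unfamiliar, this small hypothetical example shows how they are computed per label. The actual and predicted values are made up; One AI reports these statistics for you.

```python
# Made-up labels, purely to illustrate the per-label metrics.
from sklearn.metrics import classification_report

actual    = ["stay", "stay", "leave", "leave", "stay", "leave", "stay", "stay"]
predicted = ["stay", "leave", "leave", "stay", "stay", "leave", "stay", "stay"]

# Precision: of the rows predicted as a label, how many truly were that label.
# Recall:    of the rows truly of a label, how many were caught.
# F1:        the harmonic mean of precision and recall.
print(classification_report(actual, predicted))
```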

For more information about the Results Summary Report, please see the One AI Modeling Report Introduction article.

6. Identify Insights

Identifying insights begins during the "review and refine" step and bleeds into "share insights" as well.

One insight you've probably already gleaned in the review step is which attributes were selected as features and how important each of those features is to the model. Another is the number of predictions for each label. For example, if you're looking at attrition risk, comparing the predicted terminations to the total predictions will give you a sense of your expected turnover in the specified time frame.

You can begin to expand on your learnings by digging deeper into the EDA and Results Summary reports. There can be many interesting findings from those two reports, but here are a couple of key places to look:

  • Did you have a hypothesis before creating the model about which attributes would be predictive? Were you correct? Review the EDA report to see whether they were selected. If they were not, why? Selecting any of the attributes (variables) brings you to a detailed view containing a number of statistics. Selecting "Toggle Details" then allows you to see how the attribute compares for each of the different labels individually or combined.
  • We mentioned SHAP briefly in the "review results" step. SHAP is a mathematical method to explain the predictions of machine learning models: it provides the contribution of each feature to each individual prediction. By using advanced visualization techniques on SHAP data, such as the SHAP "beeswarm" chart on the Results Summary report, a "picture" of the model can be formed (a sketch using the open-source shap package follows this list). For more information about how to interpret this chart, see "What is the SHAP beeswarm chart?" in the One AI FAQs article.
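
Outside of One AI, the same kind of chart can be produced with the open-source shap package. Here is a minimal sketch on placeholder data and a placeholder model; One AI renders this chart for you in the Results Summary report.

```python
# Minimal beeswarm sketch with the open-source shap package.
# The data and model are placeholders standing in for an attrition model.
import pandas as pd
import shap
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=200, n_features=6, random_state=0)
X = pd.DataFrame(X, columns=[f"feature_{i}" for i in range(6)])
model = GradientBoostingClassifier(random_state=0).fit(X, y)

# One SHAP value per feature per prediction.
explainer = shap.TreeExplainer(model)
shap_values = explainer(X)

# Beeswarm: each dot is one row's contribution from one feature.
shap.plots.beeswarm(shap_values)
```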

In addition to what the EDA and Results Summary reports provide, additional insights become available when results are deployed back into your data model. This is described in step 7, Share Insights. These insights are not exclusive to sharing, but they are easily shared through Storyboards in One Model.

7. Share Insights

One of the most powerful things about One AI is its ability to deploy data back into your data model and share it to Storyboards. This marries the predictive data to the visualization capabilities and role-based security of Storyboards. It also enables joining individual predictions and driver data back to your employee data, allowing you to leverage all of your existing employee dimensions against the prediction data (in the case of employee-based models).

The following suite of Storyboards has been created by the One Model team to get you moving in the right direction. They're designed specifically for looking at Attrition Risk, but as you become more comfortable with the data, you can create custom Storyboards that leverage it. If you're interested in these Storyboards, please contact your Customer Success representative at One Model for assistance in creating them.

Attrition Risk Storyboard 1: Model Details and Drivers

To understand and trust your predictions, you need to understand the model used to generate them. This storyboard serves two purposes. The first is to tell you how the model performed. If the model performed poorly, then the predictions are not to be trusted. The second is to help you understand the drivers (features) that the predictions are based on. Knowing how much influence each feature had on the predictions as well as HOW the features influenced the predictions is important.

Attrition Risk Storyboard 2: Where Does Risk Sit?

It's useful to know which features influence a predictive model, but it's also of value to know where the high-risk employees in your company sit. The predictive model outputs a probability of terminating (0% to 100%) for each employee included in the data set. In this storyboard, the probability of terminating is rounded to the nearest 10% and bucketed into low/medium/high categories. The employees in the prediction data are joined back to their current attributes, allowing you to leverage any of the employee-related dimensions.
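
To illustrate the rounding and bucketing, here is a small sketch. The low/medium/high cutoffs below are assumptions for the example, not One Model's exact definitions.

```python
# Hypothetical termination probabilities from a model.
import pandas as pd

probs = pd.Series([0.07, 0.42, 0.88, 0.13, 0.61])

rounded = (probs * 10).round() / 10   # round to the nearest 10%
buckets = pd.cut(rounded, bins=[-0.001, 0.3, 0.6, 1.0],
                 labels=["low", "medium", "high"])   # assumed cutoffs
print(pd.DataFrame({"probability": rounded, "risk": buckets}))
```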

Attrition Risk Storyboard 3: Filter View

Knowing how drivers and predictions differ for subsets of your employee population is valuable. Being able to compare subsets to the overall data side by side is a powerful way to visualize differences. Since both SHAP explanations and predictions made by your Attrition Risk model are joined back to your employee data, any of the dimensions you can slice your headcount by should work on this storyboard.

Attrition Risk Storyboard 4: Individual Predictions

The effect of a given feature can vary even within the same model. For example, high performance may be predictive of less attrition across most of the population but more attrition for high performers with a skillset in demand. SHAP explains the importance and directionality of each feature for each prediction made. The other attrition risk storyboards contain aggregations of the individual prediction data. Seeing explanations of the individual predictions is valuable in understanding the drivers behind these predictions.
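
Using the open-source shap package again, a single prediction's explanation can be drawn as a waterfall chart. The setup mirrors the earlier beeswarm sketch and is equally hypothetical; One AI surfaces this per-employee detail in the storyboard.

```python
# Explain one individual prediction with a waterfall chart (illustrative).
import pandas as pd
import shap
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

# Same placeholder setup as the earlier beeswarm sketch.
X, y = make_classification(n_samples=200, n_features=6, random_state=0)
X = pd.DataFrame(X, columns=[f"feature_{i}" for i in range(6)])
model = GradientBoostingClassifier(random_state=0).fit(X, y)
shap_values = shap.TreeExplainer(model)(X)

# Waterfall: how each feature pushed this one prediction up or down.
shap.plots.waterfall(shap_values[0])
```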
