One AI Machine Learning Model Troubleshooting Guide

Description: The purpose of this guide is to assist you in troubleshooting problems and errors with machine learning models in One AI. While we aim to cover all common issues users may encounter, this guide is not exhaustive. 

Module Type: Functional

Level: Intermediate-Advanced

Audience: Model & storyboard creators

Prerequisites: "One AI Recipes" & "Advanced Configuration Module Series"; optional, but helpful: "Global Settings Module Series"

 

Introduction

The purpose of this guide is to assist you in troubleshooting problems and errors with machine learning models in One AI. While we aim to cover all common issues users may encounter, this guide is not exhaustive. If these fixes do not resolve your problem, please submit a detailed ticket so we can assist you further. To help us address your issue quickly and efficiently, your ticket should include the following information:

  • The display name and run id (if it has been run) of the model with a problem or error
    • Note: The run id can be found by clicking ‘Runs’ > Error label 
  • The time the model was run or the error/problem occurred 
  • The error message if one is displayed
    • Note: The error message can be found in the Recipe Data Validation section and/or by clicking ‘Runs’ > Error label in the messages tab
  • A description of what you have done, if anything, to attempt to resolve the issue
  • Applicable screenshots to help us understand what is going on
  • Permission for the CS team to grant a One AI team member site access to investigate (if applicable)

Error messages can be found either in the One AI Recipe Data Validation step or by clicking ‘Runs’ on the erroring machine learning model and then clicking the “Errored” label. 

We highly recommend running Data Validation when creating Recipes or running models since errors can be addressed before waiting for the machine learning pipeline to run. Validation can be found in the Recipe in the “Would you like to verify that all of the selections you have made are valid?” step (wording may differ based on Recipe type). Click ‘Generate Data Statistics’. If you receive an “Action Needed” message, the error will be detailed there. 


Dataset ID Duplication

Possible Error Messages: 

  • JOB_ERROR (500): The dataset ID column is not entirely unique. Please ensure that there are no duplicate values. 
  • JOB_ERROR (500): The dataset ID column is not entirely unique. Dataset_id is not unique in train Underlying Error:: dataset_id is not unique in train 

Scenario 1: Users will receive this error if the unique identifier selected in the “What Headcount do you wish to make predictions for?” field is not completely unique. For example, this can occur if you used 'email address' as your unique identifier but several employees in the model dataset share the same email address.

Solution 1: Ensure you use an ID that is unique for each instance included in your model dataset. Typically, person_id from the employee or employee event table is the best choice. In rare situations, duplication is unavoidable. In these cases, a composite dataset ID that combines the ID with a date or time can be created with YAML in the Advanced Configuration section. 
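The idea behind a composite dataset ID can be sketched outside One AI with pandas (the column names here are hypothetical; in One AI the actual fix is applied with YAML in the Advanced Configuration section):

```python
import pandas as pd

# Hypothetical model dataset where person_id alone is not unique
# (one person has rows for two different effective dates).
df = pd.DataFrame({
    "person_id": [11, 11, 12],
    "effective_date": ["2024-01-01", "2024-02-01", "2024-01-01"],
})

# Combine the ID with the date to form a composite dataset ID.
df["dataset_id"] = df["person_id"].astype(str) + "_" + df["effective_date"]

# The composite column is now unique per row.
print(df["dataset_id"].is_unique)  # → True
```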

Scenario 2: Users will also receive this error message if they include core attribute(s) that have multiple values at a single point in time for at least one of the employees (or instances) in the model dataset. For example, this can occur if you include a 'critical skills' column in which employees can have multiple values.

Solution 2: The first step to resolving this issue is to identify which column(s) are causing the duplication. To do this, download the Train/Test Data from the “Do you want to download the dataset generated by this recipe?” step of the recipe and open the file. Then, scroll to the “How much history do you want to use to train your predictive model?” step and check how many intervals of data the model is using for training.

  • If you used 1 training interval, locate your unique identifier column, sort it from A to Z, and highlight duplicates. Then, go through each ID that has a duplication and scroll through each column. You are looking for columns with different values that are causing the instance to split into two rows. For example, in the image below, person_id 11 has the same values in all columns except critical skills, which has two different values. This discrepancy causes duplication. 

To correct this, navigate to the “Which core attributes do you want to use in your prediction?” step and exclude the critical skills column. There may be multiple columns causing duplication, and they may not be the same for each employee. Generate data statistics in the “Would you like to verify that all of the selections you have made are valid?” step. Once you receive a Success result instead of Action Needed, you have likely identified all the duplications. You can work with your Data Engineer to clean up any unintended duplications.

  • If you used multiple training intervals, each unique identifier should repeat according to the number of training intervals. You will follow the same process as if you had used only one training interval, but instead, you will be looking for rows that have duplications beyond the expected number of training intervals. For example, if you used two training intervals, there should be two rows for each employee (or instance). Therefore, you are looking for employees with three or more rows to identify which column(s) are causing the duplication.
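The manual duplicate hunt above can also be automated. After loading the Train/Test download with pd.read_csv or pd.read_excel, this sketch (using hypothetical column names) flags the IDs with more rows than expected and lists which columns differ between their rows:

```python
import pandas as pd

# Hypothetical Train/Test download: person_id 11 appears twice because
# 'critical_skills' has two values at the same point in time.
df = pd.DataFrame({
    "person_id":       [10, 11, 11, 12],
    "department":      ["Sales", "Eng", "Eng", "HR"],
    "critical_skills": ["SQL", "Python", "R", "Excel"],
})

id_col = "person_id"
expected_rows = 1  # set this to your number of training intervals

# IDs that appear more often than the expected number of rows
counts = df[id_col].value_counts()
dupe_ids = counts[counts > expected_rows].index.tolist()

# For each duplicated ID, list the columns whose values differ across rows
for dupe in dupe_ids:
    rows = df[df[id_col] == dupe]
    differing = [c for c in rows.columns if rows[c].nunique() > 1]
    print(dupe, differing)  # → 11 ['critical_skills']
```

The columns printed for each ID are the candidates to exclude in the core attributes step.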

 

Core/Generative Attributes & Data Statistics Don’t Load; Site Times Out

Possible Error Message: There is no single message; the error surfaces when you generate data statistics in the One AI Recipe. It can also cause your site to time out, in which case you may not be able to generate data statistics at all. 

Scenario: Sometimes, One AI Recipes will fail to display core attributes, generative attributes, and/or data statistics and will display an error at the top of the screen and/or time out several minutes after completing the first few steps of the Recipe. This happens because One AI performs a number of data integrity checks before rendering core attributes, since One AI machine learning requires valid joins and a unique Dataset ID per row in order to work. When there are major data integrity issues in the data model, the check fails to complete due to the load the query places on the database. The root cause is bad joins or extreme duplication. 

Solution: In order to find which joins are invalid or causing duplication, navigate to the data warehouse relationship site validation errors (Admin > Site Validation > Data Warehouse Relationships). You may need to connect with your One Model admin if you do not have access to view this page.

  • Check for a join from 'one.employee' or 'one.prd_employee' to 'one.usage_statistics_event'. If this join is present, submit a ticket to request your data engineer to remove it as this join no longer serves a functional purpose and causes this error.
  • Check for an error including "references a column that does not exist" on any table that is in the population metric table, or any table joined to the population metric table (most commonly 'one.employee', 'one.prd_employee', 'one.employee_event', or 'one.evt_employee'). If any exist, submit a ticket to have the Data Engineer correct it. 
  • Check for an error including "causes duplication of records…" on any table that is in the population metric table, or any table joined to the population metric table (most commonly 'one.employee', 'one.prd_employee', 'one.employee_event', or 'one.evt_employee'). If any exist, submit a ticket to have the Data Engineer correct it.

 

Results Summary Displays '0's for F1, Precision, & Recall for One Class

Scenario: The classification model successfully ran, but the F1, precision, and recall holdout scores are all 0 for the minority class in the Classification Report section of the Results Summary. This typically occurs when the model couldn't learn how to predict the minority class due to an insufficient number of examples in the training data. 

For example, for the model that resulted in the Classification Report below, there were 13,000 instances of ‘No termination’ and only 6 instances of ‘Termination’ in the training data. In cases this extreme, upsampling can’t adequately address this issue because the minority class is not diverse enough to effectively create synthetic instances without causing overfitting. Simply duplicating these instances also doesn't improve the underlying issue of insufficient data.

 

Solution: Ensure the training data includes a sufficient number of examples for each class. You can do this by either:

  • Changing the target metric to one with a more balanced split between classes.
  • Changing the model population to include more examples of both classes for the model to learn from.

 

“Unhandled Error” 

Possible Error Message: An unhandled error occurred during the augmentation run. 

Scenario: You’ve run a model and are getting a vague, unhandled error. 

Solution 1: For context, when building machine learning models using a One AI Recipe, users can access every metric, dimension, and column within their One Model site, regardless of their data access role. One AI Recipes do not follow the role-based security that is applied in the rest of One Model. If the model creator selects a target metric/column or a model population metric they do not have data access for, they can still create the model, but running it will result in failure. The same is true if the model creator creates or selects a generative attribute built with a metric they do not have access to. This is why model creators should have full data access. To resolve this:

  • Copy the model so that you, as the new model creator, have access to all of the necessary metrics, or
  • Grant the original model creator access to the necessary metrics.

Save and rerun the model.

Note: If a user with access to the unpermitted metric tries to run a model created by someone without access, it will still fail because the model always runs under the permissions of its original creator. In order for the user with access to successfully run the model, they must recreate it themselves. Check out the module on One AI & Security for more information. 

Solution 2: Check if the model creator has the application access role ‘CanAccessOneAIMenu’. The original model creator’s user does not need to be active, but must have this role assigned to them. If they do not, either grant them this role or copy the model and rerun as the new model creator. 

Solution 3: If the solutions above do not solve the problem, please submit a detailed ticket. 

 

Metric References a Dimension Node that No Longer Exists 

Possible Error Messages: Error: An error occurred during the augmentation run. The following validation issues were detected. The 'Terminations' metric references a node from the 'Event Reason' dimension that does not exist: Vol Terms.

Scenario: When using a dimension to filter a metric in One AI models (for the target metric, the model population metric, and/or metrics used to build generative attributes), you may encounter a model failure if the dimension is re-leveled, edited, or changed in a way that causes the metric to break. This issue arises because the modifications to the dimension disrupt the relationship with the metric, causing the model to fail.

The error message will indicate which metric is causing the model failure and specify the node(s) from the dimension that no longer exist. For instance, in the example above, the 'Terminations' metric is causing the failure because the 'Vol Terms' node no longer exists in the 'Event Reason' dimension.

Solution 1: Go into the metric editor from Explore and fix the broken metric. This typically involves refiltering the metric with the new dimension or dimension leveling. The error in the image below should appear when you edit the metric. After correcting the metric, save it and rerun the model. This is the preferred solution because it will fix the metric everywhere it is used within your One Model instance. 

Solution 2: If updating the metric is not feasible, replace it with a working metric. Then save and rerun the model.

 

One AI is Dropping Columns as 100% Null, Despite Data Presence

Scenario: When examining a model’s Exploratory Data Analysis (EDA) report, you notice a column was dropped with a ‘Missing’ label indicating it is 100% null or a ‘Constant’ label indicating the only value is 0 or NaN. However, when you check this column in the model’s data statistics, the train/test data download, or Explore, there is data in the column for the model population.

Solution: Check who the original model creator was. The model runs based on the permissions of the model creator. If the model creator does not have access to these columns (or has partial access due to a contextual security role that the model population falls outside of), they will be treated as 100% null and automatically dropped. Even if the unpermitted column is predictive, the model won't use it. To resolve this:

  • Copy the model so that you, as the new model creator, have access to all input columns, or
  • Grant the original model creator access to these columns.

Save and rerun the model.

Check out the module on One AI & Security for more information. 

 

Special Characters in Prediction Metrics

Possible Error Message: JOB_ERROR (500): An error occurred while validating the input data. Underlying Error:: ‘Terminations : Involuntary’

  • The text after ‘Underlying Error’ contains the name of the metric with the special character causing the error. 

Scenario: When creating or editing a machine learning model, the first step is to select the metric that defines what the model will predict, such as high performance, voluntary terminations, or new hire failure. If the selected metric name includes a restricted special character (listed below), the model will fail. 

  • colon ‘:’ 
  • comma ‘,’ 
  • curly brackets/braces ‘{’ and ‘}’

Model population and generative attribute names can contain any special characters. 

Solution: Create a duplicate of the metric you want to use, but rename it to exclude any restricted special characters. Tip: Place the metric in a One AI category.

If you encounter a special character not listed above that causes an error, please inform us. While we strive to accommodate such characters, some may remain restricted due to the backend structure of One AI. Certain characters have specific syntactic roles, so when they show up in other places, the backend tries to apply those roles, causing the model to fail.
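A quick way to pre-check a metric name before using it as a prediction metric is to scan it for the restricted characters listed above. This is a local sketch, not part of One AI itself:

```python
# Restricted special characters for prediction metric names (per the list above).
RESTRICTED = {":", ",", "{", "}"}

def has_restricted_chars(metric_name: str) -> bool:
    """Return True if the metric name contains a restricted character."""
    return any(ch in RESTRICTED for ch in metric_name)

print(has_restricted_chars("Terminations : Involuntary"))  # → True (contains ':')
print(has_restricted_chars("Involuntary Terminations"))    # → False (safe to use)
```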

 

Not Enough Features Made it Through Cleaning

Possible Error Message: JOB_ERROR (500): By default, One AI attempts to use at least 5 features. Only 3 made it through cleaning. You can edit your ML model and click Dimensionality Reduction --> Override --> No Selection --> Override to remove the minimum number of features. Underlying Error::

Scenario: One AI automatically performs dimensionality reduction, reducing the number of input variables while retaining relevant information. By default, a model will have at least five features selected. Additionally, One AI handles data preprocessing, cleaning, and transforming input variables for machine learning suitability. It automatically drops variables that violate global settings, such as having too many null values, constant values, or causing data leakage. If you received this error, it means fewer than 5 variables passed the cleaning process.

Solution 1: Click 'Edit' for the errored model and scroll down to "Dimensionality Reduction." Turn on the "Dimensionality Reduction Override" toggle, then expand your options. Next, turn on the "No Selection Override" toggle to disable dimensionality reduction, removing the minimum five-variable requirement. Ensure no configurations are set in the Filter or Wrapper sections. Save and rerun the model.

Solution 2: In the One AI Recipe, include more input features that comply with global settings in the core attributes step of the Recipe. Save and rerun the model.

Solution 3: Reconfigure the global settings to allow more input features. Click 'Edit' for the errored model, scroll to "Global Settings," make your updates, save, and rerun the model.

 

Per Column Interventions Conflicting with Dimensionality Reduction

Possible Error Message: JOB_ERROR (500): Configuration concurrently specifies PCI force selected features and RFE (recursive feature elimination), which cannot guarantee a given feature's selection. Underlying Error::

Scenario: If you force-select a particular feature through a per-column intervention, the model cannot perform recursive feature elimination (RFE) for dimensionality reduction. RFE iteratively removes the least important features to improve model performance, and forcing the selection of specific features interferes with this process.

Solution 1: Click 'Edit' for the errored model and scroll down to "Dimensionality Reduction." Turn on the "Dimensionality Reduction Override" toggle, then expand your options. Next, turn on the "No Selection Override" toggle to disable dimensionality reduction, removing the minimum five-variable requirement. Ensure no configurations are set in the Filter or Wrapper sections. Save and rerun the model.

Solution 2: Delete all per column interventions involving force selecting input variables. Save and rerun the model. 

 

Target Metric/Column Contains Only a Constant Value

Possible Error Message: JOB_ERROR (500): After applying run specific filters, the target column has been filtered such that it contains only a constant value and is therefore not an applicable column. Please check to make sure your run specific filters leave at least two different values in the column. Underlying Error:: After applying run filters the target column is now a constant value

Scenario: The target metric is what the One AI model predicts. In classification tasks, the training data must be labeled with the outcomes in the target column for the model to learn. This metric column needs at least two distinct values, such as "terminated" and "did not terminate," because the model needs to learn how to differentiate between different categories. With only one value, the model would have no basis for comparison and would be unable to categorize or make meaningful predictions. For regressions, the target metric must have a range of values so the model can learn the relationships between the input features and the target variable. If the target metric column only has one value, the model cannot learn any meaningful patterns or make accurate predictions.

Solution 1: Check how your target metric is filtered. Ensure that the filter does not prevent the training data from having at least 2 different values. Either remove the filters or change the target metric to ensure that 2 different values are present. 

Solution 2: Change your model population to represent at least two outcomes for the target metric. You can also increase your training intervals to provide examples of both outcomes in the training data.
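To confirm the target really has been filtered down to a single value, you can inspect the Train/Test download locally. A sketch with a hypothetical target column:

```python
import pandas as pd

# Hypothetical training download where a run filter left only one outcome.
train = pd.DataFrame({
    "person_id": [1, 2, 3],
    "target":    ["Terminated", "Terminated", "Terminated"],
})

# A valid classification target needs at least two distinct values.
n_classes = train["target"].nunique(dropna=True)
print(n_classes >= 2)  # → False: the filters left only one outcome
```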

 

Target Metric/Column is Null

Possible Error Message: JOB_ERROR (500): The classifier target column High Performers is all nulls Please check your estimator target column and make sure it is populated with values Underlying Error:: The classifier target column High Performers is all nulls

Scenario: The target metric is what the One AI model predicts. In classification tasks, the training data must be labeled with outcomes in the target column for the model to learn. For regression tasks, the target column should contain continuous values. This metric column must have data for the model to learn to categorize outcomes or predict continuous values.

Solution 1: Ensure you are using the correct metric and that it’s populated for the selected model population. If not, select the correct, populated metric. Save and rerun the model. 

Solution 2: If the target metric is generally populated, but is not populated for the selected model population, change the model population or correct the data in the system of record. Save and rerun the model.

 

Minority Class Has Fewer than 5 Instances in Training Dataset

Possible Error Message: JOB_ERROR (500): An error occurred while validating the input data for an estimation task. Underlying Error:: The least populated class in y has only 1 member, which is too few. The minimum number of groups for any class cannot be less than 5.

This error is also identified if you generate data statistics in the One AI Recipe:

 

Scenario: For classifications, the training data must be labeled with the outcomes in the target column for the model to learn. Each outcome must have at least 5 examples in the training dataset because the model needs multiple examples of each outcome to identify patterns and differences between them. With fewer than 5 examples, the model cannot generalize from the data; instead, it will overfit to the few examples available, memorizing them rather than learning general characteristics. That is why most machine learning algorithms require a minimum number of examples per class to function correctly. More examples lead to better and more accurate predictions.
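You can check class counts yourself in the Train/Test download. A sketch with a hypothetical target column:

```python
import pandas as pd

# Hypothetical training download: count examples per outcome.
train = pd.DataFrame({"target": ["No term"] * 13 + ["Term"] * 2})

counts = train["target"].value_counts()
too_small = counts[counts < 5]
print(too_small.to_dict())  # → {'Term': 2} — this class will trigger the error
```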

Solution 1: Expand the model population to include more instances from the class that currently has fewer than 5 members. Save and rerun the model. 

Solution 2: Check the override labels to see if more outcomes can be grouped together to bring that class representation to at least 5. Adjust, save, and rerun the model. 

Solution 3: Check the filters on the target metric/column and remove filters that are not necessary and/or are preventing the minority class from having at least 5 members. Adjust, save, and rerun the model.

 

Preprocessing Parameter Combinations Rejected as Invalid

Possible Error Message: JOB_ERROR (500): All preprocessing parameter combinations were rejected as invalid. Please check your configuration.

Scenario: This error typically results from global settings and/or advanced configuration issues (Bias Detection and Removal, Dimensionality Reduction, Estimator Configuration, Upsampling, Per Column Interventions, Probability Calibration, and/or custom YAML). Depending on the characteristics of the model dataset, these settings need to be configured correctly to avoid conflicts.

  • Global Settings: These define the overall parameters for the model. If they are too restrictive or conflicting with the dataset characteristics, the preprocessing parameters might be deemed invalid.
  • Bias Detection and Removal: Ensures the model is fair and unbiased. Incorrect settings might reject valid preprocessing parameters.
  • Dimensionality Reduction: Reduces the number of input variables. If not configured correctly, it can lead to invalid parameter combinations, especially if the remaining variables don't meet other settings' requirements.
  • Estimator Configuration: Defines the model's parameters. Conflicting configurations can result in invalid preprocessing parameters.
  • Upsampling: Balances class distribution by duplicating minority class instances. If upsampling conflicts with other settings, it can cause this error.
  • Per Column Interventions: Force-selecting or excluding specific columns can lead to invalid combinations if those columns are crucial for other settings.
  • Probability Calibration: Adjusts the predicted probabilities. Misconfiguration can cause conflicts with other preprocessing parameters.
  • Custom YAML: Allows for more granular control of model settings. Errors in this configuration can lead to invalid preprocessing parameters.    

Solution 1: First, check if you have turned on any overrides and then decided not to configure them but forgot to turn the override toggle back to 'Off'. This can cause the model to error. If this is the case, turn the override toggles off, save, and rerun the model.

Solution 2: Sometimes the error message will include a portion about underlying errors. For example:

Underlying Error:: All preprocessing combinations for this run were rejected as invalid. Here is a list of rejected combinations and their reason for being rejected: This preprocessing combination was rejected because: Rejecting filter config for dataset having fewer number of features than specified to be selected by the filter here is the associated config: filter {'method' : 'mutual_info', 'num_features': #} wrapper: {method': 'rfe', min_features': #

Read the underlying error to find where the advanced configuration was configured incorrectly or will not work with the current model dataset. For example, the above error informs us that the issue lies in our dimensionality reduction configuration. There are not enough valid features making it through the cleaning process without being automatically dropped by One AI during preprocessing. You can either turn off dimensionality reduction, reconfigure the global settings to be less restrictive, create droppability per column interventions for specific columns so that One AI ignores the global settings, or update the model dataset to include more features that meet your global settings. This error usually indicates that there are too few variables included to meet the Dimensionality Reduction wrapper minimum features configured (defaulted to 5).

Another example:

Underlying Error:: All preprocessing combinations for this run were rejected as invalid. Here is a list of rejected combinations and their reason for being rejected: This preprocessing combination was rejected because: upsampling: {'method': None, 'ratio': 'ratio_when_not_upsampling_is_irrelevant

The above error is usually in conjunction with the underlying error about dimensionality reduction. When dimensionality reduction is rejected, most other advanced configurations will be rejected. This is because dimensionality reduction is a foundational step that influences the availability and quality of features for subsequent preprocessing steps. If dimensionality reduction fails, the remaining features may not meet the requirements for other advanced configurations, such as upsampling, which depend on a sufficient number of relevant features. This dependency causes the rejection of most other advanced configurations when dimensionality reduction is not successful.

Solution 3: If you made any manual configurations:

  • Review Configurations: Check each advanced configuration and global setting for conflicts or overly restrictive parameters. Ensure they align with the dataset's characteristics.
  • Adjust Parameters: Modify settings that might be causing conflicts. For example, ensure that dimensionality reduction doesn't remove crucial variables needed for other configurations.
  • Validate YAML: If using custom YAML, verify that it is correct and there are no conflicting settings.
  • Test Iteratively: Make incremental changes and test the model to identify which setting is causing the conflict.

 

A Positive Label was Specified on a Multiclass Run

Possible Error Message: JOB_ERROR (500): A positive label was specified on a multiclass run. Please remove the positive_label spec or ensure the target is binary.

Scenario: This error occurs when a positive label is specified in a model configuration that is set up for multiclass classification. The "positive_label" parameter is used to identify which label is considered the positive class in binary classification problems. However, in multiclass classification, there are more than two classes, making the concept of a single positive label invalid.

Solution: Do not set a positive label in the One AI Recipe in the “Would you like to override the values generated by your prediction selection?” step. 

 
