One AI Glossary of Terms

Term

Definition

One AI Context

Source/Reference

Attribute

A characteristic pertaining to an individual

Used interchangeably with the term “variable”. Location, Gender, and Business Unit are examples of attributes used in a One AI predictive model.

https://developers.google.com/machine-learning/glossary#attribute 

AutoML

Automated machine learning (AutoML) is the process of automating the tasks of applying machine learning to real-world problems. AutoML potentially includes every stage from beginning with a raw dataset to building a machine learning model ready for deployment.

One AI is an AutoML platform built specifically for people data.

https://en.wikipedia.org/wiki/Automated_machine_learning

Binary Classification

A type of classification task that predicts one of two mutually exclusive classes: the positive class and the negative class.

Attrition risk is a binary classification problem in which “terminating” is the positive class and “not terminating” is the negative class.

https://developers.google.com/machine-learning/glossary#binary-classification

Class

A component of classification models, a class is a category that the model is predicting.

“Terminating” and “Not Terminating” are examples of classes often seen in One AI.

https://developers.google.com/machine-learning/glossary#class

Class Imbalance

A classification data set with skewed class proportions.

According to a 2021 Bureau of Labor Statistics report, voluntary turnover averages 25% in the US. In an attrition risk model, that would manifest itself as 25% for the terminating class and 75% for the non-terminating class. Those are imbalanced classes.

https://machinelearningmastery.com/what-is-imbalanced-classification/ 
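As a rough illustration of the 25% / 75% split described above, class proportions can be checked by counting labels. This is a minimal sketch; the function name and the label list are hypothetical and are not part of One AI.

```python
from collections import Counter


def class_proportions(labels):
    """Return each class's share of the dataset as a fraction of 1."""
    counts = Counter(labels)
    total = len(labels)
    return {cls: n / total for cls, n in counts.items()}


# Hypothetical labels: 25 terminating employees and 75 not terminating.
labels = ["terminating"] * 25 + ["not terminating"] * 75
proportions = class_proportions(labels)
print(proportions)
```

A large gap between the two shares is the signal that the classes are imbalanced.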

Classification

A method in machine learning in which an algorithm is used to predict categorical outcomes.

An example of a classification performed in One AI is “terminating” vs. “not terminating”, also known as attrition risk.

https://developers.google.com/machine-learning/glossary#classification-model

Cramér's V Correlation

A measure of association between two nominal variables.

One AI drops features that are correlated too closely with other features by using a Cramér's V measurement.

https://en.wikipedia.org/wiki/Cram%C3%A9r%27s_V
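For readers who want to see how the measure is computed, here is a minimal sketch of the textbook formula, V = sqrt(chi-squared / (n * min(rows - 1, columns - 1))), without bias correction. The function name is hypothetical, and this is not presented as the way One AI computes it internally.

```python
import math
from collections import Counter


def cramers_v(xs, ys):
    """Cramér's V for two equal-length sequences of nominal values."""
    n = len(xs)
    joint = Counter(zip(xs, ys))   # observed count for each (x, y) pair
    x_counts = Counter(xs)
    y_counts = Counter(ys)
    k = min(len(x_counts), len(y_counts)) - 1
    if n == 0 or k == 0:
        return 0.0                 # undefined when a variable has one category
    chi2 = 0.0
    for x, nx in x_counts.items():
        for y, ny in y_counts.items():
            expected = nx * ny / n
            observed = joint.get((x, y), 0)
            chi2 += (observed - expected) ** 2 / expected
    return math.sqrt(chi2 / (n * k))


# Hypothetical attributes: each location maps to exactly one business unit.
location = ["NY", "NY", "TX", "TX"]
business_unit = ["Sales", "Sales", "Ops", "Ops"]
print(cramers_v(location, business_unit))
```

Values near 1 indicate that one attribute almost fully determines the other; values near 0 indicate little association.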

Cross-validation

A mechanism for estimating how well a model would generalize to new data by testing the model against one or more non-overlapping data subsets withheld from the training set.

In One AI, cross-validation is performed by splitting the “train/test” dataset into train, validation, and holdout sets. 80% of the data goes to train and 20% to validation and holdout. The validation data has no impact on decision making. Training data gets k-folded into 10 folds.

https://developers.google.com/machine-learning/glossary#cross-validation

https://developers.google.com/machine-learning/data-prep/construct/sampling-splitting/example

https://scikit-learn.org/stable/modules/cross_validation.html
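The 80/20 split and 10 folds described above can be sketched with row indices alone. This is an illustrative sketch, not One AI's implementation: the function name, the shuffling, and the seed are assumptions, and because the text does not say how the 20% is divided between validation and holdout, the sketch keeps it as a single held-out portion.

```python
import random


def split_and_fold(n_rows, train_fraction=0.8, n_folds=10, seed=0):
    """Shuffle row indices, hold out a portion, and split the rest into folds."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    cut = int(n_rows * train_fraction)
    train, held_out = indices[:cut], indices[cut:]
    # Fold i takes every n_folds-th training row, starting at position i.
    folds = [train[i::n_folds] for i in range(n_folds)]
    return train, held_out, folds


train, held_out, folds = split_and_fold(1000)
for i, scoring_rows in enumerate(folds):
    # Each round fits on nine folds and scores on the remaining one.
    fitting_rows = [r for j, fold in enumerate(folds) if j != i for r in fold]
    assert len(fitting_rows) + len(scoring_rows) == len(train)
```

The held-out rows never take part in the fold rotation, which is what lets them serve as an independent check.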

Data Leakage

A feature that predicts the outcome too well to be plausible.

Sometimes certain features in your data in One AI correspond too closely with the outcome you’re trying to predict. An example is a “future terminated” flag being leveraged as a feature when it is also the class being predicted. One AI detects these potentially cheating features and labels them as suspicious on the EDA report.

https://medium.com/salesforce-einstein-platform/einstein-prediction-builder-a-model-thats-too-good-to-be-true-f1754e5ca48e

Data Science

Data science is the study of data to extract meaningful insights. It is a multidisciplinary approach that combines principles and practices from the fields of mathematics, statistics, artificial intelligence, and computer engineering to analyze large amounts of data. This analysis helps data scientists to ask and answer questions like what happened, why it happened, what will happen, and what can be done with the results.

Everything One AI does falls under the Data Science umbrella.

https://aws.amazon.com/what-is/data-science/

Dimensionality Reduction

Dimensionality reduction is a technique used to reduce the number of features in a dataset while retaining as much of the important information as possible.

One AI offers both filter methods and wrapper methods for dimensionality reduction in the advanced settings.

https://www.geeksforgeeks.org/dimensionality-reduction/

Estimator

An algorithm that estimates a value based on other observations.

One AI leverages sklearn, in which estimators are the different types of algorithms that can be leveraged to make predictions.

https://scikit-learn.org/stable/tutorial/machine_learning_map/index.html

F1 Score

The harmonic mean of precision and recall.

Precision, Recall, and F1 score for each of the classes are the most commonly leveraged model performance measures for classification models in One AI. They can be found in the Classification Report section of the Results Summary report for completed runs of Augmentations in One Model as well as on storyboards leveraging One AI output.

https://en.wikipedia.org/wiki/F-score 
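To make the relationship between precision, recall, and F1 score concrete, here is a minimal sketch that computes all three for one class from true and predicted labels. The function name and the sample labels are hypothetical; the Classification Report itself is produced by One AI and is not reproduced here.

```python
def classification_metrics(y_true, y_pred, positive):
    """Return (precision, recall, f1) for the class named by `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


# Hypothetical labels: "T" = terminating, "N" = not terminating.
y_true = ["T", "T", "T", "N", "N", "N", "N", "N"]
y_pred = ["T", "T", "N", "T", "N", "N", "N", "N"]
precision, recall, f1 = classification_metrics(y_true, y_pred, positive="T")
print(precision, recall, f1)
```

Because the harmonic mean is pulled toward the smaller of the two values, a model cannot earn a high F1 score by doing well on only one of precision or recall.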

Feature

An attribute of each entity being predicted on that a machine learning model uses to make predictions.

In One AI we distinguish between features and attributes/variables in that we perform a feature selection step where only the most predictive of many attributes/variables the model is presented with are selected as features.

https://developers.google.com/machine-learning/glossary#feature

Feature Importance

Feature Importance refers to techniques that calculate a score for each input feature of a given model. The scores represent the “importance” of each feature: a higher score means that the feature has a larger effect on the model's predictions of the target.

Feature importances are found in the Feature Analysis section of the Results Summary report for completed runs of Augmentations in One Model.

https://towardsdatascience.com/understanding-feature-importance-and-how-to-implement-it-in-pythonff0287b20285#:~:text=Feature%20Importance%20refers%20to%20techniques,to%20predict%20a%20certain%20variable.

Multi-class Classification

A classification problem in which the dataset contains more than two classes of labels.

Predicting whether an employee will be a low, medium, or high performer next year is an example of a multi-class classification problem that could be analyzed using One AI.

https://developers.google.com/machine-learning/glossary#multi-class-classification

NULL Filling

Replacing missing values for an attribute with a meaningful value.

One AI has a default null drop threshold of 95%, meaning that if more than 5% of the values for an attribute are missing, it gets dropped and will not be considered as a feature. By replacing the nulls with something meaningful, you can prevent this from happening.

https://medium.com/geekculture/how-to-deal-with-missing-values-in-machine-learning-98e47f025b9c
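The effect of null filling on the share of missing values can be sketched as follows. The function names and the “Unknown” fill value are hypothetical choices for illustration; a meaningful fill value depends on the attribute.

```python
def null_fraction(values):
    """Share of entries that are missing (None), as a fraction of 1."""
    return sum(v is None for v in values) / len(values)


def fill_nulls(values, fill_value="Unknown"):
    """Replace missing entries with a fill value, leaving the rest unchanged."""
    return [fill_value if v is None else v for v in values]


# Hypothetical attribute with 1 missing value out of 10 (10% null).
location = ["NY", "TX", None, "CA", "NY", "TX", "CA", "NY", "TX", "CA"]
print(null_fraction(location))

filled = fill_nulls(location)
print(null_fraction(filled))
```

Before filling, this attribute exceeds the 5% missing-value limit described above; after filling, it has no nulls left.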

One Hot Encoding

Representing categorical data as a vector in which one element is set to 1 and all other elements are set to 0.

One AI often one hot encodes categorical variables.

https://developers.google.com/machine-learning/glossary#one-hot-encoding
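A minimal sketch of one hot encoding for a single categorical attribute follows. The function name and the sample values are hypothetical, and the alphabetical column order is a choice made for this example only.

```python
def one_hot_encode(values):
    """Return the sorted category list and one 0/1 vector per input value."""
    categories = sorted(set(values))
    position = {category: i for i, category in enumerate(categories)}
    encoded = []
    for value in values:
        row = [0] * len(categories)
        row[position[value]] = 1   # exactly one element is set to 1
        encoded.append(row)
    return categories, encoded


# Hypothetical Business Unit values for four employees.
categories, encoded = one_hot_encode(["Sales", "Engineering", "Sales", "HR"])
print(categories)
for row in encoded:
    print(row)
```

Each category becomes its own column, so an attribute with many distinct values produces many columns.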

Overfitting

The production of an analysis that corresponds too closely or exactly to a particular set of data, and may therefore fail to fit to additional data or predict future observations reliably.

The One AI pipeline is designed to mitigate overfitting at every stage. This includes feature selection, hyperparameter optimization, final model selection, and performance reporting.

https://en.wikipedia.org/wiki/Overfitting

P-value

A statistical measure used to determine the likelihood that an observed outcome is the result of chance. The lower the p-value, the greater the statistical significance of the observed difference.

A p-value of 0.05 or lower is generally considered statistically significant.

 

One AI uses p-value as a measure in both the smart tables and correlations integrated services.

https://www.simplypsychology.org/p-value.html

Positive Class

The positive class is the class you are testing for. It is sometimes referred to as the “target variable”.

For attrition risk, “Termination” is the positive class and “No Termination” is the negative class. That’s not to say that there are more terminations than non-terminations, but rather that termination is the class we care more about predicting in this case.

https://developers.google.com/machine-learning/glossary#positive-class

Precision

The proportion of positive identifications by the model that were actually correct.

Precision, Recall, and F1 score for each of the classes are the most commonly leveraged model performance measures for classification models in One AI. They can be found in the Classification Report section of the Results Summary report for completed runs of Augmentations in One Model as well as on storyboards leveraging One AI output.

https://developers.google.com/machine-learning/crash-course/classification/precision-and-recall

Predictive Modeling

Predictive modeling uses statistical techniques to predict future behavior. It works by analyzing historical and current data and generating a model to help predict future outcomes.

One AI performs predictive modeling leveraging your people data.

https://www.gartner.com/en/information-technology/glossary/predictive-modeling#:~:text=Predictive%20modeling%20is%20a%20commonly,to%20help%20predict%20future%20outcomes.

Recall

The proportion of actual positives that were identified correctly by the model.

Precision, Recall, and F1 score for each of the classes are the most commonly leveraged model performance measures for classification models in One AI. They can be found in the Classification Report section of the Results Summary report for completed runs of Augmentations in One Model as well as on storyboards leveraging One AI output.

https://developers.google.com/machine-learning/crash-course/classification/precision-and-recall

Regression

A method in machine learning in which an algorithm is used to predict continuous outcomes.

While classification models are more commonly leveraged in One AI, it is also possible to create regression models. We will be creating some regression-based recipes in the future. An example of a regression target in One AI is salary.

https://www.seldon.io/machine-learning-regression-explained#:~:text=Machine%20Learning%20Regression%20is%20a,used%20to%20predict%20continuous%20outcomes.

Scaling

A mathematical transformation that shifts the range of a continuous value so multiple features will be on the same scale and thus won't get incorrectly weighted by the algorithm.

One AI often performs linear scaling on continuous variables, which typically uses a combination of subtraction and division to replace the original value with a number between -1 and +1 or between 0 and 1.

https://developers.google.com/machine-learning/glossary#scaling
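The subtraction-and-division step described above can be sketched as min-max scaling, one common form of linear scaling. The function name and the salary figures are hypothetical, and the text does not say which exact variant One AI applies.

```python
def min_max_scale(values, low=0.0, high=1.0):
    """Linearly rescale values so the minimum maps to low and the maximum to high."""
    smallest, largest = min(values), max(values)
    if largest == smallest:
        return [low for _ in values]   # a constant column has no spread to scale
    span = largest - smallest
    return [low + (v - smallest) * (high - low) / span for v in values]


# Hypothetical salaries scaled to the 0..1 range and to the -1..+1 range.
salaries = [30000, 60000, 90000]
print(min_max_scale(salaries))
print(min_max_scale(salaries, low=-1.0, high=1.0))
```

After scaling, a feature measured in tens of thousands sits on the same range as one measured in single digits.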

Supervised Machine Learning

Machine learning is a field devoted to understanding and building methods that let machines "learn" – that is, methods that leverage data to improve computer performance on some set of tasks. Supervised machine learning is defined by its use of labeled datasets to train algorithms that predict outcomes accurately. Labeled means some input data is already tagged with the correct output.

One AI predictive models are supervised machine learning models.

https://en.wikipedia.org/wiki/Machine_learning

 

https://developers.google.com/machine-learning/glossary#supervised-machine-learning

 

Target Variable

The outcome you’re trying to predict. A target can be categorical or continuous (a varying number).

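In One AI classification models, the term is often used loosely to mean the positive class, although strictly speaking the positive class is one of the values the target variable can take.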

https://www.datarobot.com/wiki/target/#:~:text=What%20is%20a%20Target%20Variable,your%20dataset%20and%20the%20target.

Upsampling

Upsampling is a procedure where synthetically generated data points are inserted into the dataset. After this process, the counts of both classes are almost the same. This equalization procedure prevents the model from inclining towards the majority class.

Since the classes One AI is predicting are often imbalanced, applying upsampling is a common practice. Synthetic Minority Over-sampling Technique (SMOTE) is the most common method used in One AI for upsampling.

Be aware that upsampling on extremely imbalanced classes can lead to overfitting.

 

https://www.analyticsvidhya.com/blog/2020/11/handling-imbalanced-data-machine-learning-computer-vision-and-nlp/#:~:text=Upsampling%20is%20a%20procedure%20where,inclining%20towards%20the%20majority%20class.
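The idea of inserting synthetic minority-class points can be sketched with simple interpolation. This is a deliberately simplified stand-in for SMOTE, not SMOTE itself and not One AI's implementation: SMOTE interpolates between a minority point and one of its nearest minority neighbors, whereas this sketch picks the two points at random. The function name, the seed, and the sample rows are hypothetical.

```python
import random


def upsample_minority(minority_rows, target_count, seed=0):
    """Add synthetic rows between random pairs of minority rows until
    the minority class reaches target_count rows (needs at least 2 rows)."""
    rng = random.Random(seed)
    synthetic = []
    while len(minority_rows) + len(synthetic) < target_count:
        a, b = rng.sample(minority_rows, 2)
        t = rng.random()   # position along the line segment from a to b
        synthetic.append([x + t * (y - x) for x, y in zip(a, b)])
    return minority_rows + synthetic


# Hypothetical minority class: 3 rows of two numeric features,
# upsampled to match a majority class of 9 rows.
minority = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
balanced_minority = upsample_minority(minority, target_count=9)
print(len(balanced_minority))
```

Every synthetic row lies between existing minority rows, which is why heavy upsampling of a very small minority class can encourage overfitting, as noted above.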

Variable

A characteristic pertaining to an individual.

Used interchangeably with the term “attribute”. Location, Gender, and Business Unit are examples of variables used in a One AI predictive model.

https://developers.google.com/machine-learning/glossary#attribute

 
