Term | Definition | One AI Context | Source/Reference

Attribute
A characteristic pertaining to an individual.
Used interchangeably with the term “variable.” Location, Gender, and Business Unit are examples of attributes used in a One AI predictive model.
https://developers.google.com/machine-learning/glossary#attribute

AutoML
Automated machine learning (AutoML) is the process of automating the tasks of applying machine learning to real-world problems. AutoML potentially includes every stage, from beginning with a raw dataset to building a machine learning model ready for deployment.
One AI is an AutoML platform built specifically for people data.

Binary Classification
A type of classification task that predicts one of two mutually exclusive classes: the positive class and the negative class.
Attrition risk, for example, is a binary classification problem where “terminating” is the positive class and “not terminating” is the negative class.
https://developers.google.com/machine-learning/glossary#binary-classification

Class
A component of classification models, a class is a category that the model is predicting.
“Terminating” and “Not Terminating” are examples of classes often seen in One AI.
https://developers.google.com/machine-learning/glossary#class

Class Imbalance
A classification dataset with skewed class proportions.
According to a 2021 Bureau of Labor Statistics report, voluntary turnover averages 25% in the US. In an attrition risk model, that would manifest as 25% for the terminating class and 75% for the non-terminating class. Those are imbalanced classes.
https://machinelearningmastery.com/what-is-imbalanced-classification/

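As a minimal sketch (not One AI's internal logic), class proportions can be checked with pandas before modeling; the column name below is hypothetical:

```python
import pandas as pd

# Hypothetical employee dataset with a binary outcome column.
df = pd.DataFrame({
    "terminated": [1, 0, 0, 0, 1, 0, 0, 0]  # 1 = terminating, 0 = not terminating
})

# Proportion of each class; a large gap between the two indicates class imbalance.
print(df["terminated"].value_counts(normalize=True))
```
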
Classification
A method in machine learning in which an algorithm is used to predict categorical outcomes.
An example of a classification performed in One AI is “terminating” vs. “not terminating,” also known as attrition risk.

Cramér's V Correlation
A measure of association between two nominal variables.
One AI drops features that are correlated too closely with other features by using a Cramér's V measurement.

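One AI's exact implementation is not reproduced here; as a generic sketch, the statistic can be computed from a contingency table with scipy (the columns below are hypothetical):

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency

def cramers_v(x: pd.Series, y: pd.Series) -> float:
    """Cramér's V for two nominal variables (0 = no association, 1 = perfect)."""
    table = pd.crosstab(x, y)
    chi2, _, _, _ = chi2_contingency(table)
    n = table.values.sum()
    r, c = table.shape
    return float(np.sqrt(chi2 / (n * (min(r, c) - 1))))

# Two hypothetical categorical columns that encode nearly the same information.
df = pd.DataFrame({
    "department": ["Sales", "Sales", "HR", "HR", "IT", "IT"],
    "location":   ["NYC",   "NYC",   "LDN", "LDN", "SYD", "SYD"],
})
print(cramers_v(df["department"], df["location"]))  # near 1.0, i.e. highly redundant
```
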
Cross-validation
A mechanism for estimating how well a model would generalize to new data by testing the model against one or more non-overlapping data subsets withheld from the training set.
In One AI, cross-validation is performed by splitting the “train/test” dataset into train, validation, and holdout sets. 80% of the data goes to train and 20% to validation and holdout. The validation data has no impact on decision making. Training data gets k-folded into 10 folds.
https://developers.google.com/machine-learning
https://developers.google.com/machine-learning/data-prep/construct/sampling-splitting/example

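A rough scikit-learn sketch of this pattern (the 80/20 split and 10 folds below simply mirror the description above; One AI's actual split logic may differ):

```python
import numpy as np
from sklearn.model_selection import KFold, train_test_split

# Hypothetical feature matrix and binary labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = rng.integers(0, 2, size=100)

# 80% of the rows go to training; 20% are withheld for validation/holdout.
X_train, X_hold, y_train, y_hold = train_test_split(X, y, test_size=0.2, random_state=0)

# The training portion is then split into 10 folds for cross-validation.
kf = KFold(n_splits=10, shuffle=True, random_state=0)
for fold, (fit_idx, val_idx) in enumerate(kf.split(X_train)):
    print(f"fold {fold}: fit on {len(fit_idx)} rows, validate on {len(val_idx)} rows")
```
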
Data Leakage
A feature that predicts the outcome too well to be plausible, usually because it contains information about the outcome itself.
Sometimes certain features in your data in One AI correspond too closely with the outcome you’re trying to predict. An example is a “future terminated” flag being leveraged as a feature when it is also the class being predicted. One AI detects these potentially cheating features and labels them as suspicious on the EDA report.
https://medium.com/salesforce-einstein-platform

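One AI's detection logic is not shown here; purely as an illustration, a crude check is to flag any feature that reproduces the label almost perfectly on its own (all column names below are hypothetical):

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Hypothetical dataset where "future_terminated" is effectively a copy of the label.
df = pd.DataFrame({
    "tenure_years":      [1, 5, 2, 8, 3, 2, 6, 4],
    "future_terminated": [1, 0, 1, 0, 1, 0, 1, 0],  # leaks the outcome
    "terminated":        [1, 0, 1, 0, 1, 0, 1, 0],  # the class being predicted
})

y = df["terminated"]
for col in ["tenure_years", "future_terminated"]:
    # Crude in-sample check: how well does this single feature reproduce the label?
    score = DecisionTreeClassifier().fit(df[[col]], y).score(df[[col]], y)
    print(f"{col}: single-feature accuracy = {score:.2f}", "(suspicious)" if score > 0.95 else "")
```
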
Data Science
Data science is the study of data to extract meaningful insights. It is a multidisciplinary approach that combines principles and practices from the fields of mathematics, statistics, artificial intelligence, and computer engineering to analyze large amounts of data. This analysis helps data scientists to ask and answer questions like what happened, why it happened, what will happen, and what can be done with the results.
Everything One AI does falls under the Data Science umbrella.

Dimensionality Reduction
Dimensionality reduction is a technique used to reduce the number of features in a dataset while retaining as much of the important information as possible.
One AI offers both filter methods and wrapper methods for dimensionality reduction in the advanced settings.

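One AI's advanced settings are not reproduced here; as a generic illustration of the two families, scikit-learn provides a filter method (univariate selection) and a wrapper method (recursive feature elimination):

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE, SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=20, n_informative=5, random_state=0)

# Filter method: score each feature independently of any model, then keep the top k.
filtered = SelectKBest(score_func=f_classif, k=5).fit(X, y)

# Wrapper method: repeatedly fit a model and drop the weakest features.
wrapped = RFE(LogisticRegression(max_iter=1000), n_features_to_select=5).fit(X, y)

print("filter kept:", filtered.get_support(indices=True))
print("wrapper kept:", wrapped.get_support(indices=True))
```
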
Estimator
An algorithm that estimates a value based on other observations.
One AI leverages sklearn, in which estimators are the different types of algorithms that can be leveraged to make predictions.

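For instance, every scikit-learn estimator follows the same fit/predict pattern (generic sklearn usage, not One AI's pipeline):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=100, n_features=6, random_state=0)

estimator = RandomForestClassifier(random_state=0)  # one of many interchangeable estimators
estimator.fit(X, y)                                 # learn from labeled observations
print(estimator.predict(X[:5]))                     # estimate classes for new observations
```
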
F1 Score
The harmonic mean of precision and recall.
Precision, Recall, and F1 score for each of the classes are the most commonly leveraged model performance measures for classification models in One AI. They can be found in the Classification Report section of the Results Summary report for completed runs of Augmentations in One Model, as well as on storyboards leveraging One AI output.

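Concretely, F1 = 2 × (precision × recall) / (precision + recall). A quick check with scikit-learn, using illustrative values only:

```python
from sklearn.metrics import f1_score, precision_score, recall_score

y_true = [1, 1, 1, 0, 0, 0, 1, 0]   # actual classes (1 = positive class)
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]   # model predictions

p = precision_score(y_true, y_pred)  # 3 correct positives out of 4 predicted = 0.75
r = recall_score(y_true, y_pred)     # 3 of the 4 actual positives found = 0.75
print(2 * p * r / (p + r))           # harmonic mean of precision and recall
print(f1_score(y_true, y_pred))      # same value computed directly
```
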
Feature
An attribute about each entity being predicted on that is leveraged by a machine learning model to make predictions.
In One AI we distinguish between features and attributes/variables: a feature selection step is performed so that, of the many attributes/variables the model is presented with, only the most predictive are selected as features.

Feature Importance
Feature Importance refers to techniques that calculate a score for all the input features for a given model; the scores simply represent the “importance” of each feature. A higher score means that the specific feature will have a larger effect on the model that is being used to predict a certain variable.
Feature importances are found in the Feature Analysis section of the Results Summary report for completed runs of Augmentations in One Model.
https://towardsdatascience.com/understanding-feature-importance-and-how-to-implement-it-in-python-ff0287b20285

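As a generic example (not the specific technique One AI reports), tree-based scikit-learn models expose a per-feature importance score after fitting; the feature names below are hypothetical:

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=300, n_features=4, n_informative=2, random_state=0)
features = ["tenure", "salary_band", "commute_km", "team_size"]  # hypothetical names

model = RandomForestClassifier(random_state=0).fit(X, y)

# Higher score = larger effect on the model's predictions.
print(pd.Series(model.feature_importances_, index=features).sort_values(ascending=False))
```
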
Multi-class Classification
A classification problem in which the dataset contains more than two classes of labels.
Predicting whether an employee will be a low, medium, or high performer next year is an example of a multi-class classification problem that could be analyzed using One AI.

NULL Filling
Replacing missing values for an attribute with a meaningful value.
One AI has a default null drop threshold of 95%, meaning that if more than 5% of the values for an attribute are missing, the attribute gets dropped and will not be considered as a feature. By replacing the nulls with something meaningful, you can prevent this from happening.
https://medium.com/geekculture/how-to-deal-with-

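For example, with pandas you might fill nulls before loading the data; the column names and fill choices here are hypothetical:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "business_unit": ["Sales", None, "HR", "Sales"],
    "bonus_pct":     [0.10, np.nan, np.nan, 0.05],
})

# Fill categorical nulls with an explicit "Unknown" bucket and numeric nulls with 0.
df["business_unit"] = df["business_unit"].fillna("Unknown")
df["bonus_pct"] = df["bonus_pct"].fillna(0.0)

print(df)
```
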
One Hot Encoding
Representing categorical data as a vector in which one element is set to 1 and all other elements are set to 0.
One AI often one hot encodes categorical variables.

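A small sketch with pandas (the column is hypothetical):

```python
import pandas as pd

df = pd.DataFrame({"location": ["NYC", "London", "Sydney", "NYC"]})

# Each category becomes its own 0/1 column; exactly one element per row is 1.
print(pd.get_dummies(df, columns=["location"], dtype=int))
```
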
Overfitting
The production of an analysis that corresponds too closely or exactly to a particular set of data, and may therefore fail to fit to additional data or predict future observations reliably.
The One AI pipeline is designed to mitigate overfitting at every stage. This includes feature selection, hyperparameter optimization, final model selection, and performance reporting.

P-value
A statistical measure used to determine the likelihood that an observed outcome is the result of chance. The lower the p-value, the greater the statistical significance of the observed difference. A p-value of 0.05 or lower is generally considered statistically significant.
One AI uses p-value as a measure in both the smart tables and correlations integrated services.

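Purely as an illustration (not necessarily how the smart tables or correlations services compute it), a chi-squared test of independence yields a p-value for two categorical variables:

```python
import pandas as pd
from scipy.stats import chi2_contingency

# Hypothetical counts: termination outcome by business unit.
df = pd.DataFrame({
    "business_unit": ["Sales"] * 50 + ["Engineering"] * 50,
    "terminated":    [1] * 20 + [0] * 30 + [1] * 5 + [0] * 45,
})

chi2, p_value, dof, expected = chi2_contingency(pd.crosstab(df["business_unit"], df["terminated"]))
print(f"p-value = {p_value:.4f}")  # below 0.05 suggests the difference is unlikely to be chance
```
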
Positive Class
The positive class is the class you are testing for. It is sometimes referred to as the “target variable.”
For attrition risk, “Termination” is the positive class and “No Termination” is not. That’s not to say that there are more terminations than not, but rather that termination is the class we care more about predicting in this case.

Precision
The proportion of positive identifications by the model that were actually correct.
Precision, Recall, and F1 score for each of the classes are the most commonly leveraged model performance measures for classification models in One AI. They can be found in the Classification Report section of the Results Summary report for completed runs of Augmentations in One Model, as well as on storyboards leveraging One AI output.

Predictive Modeling
Predictive modeling uses statistical techniques to predict future behavior. It works by analyzing historical and current data and generating a model to help predict future outcomes.
One AI performs predictive modeling leveraging your people data.
https://www.gartner.com/en/information-technology

Recall
The proportion of actual positives that were identified correctly by the model.
Precision, Recall, and F1 score for each of the classes are the most commonly leveraged model performance measures for classification models in One AI. They can be found in the Classification Report section of the Results Summary report for completed runs of Augmentations in One Model, as well as on storyboards leveraging One AI output.
https://developers.google.com/machine-learning

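Tying the three metrics together, scikit-learn's classification report prints precision, recall, and F1 per class in one table (illustrative labels only; this is not One AI's Results Summary format):

```python
from sklearn.metrics import classification_report

y_true = ["Terminating", "Not Terminating", "Terminating", "Not Terminating",
          "Not Terminating", "Terminating", "Not Terminating", "Terminating"]
y_pred = ["Terminating", "Not Terminating", "Not Terminating", "Not Terminating",
          "Terminating", "Terminating", "Not Terminating", "Terminating"]

# Precision, recall, and F1 for each class, plus overall averages.
print(classification_report(y_true, y_pred))
```
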
Regression
A method in machine learning in which an algorithm is used to predict continuous outcomes.
While classification models are more commonly leveraged in One AI, it is also possible to create regression models. We will be creating some regression-based recipes in the future. An example of a regression target in One AI is salary.
https://www.seldon.io/machine-learning-regression-explained

Scaling
A mathematical transformation that shifts the range of a continuous value so multiple features will be on the same scale and thus won't get incorrectly weighted by the algorithm.
One AI often performs linear scaling on continuous variables, which typically uses a combination of subtraction and division to replace the original value with a number between -1 and +1 or between 0 and 1.

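For example, min-max scaling (one common form of linear scaling; not necessarily the exact transformation One AI applies) maps a column into the 0-to-1 range:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

salaries = np.array([[40_000.0], [55_000.0], [70_000.0], [120_000.0]])

# (x - min) / (max - min): subtraction and division shift each value into [0, 1].
scaled = MinMaxScaler().fit_transform(salaries)
print(scaled.ravel())  # [0.  0.1875  0.375  1.]
```
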
Supervised Machine Learning
Machine learning is a field devoted to understanding and building methods that let machines "learn" – that is, methods that leverage data to improve computer performance on some set of tasks. Supervised machine learning is defined by its use of labeled datasets to train algorithms that predict outcomes accurately. Labeled means some input data is already tagged with the correct output.
One AI predictive models are supervised machine learning models.
https://en.wikipedia.org/wiki/Machine_learning
https://developers.google.com/machine-learning/glossary#supervised-machine-learning

Target Variable
The outcome you’re trying to predict; a target can be categorical or continuous (a varying number).
In One AI classification models, the target variable and the positive class refer to the same thing (see Positive Class above).
https://www.datarobot.com/wiki/target/

Upsampling
Upsampling is a procedure where synthetically generated data points are inserted into the dataset. After this process, the counts of both classes are almost the same. This equalization procedure prevents the model from inclining towards the majority class.
Since the classes One AI is predicting are often imbalanced, applying upsampling is a common practice. Synthetic Minority Over-sampling Technique (SMOTE) is the most common method used in One AI for upsampling. Be aware that upsampling on extremely imbalanced classes can lead to overfitting.
https://www.analyticsvidhya.com/blog/2020/11 data-machine-learning-computer-vision-and-nlp/#:~:text=Upsampling%20is%20a%

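A minimal SMOTE sketch using the imbalanced-learn package (generic usage, not One AI's internal configuration):

```python
from collections import Counter
from sklearn.datasets import make_classification
from imblearn.over_sampling import SMOTE

# Imbalanced toy data: roughly 25% positive, 75% negative.
X, y = make_classification(n_samples=400, weights=[0.75, 0.25], random_state=0)
print("before:", Counter(y))

# SMOTE synthesizes new minority-class points until the classes are balanced.
X_res, y_res = SMOTE(random_state=0).fit_resample(X, y)
print("after: ", Counter(y_res))
```
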
Variable
A characteristic pertaining to an individual.
Used interchangeably with the term “attribute.” Location, Gender, and Business Unit are examples of variables used in a One AI predictive model.