Describes the files that are required for a One AI machine learning classification or regression augmentation.
To configure a new machine learning augmentation in One AI, you must create a data destination with two files.
The three required columns for One AI machine learning models
Dataset Id: This is a unique id per row in the file. For example, if you are creating an attrition risk model, this would likely be the employee id.
Sample Date: This field denotes the "as of" date for the records in the file. For example, you might provide data from the beginning of 2017 and the beginning of 2018, and then use that data to predict attrition risk for 2019.
Classifier Target: This is the column with the value you are trying to predict. For example, it might have a 0 or a 1 to indicate whether the employee terminated. For a regression problem it might have a number in it, like headcount or days to fill. The format does not have to be numeric for classification problems. Instead of a 0 or 1, you might have a value that says "Terminated" or "Didn't Terminate".
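The three columns above can be checked before a file is uploaded. The following is a minimal sketch, not part of One AI: the header names (dataset_id, sample_date, classifier_target) and the check_rows helper are hypothetical, since this article does not state the exact column spellings One AI expects.

```python
import csv
import io

# Hypothetical header names; the article does not give exact spellings.
REQUIRED_COLUMNS = ["dataset_id", "sample_date", "classifier_target"]


def check_rows(rows):
    """Return a list of human-readable problems found in the rows."""
    problems = []
    seen_ids = set()
    for i, row in enumerate(rows, start=1):
        # Every row needs a non-empty value in each required column.
        for col in REQUIRED_COLUMNS:
            if not row.get(col):
                problems.append(f"row {i}: missing {col}")
        # The Dataset Id is described as unique per row in the file.
        row_id = row.get("dataset_id")
        if row_id:
            if row_id in seen_ids:
                problems.append(f"row {i}: duplicate dataset_id {row_id}")
            seen_ids.add(row_id)
    return problems


# Small in-memory file with text labels, which are allowed for classification.
sample_csv = """dataset_id,sample_date,classifier_target
E001,2018-01-01,Terminated
E002,2018-01-01,Didn't Terminate
E002,2018-01-01,Terminated
"""

rows = list(csv.DictReader(io.StringIO(sample_csv)))
problems = check_rows(rows)
for problem in problems:
    print(problem)
```

A check like this only catches structural issues (empty cells, repeated ids). It says nothing about whether the target values are suitable for the model.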
With that said, what does One AI do with the two files, and why does it need two of them? Here's how it works.
The two files required for One AI machine learning models
Typically you will provide one file with historical data and one file with the current data. So let's say that file A is the historical data and file B is the current data. One AI considers the files in alphabetical order, so name them accordingly: the file with the oldest data should have a name that comes first in the alphabet, and the file with the newest data should have a name that comes last.
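One way to keep older data first in alphabetical order is to prefix each file name with an ISO-style date. The sketch below uses hypothetical file names and Python's built-in sorted, which compares strings by character code. It is only an approximation of how One AI orders files and may differ in details such as upper versus lower case.

```python
# Hypothetical file names; an ISO date prefix (YYYY-MM-DD) sorts chronologically.
file_names = [
    "2019-01-01_current.csv",
    "2018-01-01_history.csv",
]

# Sorting the names shows which file would be considered first.
ordered = sorted(file_names)

for position, name in enumerate(ordered, start=1):
    print(position, name)
```

Sticking to one case and one date format across all file names avoids surprises in the ordering.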
File 1 (Train/test data)
One AI splits this data into a train set and a test set and trains several models on it (logistic regression, decision tree, random forest, etc.). One AI then selects the top-performing model based on the data from this file. Detailed information about the selected model is shown on the Modeling tab when you click on a specific run in the list of runs for that augmentation.
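The general idea of splitting the data, fitting several candidate models, and keeping the best one can be sketched in a few lines. This is an illustration only and not One AI's implementation: the two toy candidates (a majority-class baseline and a single-feature threshold rule) stand in for real algorithms such as logistic regression or random forest, and every function name, the 25% test fraction, and the accuracy metric are assumptions.

```python
import random


def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of the rows and split it into (train, test)."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def fit_majority(train):
    """Toy candidate 1: always predict the most common training label."""
    labels = [y for _, y in train]
    majority = max(set(labels), key=labels.count)
    return lambda x: majority


def fit_threshold(train):
    """Toy candidate 2: predict 1 when the feature reaches a learned cut-off."""
    best_cut, best_acc = None, -1.0
    for cut in sorted({x for x, _ in train}):
        acc = sum((1 if x >= cut else 0) == y for x, y in train) / len(train)
        if acc > best_acc:
            best_cut, best_acc = cut, acc
    return lambda x: 1 if x >= best_cut else 0


def accuracy(model, rows):
    """Fraction of rows where the model's prediction matches the label."""
    return sum(model(x) == y for x, y in rows) / len(rows)


# Toy data: (feature, label) pairs where the label is 1 when the feature is >= 5.
data = [(x, 1 if x >= 5 else 0) for x in range(10)] * 4

train, test = train_test_split(data)
candidates = {"majority": fit_majority, "threshold": fit_threshold}

# Fit each candidate on the train set, score it on the held-out test set.
scores = {name: accuracy(fit(train), test) for name, fit in candidates.items()}
best_name = max(scores, key=scores.get)
print(scores, "selected:", best_name)
```

Scoring on rows the model did not see during training is what makes the comparison between candidates meaningful.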
File 2 (Prediction data)
One AI then takes the model it selected and creates predictions using the data from this file. This is usually data from the current point in time, so it doesn't contain actual results. In other words, you don't yet know who will terminate, how many days a req will take to fill, etc.
These predictions are available on the Results Summary tab when you click on a specific run in the list of runs for that augmentation. These predictions are also fed back into One Model as a data source and can be used in metrics, dimensions and dashboards.