There are a lot of companies in HR that offer Machine Learning and Predictive Analytics products or services. In many cases, these offerings are based on a proprietary model developed by that particular vendor. It's their "secret sauce," so to speak.
One Model is different. Our machine learning offering tells you all about the model it created so that you can understand it and decide whether to trust its results. The main vehicle for delivering this transparency is the Modeling Report, which provides a large amount of detail about the model itself and how it performed. In this summary, I'll take you through some of the key things I look at when reviewing the results from an Augmentation.
If you are looking for more technical detail behind the values shown on the report, the documentation for sklearn would be a better place to start: https://scikit-learn.org/stable/
At the top of the Modeling Report you'll see the type of model that was selected. This was a turnover model based on some sample records I created in Excel. For more detail on the data, you can check out the EDA report here: http://help.onemodel.co/en/articles/3148407-eda-report-introduction
First off, we can see that One AI selected a Random Forest model with 10 features. This is just one of many modeling scenarios that were tested. This was a classification problem, so One AI would have tried several variations of Random Forest, along with Decision Trees, Logistic Regressions, possibly some CatBoost models, and so on.
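To make the idea of "trying several models and keeping the best" concrete, here is a minimal scikit-learn sketch. It is an illustration only, not One AI's actual selection procedure, which this article doesn't describe. The synthetic dataset, the candidate list, and the choice of cross-validated F1 as the comparison metric are all assumptions. CatBoost is left out so the example needs only scikit-learn.

```python
# Illustrative sketch only: compares a few candidate classifiers with
# cross-validation on synthetic data. This is NOT One AI's actual
# model-selection procedure; the data, candidates and metric are assumptions.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for a turnover dataset: 10 features, binary target
# where roughly 20% of records are "terminated".
X, y = make_classification(n_samples=500, n_features=10, n_informative=4,
                           weights=[0.8, 0.2], random_state=0)

# Several variations of Random Forest plus other model families.
candidates = {
    "random_forest_100": RandomForestClassifier(n_estimators=100, random_state=0),
    "random_forest_300": RandomForestClassifier(n_estimators=300, random_state=0),
    "decision_tree": DecisionTreeClassifier(max_depth=5, random_state=0),
    "logistic_regression": LogisticRegression(max_iter=1000),
}

def compare_models(X, y, candidates, cv=5):
    """Return the mean cross-validated F1 score for each candidate model."""
    return {name: cross_val_score(model, X, y, cv=cv, scoring="f1").mean()
            for name, model in candidates.items()}

scores = compare_models(X, y, candidates)
best = max(scores, key=scores.get)  # candidate with the highest mean F1
for name, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name}: mean F1 = {score:.3f}")
print("selected:", best)
```

A real system would search far more configurations and might rank them on a different metric. The underlying pattern is the same: score every candidate on held-out data and keep the winner.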
Next up you'll see the features that were selected, ranked by their importance to the model.
The top feature is manager performance score. Again, this was a turnover model based on some data I made up in Excel, but I went out of my way to create some extra records where employees with lower-performing managers terminated. Sure enough, One AI picked up on my attempt to cook the books and identified this as a very strong signal of looming attrition. This also happens to be a pretty common thread among the companies we work with.
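If you're curious where a "ranked by importance" list can come from, here is a minimal sketch that uses scikit-learn's impurity-based `feature_importances_` on a Random Forest. It assumes synthetic data and hypothetical feature names. It also assumes that the report's ranking is similar in spirit, and it is not a description of how One AI computes its ranking. The first two columns are generated to carry the signal (`shuffle=False` keeps them in place), and the remaining three are pure noise.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Hypothetical feature names for illustration; not the article's real feature list.
feature_names = ["manager_performance_score", "tenure_years", "salary_ratio",
                 "commute_km", "years_since_promotion"]

# With shuffle=False, the 2 informative columns come first and the rest is noise.
X, y = make_classification(n_samples=500, n_features=5, n_informative=2,
                           n_redundant=0, shuffle=False, random_state=1)

model = RandomForestClassifier(n_estimators=200, random_state=1).fit(X, y)

def rank_features(model, names):
    """Pair each feature name with its impurity-based importance, highest first."""
    return sorted(zip(names, model.feature_importances_),
                  key=lambda pair: pair[1], reverse=True)

ranking = rank_features(model, feature_names)
for name, importance in ranking:
    print(f"{name:28s} {importance:.3f}")
```

The importances sum to 1, so each value reads as that feature's share of the model's total splitting power. In this toy setup the informative columns should float to the top, much like manager performance score did in my made-up data.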
Yes... even after throwing a ton of machine learning at the data, it still turns out that people leave managers, not companies.
Finally, at the bottom of the report is the Confusion Matrix. I'm not making this up. A confusion matrix is used with classification models (like turnover) to show how well the model performed. Despite its name, it makes good sense once you get used to it.
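The next sketch shows how a confusion matrix is tallied, using scikit-learn's `confusion_matrix` on ten hypothetical employees (1 = terminated, 0 = stayed). The toy labels are invented for illustration. The layout of the matrix in the Modeling Report itself may differ from scikit-learn's default, so `labels=[1, 0]` is used here to put "Terminated - Yes" in the first row and first column.

```python
from sklearn.metrics import confusion_matrix

# Hypothetical toy labels: 1 = terminated, 0 = stayed.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]  # what actually happened
y_pred = [1, 0, 0, 1, 1, 0, 0, 0, 1, 0]  # what the model predicted

# Rows are actual outcomes and columns are predictions.
# labels=[1, 0] orders both axes as (Terminated - Yes, Terminated - No).
cm = confusion_matrix(y_true, y_pred, labels=[1, 0])
print(cm)
```

Reading it: the top row holds the 4 people who actually terminated (2 caught, 2 missed). The bottom row holds the 6 who stayed (2 wrongly flagged, 4 correctly left alone).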
Let's start with recall. Recall is how data scientists ask, "When I tested the model, did it actually find the people who terminated?" In this case we look in the Terminated - Yes row and the recall column to see that our answer is .355. That means this model found about 35% of the people who terminated.
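Under the hood, recall is a simple ratio: true positives divided by everyone who actually terminated (true positives plus false negatives). Here is a small sketch that computes it by hand and checks it against scikit-learn's `recall_score`. The ten-person labels are hypothetical.

```python
from sklearn.metrics import recall_score

# Hypothetical toy labels: 1 = terminated, 0 = stayed.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 1, 1, 0, 0, 0, 1, 0]

def recall(y_true, y_pred, positive=1):
    """Share of people who actually terminated that the model flagged: TP / (TP + FN)."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    return tp / (tp + fn)

manual = recall(y_true, y_pred)
library = recall_score(y_true, y_pred)  # pos_label defaults to 1
print(manual, library)
```

Four people actually left and the model caught two of them, so both calculations give 0.5. The report's .355 is the same ratio computed on the real test data.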
But this is only part of the story. Think of it like this. Our model is going down the data set, marking some people as Terminated and some as not. Recall tells us that it "found" 35% of the people who actually terminated. But it might have incorrectly labeled some people as Terminated who did not, in fact, terminate.
And so we also look at precision. Precision answers the question, "If the model marks someone as going to terminate, how often does that person actually terminate?" In this case it was .441, or 44% of the time.
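Precision uses the same counts but a different denominator: true positives divided by everyone the model flagged (true positives plus false positives). Below is a minimal sketch with hypothetical labels, checked against scikit-learn's `precision_score`.

```python
from sklearn.metrics import precision_score

# Hypothetical toy labels: 1 = terminated, 0 = stayed.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0, 1, 0]

def precision(y_true, y_pred, positive=1):
    """Share of people the model flagged who actually terminated: TP / (TP + FP)."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    return tp / (tp + fp)

manual = precision(y_true, y_pred)
library = precision_score(y_true, y_pred)  # pos_label defaults to 1
print(manual, library)
```

The model flagged five people and only two of them actually left, so precision is 2 / 5 = 0.4 in both calculations.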
Taken together, looking at this Modeling Report, you could say, "OK, we've got ourselves a model that successfully finds about a third of the people who are going to leave, and when that model says someone is going to leave, there's a bit better than a 40% chance that they actually do."
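To turn those two ratios into head counts, here is some quick arithmetic using the report's recall (.355) and precision (.441). The group of 100 actual leavers is a hypothetical round number chosen for illustration and is not from the report.

```python
def expected_counts(actual_leavers, recall, precision):
    """Translate recall and precision into approximate head counts."""
    found = recall * actual_leavers   # leavers the model correctly flagged (true positives)
    flagged = found / precision       # everyone the model marked as leaving
    false_alarms = flagged - found    # flagged but actually stayed (false positives)
    missed = actual_leavers - found   # left but never flagged (false negatives)
    return found, flagged, false_alarms, missed

# Hypothetical: suppose 100 people in the test data actually left.
found, flagged, false_alarms, missed = expected_counts(100, 0.355, 0.441)
print(f"found {found:.1f}, flagged {flagged:.1f}, "
      f"false alarms {false_alarms:.1f}, missed {missed:.1f}")
```

In this scenario the model would flag roughly 80 people. About 35 of them would really leave and about 45 would be false alarms, while roughly 64 of the 100 leavers would go unflagged.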
These numbers are OK but not great, mostly because I made the data up. The point is not how good this artificial model is. The point is that you will have this type of transparency for any model created by One AI.
Learning to use the Modeling Report will help you when you sit down with your VP to show off the latest predictive model you've made and she asks, "Yeah but how good is it and can we trust it?"