Welcome to the 2020.07.08 product release. This article provides an overview of the product innovations and improvements delivered on 8 July 2020.
One AI Innovations - Removing Bias from Training Data
Over the last several sprints, the One AI team has been working to integrate a number of group fairness metrics. Ultimately our objective is to make measures of group fairness accessible throughout One Model so decision-makers always have the context of fairness accessible to them during the decision-making process.
We've started by integrating fairness metrics, and algorithms that remove bias from training data, into One AI. The first measure we've implemented is disparate impact. Disparate impact refers to employment practices that adversely affect people who share a protected characteristic more than others, even when the organization applying those practices doesn't overtly discriminate (i.e. its policies may be neutral). In United States labor law, disparate impact can be used along with other measures to establish a "disproportionately adverse effect on members of a protected class as compared with non-members of the protected class." [https://en.wikipedia.org/wiki/Disparate_impact]. It is worth noting that disparate impact is different from disparate treatment, which concerns purposeful discriminatory action. As a formula, disparate impact can be calculated as:
Pr(positive outcome | unprivileged) / Pr(positive outcome | privileged)
The intuition behind this formula is that the probability of the unprivileged group having a positive outcome is divided by the probability of the privileged group having a positive outcome. This gives you a disparate impact ratio where 1.0 is perfect equality and 0.0 is perfect inequality. Disparate impact is a simple measure of group fairness: it doesn't take sample sizes into account and instead focuses purely on outcomes. This narrow focus on outcomes suits the goal of preventing bias from getting into machine learning.
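As a rough illustration of the ratio described above, here is a minimal Python sketch. The function name `disparate_impact`, the group labels, and the sample data are all hypothetical and chosen for this example; this is not One AI's implementation.

```python
from typing import Sequence


def disparate_impact(
    outcomes: Sequence[int],
    groups: Sequence[str],
    unprivileged: str,
    privileged: str,
) -> float:
    """Return Pr(positive | unprivileged) / Pr(positive | privileged).

    `outcomes` holds 1 for a positive outcome and 0 otherwise;
    `groups` holds the group label for the same row.
    """

    def positive_rate(group: str) -> float:
        members = [o for o, g in zip(outcomes, groups) if g == group]
        if not members:
            raise ValueError(f"no rows for group {group!r}")
        return sum(members) / len(members)

    privileged_rate = positive_rate(privileged)
    if privileged_rate == 0:
        # The ratio is undefined when the privileged group has no positive outcomes.
        raise ValueError("privileged group has no positive outcomes")
    return positive_rate(unprivileged) / privileged_rate


# Hypothetical data: 1 = positive outcome, 0 = negative outcome.
outcomes = [1, 0, 0, 0, 1, 1, 1, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

ratio = disparate_impact(outcomes, groups, unprivileged="a", privileged="b")
print(ratio)  # group "a" rate is 1/4, group "b" rate is 3/4, so about 0.33
```

In this made-up sample the unprivileged group receives a positive outcome a third as often as the privileged group, well below the 1.0 that would indicate equality.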
If bias exists in the training data that will be used to teach a Machine Learning Algorithm, it can be carried through into the prediction. It doesn't matter if the bias in the training data is causal or representative -- if there is bias, then it is likely that bias will carry into predictions. Because of this possibility, it is ethically imperative to measure, report, and prevent bias from making its way into Machine Learning. We've integrated disparate impact reporting into all of our One AI performance reports so users can easily consider disparate impact while building Machine Learning Pipelines. This is the first of about 10 measures of group fairness we plan to add over the following months.
Once disparate impact is discovered in training data, we want to give our users options for removing that bias. IBM has done some fantastic work implementing tooling built on top of the research on group fairness and removing disparate impact [https://aif360.readthedocs.io/en/latest/index.html, https://arxiv.org/abs/1412.3756]. We've integrated a set of these tools into One AI pipelines, allowing users to algorithmically remove bias from their training data. If a user decides to remove bias from their training data, we will edit feature values to increase group fairness while trying to make as little change as possible.
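To give a feel for what "editing feature values" can mean, here is a simplified Python sketch of quantile-based feature repair, loosely following the idea in the paper cited above. It is an assumption-laden illustration, not the One AI pipeline or the AIF360 API: the function name `repair_feature` and the `repair_level` parameter are hypothetical, and for brevity the target is the pooled distribution of all rows, whereas the paper builds a median distribution across groups.

```python
from typing import List, Sequence


def repair_feature(
    values: Sequence[float],
    groups: Sequence[str],
    repair_level: float = 1.0,
) -> List[float]:
    """Pull each group's feature distribution toward the pooled distribution.

    Each value is mapped to the pooled value at the same within-group
    quantile, then blended with the original value:
    repair_level = 0 leaves the data unchanged, 1 applies the full repair.
    """
    if not 0.0 <= repair_level <= 1.0:
        raise ValueError("repair_level must be between 0 and 1")

    pooled = sorted(values)
    n = len(pooled)
    repaired = [float(v) for v in values]

    for group in set(groups):
        # Row indices for this group, ordered by feature value.
        idx = [i for i, g in enumerate(groups) if g == group]
        idx.sort(key=lambda i: values[i])
        m = len(idx)
        for rank, i in enumerate(idx):
            quantile = (rank + 0.5) / m  # mid-rank quantile in (0, 1)
            target = pooled[min(int(quantile * n), n - 1)]
            repaired[i] = (1 - repair_level) * values[i] + repair_level * target

    return repaired


# Hypothetical feature where group "b" scores systematically higher than "a".
values = [1, 2, 3, 4, 5, 6, 7, 8]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

fully_repaired = repair_feature(values, groups, repair_level=1.0)
print(fully_repaired)
```

After a full repair both groups share the same set of feature values, so the feature no longer reveals group membership, while the ordering of rows within each group is preserved. Lower `repair_level` values trade some of that fairness gain for smaller changes to the data.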
If you’re interested in removing bias from your One AI pipelines please reach out to our team, and we will help you set it up.
In this release we added Amazon Redshift as a Data Destination. This can be used to transform and re-load data into the One Model Redshift cluster, so that it can be used for Direct Connect. To configure the Destination for this purpose, set Redshift Type to Internal. The Destination must use a different Schema from any existing Data Source.
The Redshift Data Destination can also be used to send data to another Redshift cluster that can be accessed over the internet. To do this, set Redshift Type to External and provide the additional configuration needed to connect to that Redshift cluster. (ref 3176)
Improvements to Data Ingestion
Workday Event History Selector
Added a feature where the Workday Event History and Event Details endpoint can be enabled separately from the Worker endpoint for the Workday connector. This allows the endpoint to be disabled if the data is not necessary, which can substantially reduce processing time. This can be configured in the Workday Data Source. (ref 4640)
Greenhouse Data Connector - Demographics
Added the Demographic endpoints to the Greenhouse Data Connector. These can be found in the Configurable Tasks section of the Greenhouse Data Source. (ref 5052)
Improvements to Data Pipeline Processing
Added additional information to the Data Load screen for errors that occur while running Processing Scripts. These include the step that errored, as well as the SQL error information from Redshift. (ref 5084)
Minor Improvements and Bugs Fixed
One AI persisted model performance and stability - We fixed a bug that caused persisted models from the previous version of One AI to periodically fail. We added additional logic to prevent this from happening again.
We simplified the message that appears in tables for charts in the new UI when a query is too large to display. The text now reads: “Query too large to display - refine or export query to see all results.” (ref 5139)
There were a number of improvements to the Branding in the new UI, most notably properly applying the colour settings for the Navigation Bar Drop Down Menu.
We improved the text alignment and wrapping for Storyboard Names in the Storyboard Library and the display of metric names and descriptions in the drill-to-detail pop-up window per the screenshot below. (ref 4603, 4918)