
Dirty Data Done Dirt Cheap: Bias in Historical Data

Bias is often hidden in historical data, and using these data to train AI systems will make a bad situation even worse. Learn the right approach to adopting AI in your data-gathering process.

The biggest challenges facing many HR professionals are finding talent with the skills their business needs and retaining employees with key skills. The 2023 State of the Workplace report by the Society for Human Resource Management (SHRM) found that 80% of HR professionals surveyed faced labor shortages, and retaining talent ranked as the third-highest priority for HR professionals. Some providers (e.g., Eightfold, Gloat) suggest using AI to help struggling companies identify the best talent within their current workforce or passive candidates in the market. With claims of unlocking the power of talent data and creating a talent marketplace, these providers paint a pretty picture of leveraging historical talent data.

These providers mine historical HR data to identify the types of candidates who have been successful in the past, or the employees who have been promoted internally, and then suggest others who resemble them. Sounds like magic, but there is something these companies might not be telling you. Bias is often hidden in historical data, and using these data to train AI systems will make a bad situation even worse. It is a case of dirty data done dirt cheap.

Understanding bias in historical data

Some biases can start even before candidates apply for roles. Men are 20% more likely than women to be presented with online job advertisements because advertising to women is more expensive, so advertising algorithms designed to maximize views favor showing ads to men. Some candidates might never even see an advertised role, and the situation does not get any easier once they manage to apply. According to a large-scale meta-analysis of real-world historical HR data in the United States, covering 55,842 applicants for 26,326 positions over a 25-year period, white candidates received 36% more callbacks than African American candidates and 24% more callbacks than Latino candidates. Worse still, things have not improved over time: minority candidates experience similar bias today as they did in the 1990s.
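To make a figure like "36% more callbacks" concrete, the sketch below shows how such disparity ratios are typically computed from raw callback counts. The counts here are illustrative placeholders, not the actual data from the meta-analysis.

```python
def callback_rate(callbacks: int, applications: int) -> float:
    """Share of applications that received a callback."""
    return callbacks / applications

# Illustrative counts only -- not the figures from the cited study.
groups = {
    "white": {"callbacks": 240, "applications": 2500},
    "african_american": {"callbacks": 180, "applications": 2500},
    "latino": {"callbacks": 195, "applications": 2500},
}

rates = {name: callback_rate(d["callbacks"], d["applications"]) for name, d in groups.items()}

# "X% more callbacks" is the relative difference between two groups' callback rates.
for group in ("african_american", "latino"):
    gap = (rates["white"] / rates[group] - 1) * 100
    print(f"White candidates received {gap:.0f}% more callbacks than {group} candidates.")
```

Swapping in real counts from an applicant-tracking system is a quick way to see what disparities already sit in your hiring funnel before any model is trained on it.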

Bias is often hidden in historical data and using these data to train AI systems will make a bad situation even worse.

A 2023 analysis of bias across six Western countries in North America and Europe, covering 174,079 applicants, showed that bias in historical HR data is a worldwide problem. The study found high levels of bias against all minorities, with an increase in discrimination against Muslim and North African applicants after the year 2000.

If all this dirty data leaves a sour taste in your mouth, you are not alone. A survey by the Guardian in the United Kingdom found that people from minority groups are almost twice as likely to report having been overlooked for a role or promotion in a way that felt unfair in the past 12 months.

No matter how you look at it, historical data and data scraped from the internet are going to be riddled with unconscious bias, and any AI system trained on those data will learn the same biases. For companies interested in leveraging the power of artificial intelligence to help find top talent while avoiding propagating bias and unfairly discriminating against minority candidates, it is critical to take a data-centric approach.

Combating data bias through the right approach

A data-centric approach to artificial intelligence emphasizes the importance of curating high-quality data sources that have been carefully audited for bias. The key difference between a data-centric approach and using convenience samples of historical data is that data are carefully gathered for the explicit purpose of training an AI system and audited to ensure the data do not propagate bias. The data-centric approach to artificial intelligence was first suggested by Andrew Ng, the founder of the Google Brain research lab, and the idea has been gaining mainstream support across the AI community.

At SHL, we adopted a data-centric approach to artificial intelligence with robust bias audits built into our data-gathering process. This approach aims to enhance the accuracy and objectivity of the data we use to train AI systems while meeting the strict quality standards set by professional and regulatory bodies like the British Psychological Society and the Equal Employment Opportunity Commission.
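What does a bias audit look like in practice? One widely used first screen, drawn from the EEOC's Uniform Guidelines on Employee Selection Procedures, is the four-fifths rule: if any group's selection rate falls below 80% of the highest group's rate, the process is flagged for potential adverse impact. The Python sketch below illustrates that check with hypothetical numbers; it is a minimal example of the rule, not a description of SHL's actual audit pipeline.

```python
# A minimal four-fifths (80%) rule check, a common first screen in bias audits.
# Hypothetical selection outcomes per demographic group: (selected, applied).
outcomes = {
    "group_a": (48, 100),
    "group_b": (33, 100),
    "group_c": (45, 100),
}

selection_rates = {g: sel / total for g, (sel, total) in outcomes.items()}
highest = max(selection_rates.values())

for group, rate in selection_rates.items():
    impact_ratio = rate / highest
    flag = "POTENTIAL ADVERSE IMPACT" if impact_ratio < 0.8 else "ok"
    print(f"{group}: selection rate {rate:.0%}, impact ratio {impact_ratio:.2f} -> {flag}")
```

A ratio below 0.8 is a signal to investigate further, not proof of discrimination on its own; a full audit combines checks like this with careful review of how the data were gathered in the first place.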

The key difference between a data-centric approach and using convenience samples of historical data is that data are carefully gathered for the explicit purpose of training an AI system and audited to ensure the data do not propagate bias.


If you are interested in learning more about how to leverage the power of AI without falling into the pitfalls of historical bias, SHL has created a best-practice guideline for the use of AI in talent assessment and a webinar about the ethical and effective use of AI.

For more information about our AI-powered assessments, visit our assessment page.


Author

Cam Beazley

An innovative and results-focused professional with extensive technical expertise in the design, development, validation, and implementation of assessment systems that maximize business outcomes. Drawing on a deep understanding of rigorous psychometric principles and next-generation technology, Cam skillfully weaves scientific concepts and proven technology into workable, valid solutions for clients.
