Automated speech recognition and racial bias

Automated speech recognition (ASR) systems perform worse for black speakers than for white speakers, a study finds. ASR systems are increasingly prevalent in applications such as virtual assistants and hands-free computing. These machine-learning systems convert speech into text using a language model trained on text data and an acoustic model trained on audio data. Given recent reports of racial bias in other machine-learning algorithms, Sharad Goel and colleagues examined whether racial disparities also exist in ASR systems. The authors evaluated the performance of state-of-the-art ASR systems developed by Amazon, Apple, Google, IBM, and Microsoft in transcribing 19.8 hours of audio from 42 white speakers and 73 black speakers from different regions of the United States. On average, the ASR systems exhibited a word error rate of 0.35 for black speakers, compared with 0.19 for white speakers. Additional results suggest that the racial disparities are largely attributable to deficiencies in the acoustic models' ability to accurately capture the pronunciation and prosody of African American Vernacular English, a natural and long-studied variety of English. According to the authors, the findings highlight the need for ASR developers to use broadly inclusive audio training datasets to ensure that the technology benefits society in an equitable fashion.
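For context, the 0.35 and 0.19 figures refer to word error rate (WER), the standard ASR evaluation metric: the number of word substitutions, insertions, and deletions needed to turn the system's output into the reference transcript, divided by the number of words in the reference. The following minimal Python sketch illustrates the metric; it is not the study's code, and the example transcripts are hypothetical.

    # Illustrative sketch of word error rate (WER), not the study's code.
    def word_error_rate(reference: str, hypothesis: str) -> float:
        ref, hyp = reference.split(), hypothesis.split()
        # Word-level edit distance via dynamic programming; substitutions,
        # insertions, and deletions each cost 1.
        d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
        for i in range(len(ref) + 1):
            d[i][0] = i
        for j in range(len(hyp) + 1):
            d[0][j] = j
        for i in range(1, len(ref) + 1):
            for j in range(1, len(hyp) + 1):
                cost = 0 if ref[i - 1] == hyp[j - 1] else 1
                d[i][j] = min(d[i - 1][j] + 1,         # deletion
                              d[i][j - 1] + 1,         # insertion
                              d[i - 1][j - 1] + cost)  # substitution
        return d[len(ref)][len(hyp)] / len(ref)

    # Hypothetical transcripts: one substitution ("box" for "fox") and one
    # deletion ("jumps") over 5 reference words gives a WER of 0.4.
    print(word_error_rate("the quick brown fox jumps", "the quick brown box"))

A WER of 0.35, as measured for black speakers, therefore means roughly one error for every three words spoken.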

Article #19-15768: “Racial disparities in automated speech recognition,” by Allison Koenecke et al.

MEDIA CONTACT: Sharad Goel, Stanford University, Stanford, CA; tel: 607-339-9903; e-mail: [email protected]

###

Source: https://www.eurekalert.org/pub_releases/2020-03/potn-asr031820.php
