SAN FRANCISCO (AP) — Tech giant OpenAI has touted its artificial intelligence-based transcription tool, Whisper, as having “near human-level robustness and accuracy.”

But Whisper has a major flaw: It is prone to making up chunks of text, or even entire sentences, according to interviews with more than a dozen software engineers, developers and academic researchers. These experts said some of the invented text, known in the industry as hallucinations, can include racist comments, violent rhetoric and even imaginary medical treatments.

Experts said such fabrications are problematic because Whisper is used in many industries around the world to translate and transcribe interviews, generate text in popular consumer technologies and create subtitles for videos.

More concerning, they said, is a rush by medical centers to use Whisper-based tools to transcribe patient consultations with doctors, despite OpenAI’s warnings that the tool should not be used in “high risk areas.”

It’s difficult to determine the extent of the problem, but researchers and engineers say they’ve encountered Whisper’s hallucinations frequently in the course of their work. A University of Michigan researcher conducting a study of public meetings, for example, reported finding hallucinations in eight out of 10 audio transcripts he inspected, before he started trying to improve the model.

A machine learning engineer said he initially discovered hallucinations in about half of the more than 100 hours of Whisper transcripts he analyzed. A third developer reported finding hallucinations in almost every one of the 26,000 transcripts he created with Whisper.

Problems persist even in short, well-recorded audio samples. A recent study by computer scientists discovered 187 hallucinations in more than 13,000 clear audio clips they examined.

This trend would result in tens of thousands of faulty transcriptions across millions of recordings, the researchers said.

___

This story was produced in partnership with the Pulitzer Center’s AI Accountability Network, which also partially supported the Whisper academic study. AP also receives financial assistance from the Omidyar Network to support coverage of artificial intelligence and its impact on society.

___

Such errors could have “very serious consequences,” particularly in hospital settings, said Alondra Nelson, who led the White House Office of Science and Technology Policy for the Biden administration until last year.

“No one wants a misdiagnosis,” said Nelson, a professor at the Institute for Advanced Study in Princeton, New Jersey. “There should be a higher bar.”