University of Florida researchers developed an audio deepfake detection method that measures differences between organic and synthetic speech samples.
According to the scientists, it is now possible to reconstruct a voice from the anatomical structure of the respiratory system. Such methods even make it possible to hear how dinosaurs may have sounded.
To recognize deepfakes, the researchers performed the reverse procedure. Working from audio recordings, they modeled the vocal tract for both organic and synthetic voices. This allowed them to recreate an approximation of the speaker's respiratory anatomy from a segment of audio.
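The idea of recovering a tube-like vocal tract shape from audio can be illustrated with a classic textbook technique. The following is only a minimal sketch, not the researchers' method: it assumes a lossless concatenated-tube model, derives reflection coefficients from one audio frame with the Levinson-Durbin recursion, and converts them into relative cross-sectional areas. The function names, the Hamming window, the model order, and the sign convention of the area ratio are all illustrative assumptions.

```python
import math
import random


def reflection_coefficients(frame, order):
    """Estimate reflection coefficients for one audio frame (Levinson-Durbin)."""
    n = len(frame)
    # Hamming window to reduce edge effects (an illustrative choice).
    windowed = [
        x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
        for i, x in enumerate(frame)
    ]
    # Autocorrelation for lags 0..order.
    r = [
        sum(windowed[t] * windowed[t + lag] for t in range(n - lag))
        for lag in range(order + 1)
    ]
    a = [1.0] + [0.0] * order  # linear-prediction polynomial coefficients
    err = r[0]                 # prediction error energy
    ks = []
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
        ks.append(k)
    return ks


def tube_areas(ks, end_area=1.0):
    """Convert reflection coefficients into relative tube-section areas.

    Assumes the convention A[i+1] = A[i] * (1 - k) / (1 + k); other texts
    use the opposite sign, which mirrors the resulting shape.
    """
    areas = [end_area]
    for k in ks:
        areas.append(areas[-1] * (1.0 - k) / (1.0 + k))
    return areas


if __name__ == "__main__":
    # Stand-in frame of random noise; a real test would use recorded speech.
    random.seed(0)
    frame = [random.gauss(0.0, 1.0) for _ in range(400)]
    areas = tube_areas(reflection_coefficients(frame, order=10))
    print([round(area, 3) for area in areas])
```

In a sketch like this, a detector would compare the estimated areas against ranges that are plausible for a human speaker. Those ranges, and the way the study actually derives its estimates, are not given in this article.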
As a result, the scientists found that deepfake recordings are not constrained by the anatomy of the human vocal tract. During modeling, they saw shapes that do not occur in people.
According to the researchers, the method reached 99% accuracy on the test data set.
The scientists said that the study not only confirmed their hypotheses but also revealed other features. For example, for audio deepfakes, the model often produced vocal tracts with the same relative diameter and consistency as a drinking straw. Human vocal tracts are much wider and more complex in structure, the authors noted.
According to the researchers, their approach will make it possible to detect even fakes that sound convincing to the human ear.
“The subtle but biologically limited aspects of human speech generation are not captured by current models […]. Consequently, [anatomy] can act as a powerful tool for detecting audio deepfakes,” the authors say.
In October 2021, it was reported that fraudsters in the United Arab Emirates had forged the voice of the head of a large company and stolen $35 million.
That same month, researchers reported that audio deepfakes can deceive both speech recognition devices and human listeners.