For patients who have lost their ability to speak after a stroke, with amyotrophic lateral sclerosis, or from other conditions that disconnect the brain’s speech signaling pathways, artificial intelligence and brain–computer interface systems have opened the door to neural speech prostheses. However, few such systems aim to mimic an individual’s unique voice, so most yield generic speech that can sound robotic.

Aided by machine learning algorithms, a research group at NYU Langone Health has developed a unique speech decoder and synthesizer system that produces natural-sounding speech closely matching individuals’ voices. The team’s latest study, one of the largest in its field, tested the approach in 48 patients and showed that the technique can accurately re-create a broad range of speech.

Here’s more information about how the system works and how it could lead to far more individualized neural speech prostheses.

How does the new vocal reconstruction system work?

Neural electrocorticography (ECoG) signals can be mapped to a time–frequency representation of speech, known as a spectrogram, and then synthesized into audible speech, says neuroscientist Adeen Flinker, PhD. He and colleagues first mapped out a set of 18 speech parameters that represent how the human voice changes in frequency, pitch, and other characteristics over time. A deep neural network can learn and re-create that complex mapping.
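
For readers who want a concrete picture of the decoding stage, the sketch below is a minimal PyTorch illustration of that idea: a network maps multichannel ECoG activity, frame by frame, to the 18 speech parameters. The layer sizes, the choice of a simple recurrent network, and names such as ECoGDecoder and n_speech_params are assumptions made for illustration, not the architecture used in the study.

```python
# Minimal sketch of an ECoG-to-speech-parameter decoder.
# All dimensions and layer choices are illustrative assumptions.
import torch
import torch.nn as nn

class ECoGDecoder(nn.Module):
    def __init__(self, n_electrodes: int = 64, n_speech_params: int = 18, hidden: int = 256):
        super().__init__()
        # Recurrent layer summarizes the multichannel ECoG time series.
        self.rnn = nn.GRU(input_size=n_electrodes, hidden_size=hidden, batch_first=True)
        # Linear head predicts the 18 speech parameters for every time frame.
        self.head = nn.Linear(hidden, n_speech_params)

    def forward(self, ecog: torch.Tensor) -> torch.Tensor:
        # ecog: (batch, time_frames, n_electrodes) -> (batch, time_frames, 18)
        features, _ = self.rnn(ecog)
        return self.head(features)

decoder = ECoGDecoder()
dummy_ecog = torch.randn(1, 200, 64)   # 200 frames of 64-channel ECoG (placeholder data)
speech_params = decoder(dummy_ecog)    # frame-by-frame pitch, frequency, and related parameters
print(speech_params.shape)             # torch.Size([1, 200, 18])
```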

To capture an individual’s unique speech, the researchers trained a model on a prerecording of each patient’s voice, using a technique called an autoencoder to constrain the 18 parameters that represent that voice. The trained model then combines those parameters into a more accurate re-creation of the person’s speech patterns.
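
Under similar assumptions, the sketch below illustrates the autoencoder idea: spectrogram frames of the patient’s prerecorded voice are compressed through an 18-parameter bottleneck and then reconstructed, so the bottleneck learns a compact, voice-specific parameterization. The mel-bin count, layer sizes, and plain reconstruction-loss training loop are placeholders, not the study’s actual pipeline.

```python
# Sketch of a speech-to-speech autoencoder with an 18-parameter bottleneck.
# Dimensions and the simple MSE training loop are illustrative assumptions.
import torch
import torch.nn as nn

class VoiceAutoencoder(nn.Module):
    def __init__(self, n_mel_bins: int = 80, n_speech_params: int = 18):
        super().__init__()
        # Encoder: spectrogram frame -> 18 speech parameters (the bottleneck).
        self.encoder = nn.Sequential(
            nn.Linear(n_mel_bins, 128), nn.ReLU(), nn.Linear(128, n_speech_params)
        )
        # Decoder: 18 parameters -> reconstructed spectrogram frame.
        self.decoder = nn.Sequential(
            nn.Linear(n_speech_params, 128), nn.ReLU(), nn.Linear(128, n_mel_bins)
        )

    def forward(self, spectrogram: torch.Tensor) -> torch.Tensor:
        return self.decoder(self.encoder(spectrogram))

model = VoiceAutoencoder()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

voice_frames = torch.randn(512, 80)  # placeholder for frames of the patient's prerecorded speech
for step in range(100):
    reconstruction = model(voice_frames)
    loss = loss_fn(reconstruction, voice_frames)  # reconstruct the patient's own voice
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

In a setup like this, the decoder half could then double as the synthesizer: speech parameters predicted from ECoG would be passed through it to produce a spectrogram in the patient’s own voice.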

A comparison of original audio recordings and accompanying spectrograms, collected as individuals speak various words, with the audio and spectrograms generated by the novel speech decoder. Source: NYU Langone Health

How do researchers gather the necessary data to decode speech?

Researchers have learned to decode speech from the brain signals of patients undergoing evaluation for epilepsy surgery. Electrodes implanted for presurgical evaluation also provide the necessary ECoG recordings for vocal reconstruction. Data from electroencephalography (EEG) electrodes that monitor activity from outside the skull are not sufficient, because the neural signals are distorted after traveling through bone and other tissue.

What makes this approach unique?

“Instead of generating a generic voice or robotic voice, we first learn the patient’s voice in a machine learning manner,” Dr. Flinker says. That approach yields a more realistic reproduction of the patient’s speech.

Additionally, whereas similar studies have included only a handful of patients, the team’s study represents the largest cohort to date in which a personalized neural speech decoder has been applied, demonstrating that the approach is both reproducible and scalable. “We really wanted to push the envelope and see to what degree we can scale this up,” he says.

If the approach pans out, it could lead to clinical trials of brain–computer interface technologies that reproduce an individual’s own voice. In addition, the researchers have shared an open-source neuro-decoding pipeline to bolster collaboration and enable replication of the group’s results.

What other new information did the study reveal about speech signaling?

Language signals are typically associated with the brain’s left hemisphere, but the researchers used available data from the right hemisphere of 16 patients to produce “robust decoding” for them as well, Dr. Flinker says. “That’s a very exciting finding that hasn’t been shown before, especially not on this scale.”

If neural signals from the right hemisphere prove sufficient for decoding speech, he says, it could open up the prosthetic technology to patients with aphasia due to left-hemisphere damage. “The fact that we can decode from the right hemisphere is the first line of evidence that we can perhaps restore and synthesize speech based on intact right hemisphere signals and not the damaged left hemisphere,” Dr. Flinker says.

What’s next for this work?

The next immediate step, Dr. Flinker says, is to develop approaches that do not require a prerecording of a patient’s voice, while also expanding the system’s vocabulary to include more words and sentences. The team will then need to assess the efficacy of the technology on clinical trial participants with implanted electrodes who are unable to speak. Another significant challenge lies in designing a clinical trial for patients with aphasia to determine how effectively the technology functions when the left hemisphere is compromised.