How Artificial Intelligence gave a Paralyzed Woman her Voice back

 


At the age of 30, Ann suffered a brainstem stroke that left her severely paralyzed. She lost control of all the muscles in her body and was unable even to breathe. It came on suddenly one afternoon, for reasons that are still mysterious.

For the next five years, Ann went to bed each night afraid she would die in her sleep. It took years of physical therapy before she could move her facial muscles enough to laugh or cry. Still, the muscles that would have allowed her to speak remained immobile.

“Overnight, everything was taken from me,” Ann wrote, using a device that enables her to type slowly on a computer screen with small movements of her head. “I had a 13-month-old daughter, an 8-year-old stepson and 26-month-old marriage.”

Today, Ann is helping researchers at UC San Francisco and UC Berkeley develop new brain-computer technology that could one day allow people like her to communicate more naturally through a digital avatar that resembles a person. It is the first time that either speech or facial expressions have been synthesized from brain signals.

The system can also decode these signals into text at nearly 80 words per minute, a vast improvement over the 14 words per minute that her current communication device delivers.

Edward Chang, MD, chair of neurological surgery at UCSF, has worked on the technology, known as a brain-computer interface, or BCI, for more than a decade. He hopes this latest research breakthrough, published Aug. 23, 2023, in Nature, will lead to an FDA-approved system that enables speech from brain signals in the near future.


“Our goal is to restore a full, embodied way of communicating, which is the most natural way for us to talk with others,” said Chang, who is a member of the UCSF Weill Institute for Neurosciences and the Jeanne Robertson Distinguished Professor. “These advancements bring us much closer to making this a real solution for patients.”

Decoding the signals of speech
Ann was a high school math teacher in Canada before her stroke in 2005. In 2020, she described her life since in a paper she wrote, painstakingly typing letter-by-letter, for a psychology class.
“Locked-in syndrome, or LIS, is just like it sounds,” she wrote. “You’re fully cognizant, you have full sensation, all five senses work, but you are locked inside a body where no muscles work. I learned to breathe on my own again, I now have full neck movement, my laugh returned, I can cry and read and over the years my smile has returned, and I am able to wink and say a few words.”


As she recovered, she realized she could use her own experiences to help others, and she now aspires to become a counselor in a physical rehabilitation facility.
“I want patients there to see me and know their lives are not over now,” she wrote. “I want to show them that disabilities don’t need to stop us or slow us down.”
She learned about Chang’s study in 2021 after reading about a paralyzed man named Pancho, who helped the team translate his brain signals into text as he attempted to speak. He had also experienced a brainstem stroke many years earlier, and it wasn’t clear if his brain could still signal the movements for speech. It’s not enough just to think about something; a person has to actually attempt to speak for the system to pick it up. Pancho became the first person living with paralysis to demonstrate that it was possible to decode speech-related brain signals into full words.

“It was exciting to see her go from, ‘We’re going to just try doing this,’ and then seeing it happen quicker than probably anyone thought,” said Ann’s husband, Bill, who travelled with her from Canada to be with her during the study. “It seems like they’re pushing each other to see how far they can go with this.”
Rather than train the AI to recognize whole words, the researchers created a system that decodes words from smaller components called phonemes. These are the sub-units of speech that form spoken words in the same way that letters form written words. “Hello,” for example, contains four phonemes: “HH,” “AH,” “L” and “OW.”
Using this approach, the computer only needed to learn 39 phonemes to decipher any word in English. This both enhanced the system’s accuracy and made it three times faster.
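The idea of building words from a small set of phonemes can be illustrated with a short Python sketch. This is not the researchers' decoder, which works from brain signals. It is only a toy example showing how a stream of phonemes could be matched against a pronunciation dictionary. The dictionary entries other than "Hello", the function name decode_phonemes and the greedy longest-match strategy are all assumptions made for illustration.

```python
# Hypothetical mini pronunciation dictionary. Only the "hello" entry comes
# from the article; the other entries are illustrative assumptions.
PRONUNCIATIONS = {
    ("HH", "AH", "L", "OW"): "hello",
    ("HH", "AW"): "how",
    ("AA", "R"): "are",
    ("Y", "UW"): "you",
}


def decode_phonemes(phonemes, lexicon=PRONUNCIATIONS):
    """Split a phoneme stream into words using greedy longest-match lookup.

    This is a simplified stand-in for a real decoder: it assumes the
    phonemes are already known and error-free.
    """
    max_len = max(len(key) for key in lexicon)
    words = []
    i = 0
    while i < len(phonemes):
        # Try the longest candidate first, then shorter ones.
        for n in range(min(max_len, len(phonemes) - i), 0, -1):
            chunk = tuple(phonemes[i:i + n])
            if chunk in lexicon:
                words.append(lexicon[chunk])
                i += n
                break
        else:
            raise ValueError(f"no word matches phonemes starting at index {i}")
    return words


stream = ["HH", "AH", "L", "OW", "HH", "AW", "AA", "R", "Y", "UW"]
print(" ".join(decode_phonemes(stream)))
```

Adding a new word to this sketch means adding one dictionary entry, not teaching the system a new unit of sound, which is the appeal of working with a fixed set of 39 phonemes.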
“The accuracy, speed and vocabulary are crucial,” said Sean Metzger, who developed the text decoder with Alex Silva, both graduate students in the joint Bioengineering Program at UC Berkeley and UCSF. “It’s what gives Ann the potential, in time, to communicate almost as fast as we do, and to have much more naturalistic and normal conversations.”

Adding a face and a voice
To give Ann a voice, the team devised a speech-synthesis algorithm, which they personalized to sound like her voice before the injury by using a recording of Ann speaking at her wedding.
“My brain feels funny when it hears my synthesized voice,” she wrote in answer to a question. “It’s like hearing an old friend.”
She looks forward to the day when her daughter – who only knows the impersonal, British-accented voice of her current communication device – can hear it too.
“My daughter was 1 when I had my injury, it’s like she doesn’t know Ann … She has no idea what Ann sounds like.”

The team animated Ann’s avatar with the help of software that simulates and animates muscle movements of the face, developed by Speech Graphics, a company that makes AI-driven facial animation. The researchers created customized machine-learning processes that allowed the company’s software to mesh with signals being sent from Ann’s brain as she was trying to speak and convert them into the movements on her avatar’s face. These included making the jaw open and close, the lips protrude and purse and the tongue go up and down, as well as the facial movements for happiness, sadness and surprise.
“We’re making up for the connections between her brain and vocal tract that have been severed by the stroke,” said Kaylo Littlejohn, a graduate student working with Chang and Gopala Anumanchipalli, PhD, a professor of electrical engineering and computer sciences at UC Berkeley. “When Ann first used this system to speak and move the avatar’s face in tandem, I knew that this was going to be something that would have a real impact.”

An important next step for the team is to create a wireless version that would not require Ann to be physically connected to the BCI.
“Giving people like Ann the ability to freely control their own computers and phones with this technology would have profound effects on their independence and social interactions,” said co-first author David Moses, PhD, an adjunct professor in neurological surgery.