If you say, “Hey Meta, take a picture,” Meta's smart glasses will take a picture. Humane's Ai Pin, a small computer that clips onto your shirt, can translate foreign languages into your native language. And other new AI gadgets display a virtual assistant on a screen that you speak to through a microphone.
Last year, OpenAI updated its ChatGPT chatbot to respond with spoken words. And Google recently introduced Gemini, a replacement for its voice assistant on Android smartphones.
Years after many people decided it was uncool to talk to a computer, tech companies are betting on a resurgence of voice assistants.
Will it work this time? Maybe, but it might take some time.
Research conducted over the past decade shows that many people have never used a voice assistant such as Amazon's Alexa, Apple's Siri, or Google's Assistant, and that the overwhelming majority of those who have say they would not want to be seen talking to one in public.
I also rarely use voice assistants. And in a recent experiment with Meta's glasses, which include a camera and speakers that can provide information about your surroundings, I concluded that talking to a computer in front of parents and their children at the zoo still feels surprisingly awkward.
I wondered whether this would ever feel normal. Not so long ago, talking on the phone through a Bluetooth headset looked odd to many people, but now everyone does it. Will we one day see masses of people walking around talking to their computers, as in a science fiction movie?
I posed this question to design experts and researchers, and the consensus was clear: because new AI systems improve voice assistants' ability to understand what we are saying and to actually help us, we will be talking to the devices around us more often. But it will be years before we do so in public.
Here's what you need to know:
Why voice assistants are getting smarter
The new voice assistants are powered by generative artificial intelligence, which uses statistics and complex algorithms to predict which words are likely to follow others, much like the autocomplete feature on your phone. That lets them use context to understand requests and follow-up questions far better than older virtual assistants like Siri and Alexa, which could respond only to a finite list of questions.
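The "autocomplete" idea described above can be illustrated with a toy bigram model, which simply counts which word most often follows each word in a corpus. This is a deliberate simplification for illustration only; real chatbots use neural networks trained on vastly more text and context.

```python
from collections import Counter, defaultdict

# A tiny toy corpus; a real model is trained on enormous amounts of text.
corpus = "the weather in new york is cold the weather in paris is mild".split()

# Count how often each word follows each other word (a bigram model).
following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

def predict_next(word):
    """Return the word most frequently seen after `word` in the corpus."""
    counts = following[word]
    return counts.most_common(1)[0][0] if counts else None

print(predict_next("weather"))  # prints "in": it always follows "weather" here
```

Even this crude model "guesses the next word" from statistics; generative AI does the same thing at a much larger scale, which is why it can follow the thread of a conversation.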
For example, if you ask ChatGPT, “How many flights are there from San Francisco to New York next week?” and follow up with “What’s the weather like there?” and “What should I pack?” the chatbot can answer all three, because it makes connections between words to understand the context of the conversation. (The New York Times sued OpenAI and its partner Microsoft last year over their use of copyrighted news articles, without permission, to train chatbots.)
Older voice assistants like Siri respond from a database of commands and questions they were programmed to understand. They can handle specific phrasings like “What's the weather in New York?” but fail at open-ended questions like “What should I bring on my trip to New York?”
The ChatGPT conversation flows more naturally, like people talking to each other.
A main reason people gave up on voice assistants like Siri and Alexa was that the computers could not understand much of what was asked, and it was hard to learn which questions would work.
Generative AI addresses many of the problems researchers have struggled with for years, said Dimitra Vergyri, the director of speech technology at SRI, the research lab behind the initial version of Siri before it was acquired by Apple. The technology, she said, will allow voice assistants to understand spontaneous speech and respond with helpful answers.
John Burkey, a former Apple engineer who worked on Siri in 2014 and has been an outspoken critic of the assistant, said that because generative AI makes it easier for people to get help from computers, he expects many of us who rarely use voice assistants today to start talking to them. And if enough of us do, it could become the norm.
“Siri was limited in scale and could only recognize a limited number of words,” he said. “We have better tools now.”
But the new wave of AI assistants poses challenges of its own, and widespread adoption could take years. Chatbots like ChatGPT, Google's Gemini, and Meta AI are prone to “hallucinations,” making up answers when they cannot find correct ones. And they can flub basic tasks such as counting and summarizing information from the web.
When voice assistants are useful and when they aren't
Even as voice technology improves, experts say, talking is unlikely to replace traditional ways of operating computers, such as typing on a keyboard.
People already have compelling reasons to talk to their computers when they are alone, such as setting a driving destination on a map. But in public, talking to an assistant is not just strange-looking; it is often impractical. When I wore Meta's glasses at a grocery store and asked them to identify a piece of produce, a shopper who overheard me cheekily replied, “That's a turnip.”
Nor would you want to dictate a confidential work email within earshot of others on a train. Similarly, it would be imprudent to have a voice assistant read your text messages aloud at a bar.
“Technology solves problems,” says Ted Selker, a product design veteran who worked at IBM and Xerox PARC. “When are we solving problems and when are we creating problems?”
But Carolina Milanesi, an analyst at the research firm Creative Strategies, said it was easy to imagine moments when talking to a computer is so useful that you stop caring how strange it looks to others.
On the way to your next office meeting, you could ask a voice assistant to brief you on the people you are about to meet. On a hiking trail, asking an assistant where to turn is faster than stopping to pull up a map. At a museum, an assistant could give you a history lesson on the painting in front of you. Some of these applications are already being developed with new AI technology.
We got a glimpse of that future while testing some of the latest voice-driven products. When I was recording a video of myself baking bread while wearing Meta's glasses, for example, my hands were full, and it was helpful to be able to say, “Hey, Meta, record a video.” And having Humane's Ai Pin dictate my to-do list was more convenient than stopping to look at a phone screen.
“When you're walking around, that's the sweet spot,” said Chris Schmandt, who has worked on voice interfaces for decades at the Massachusetts Institute of Technology's Media Lab.
When he became an early adopter of one of the first cellphones about 35 years ago, he recalled, people stared as he walked around the MIT campus talking on the phone. Now that is normal.
I'm sure the day will come when people will occasionally talk to their computers while out and about, but it will come very slowly.