When OpenAI rolled out the latest version of its wildly popular chatbot ChatGPT this month, it gave the bot a new voice with human-like intonations and emotions, and an online demo showed the bot teaching a child how to solve a geometry problem.
Unfortunately, the demo was essentially a bait-and-switch: The new ChatGPT was released without most of its new features, including the improved voice (which the company said it had delayed to make fixes) and the ability to use a phone's video camera to analyze things like math problems in real time.
Amid the delays, the company also disabled a ChatGPT voice that some said resembled Scarlett Johansson's, after the actress threatened legal action, and replaced it with a different female voice.
For now, the main new capability that has actually rolled out is the ability to upload photos for the bot to analyze. Users can also generally expect faster, clearer responses. The bot can do real-time language translation, though it still responds in ChatGPT's older, more robotic voice.
Still, this is the cutting-edge chatbot that has upended the tech industry, so it's worth reviewing. After testing the speedier chatbot for two weeks, I have mixed feelings. It's good at language translation but struggles with math and physics. Overall, I didn't see much meaningful improvement over the previous version, GPT-4. I definitely wouldn't use it to tutor my kids.
AI companies promising radical new features and then delivering half-baked products is becoming a trend, and one that confuses and frustrates people. Humane, a startup backed by OpenAI CEO Sam Altman, developed a $700 talking pin called AI Pin, which was widely panned for overheating and uttering gibberish. Meta also recently added an AI chatbot to its apps, but it did a poor job at most of the tasks it was advertised to do, such as searching the web for airline tickets.
Companies are releasing AI products in an immature state partly because they want people's use of the technology to show them how to improve it. In the past, when companies unveiled a new tech product such as a phone, the features they showed off, like a new camera or a brighter screen, were what we actually got. With artificial intelligence, companies are previewing a potential future, demonstrating technologies that are still in development and work only in limited, controlled conditions. A mature, reliable product may or may not emerge.
The lesson to be learned from all this is that we, as consumers, need to resist the hype and take a slow, cautious approach to AI. Until we see evidence that the tools work as advertised, we shouldn't spend big bucks on unfinished technology.
The new version of ChatGPT is called GPT-4o (the “o” stands for “omni”) and is available to try for free on OpenAI's website and app. Free users can make a few requests before being cut off, while those with a $20 monthly subscription can ask the bot more questions.
OpenAI said its iterative approach to updating ChatGPT will allow it to gather feedback for improvements.
“We believe it is important to preview our advanced models to provide a glimpse of their capabilities and help understand real-world applications,” the company said in a statement.
(Last year, The New York Times sued OpenAI and its partner Microsoft, alleging that they used copyrighted news articles without permission to train chatbots.)
Here's what you need to know about the latest version of ChatGPT:
Math and Physics
To show off ChatGPT-4o's new tricks, OpenAI released a video featuring Sal Khan, CEO of the education nonprofit Khan Academy, and his son Imran. With a video camera pointed at a geometry problem, ChatGPT walked Imran through solving it step by step.
ChatGPT's video analysis feature hasn't been released yet, but I was able to upload photos of geometry problems. ChatGPT got some of the easier ones right but stumbled on the harder ones.
In one question I found on an SAT prep website about intersecting triangles, the bot understood the question but gave the wrong answer.
Taylor Nguyen, a high school physics teacher in Orange County, California, uploaded a physics problem about a man on a swing, a type of question often found on Advanced Placement physics exams. ChatGPT made several logical errors that led to the wrong answer, but it corrected them after Nguyen gave it feedback.
“I could provide guidance, but I'm a teacher,” he said. “How are students supposed to spot those mistakes? They're assuming the chatbot is right.”
I noticed signs of incremental improvement: ChatGPT-4o correctly solved some division problems that previous versions had gotten wrong. But it still failed at a basic task that has tripped up past versions and other chatbots, such as Meta AI and Google's Gemini: counting. When I asked ChatGPT-4o to name a four-syllable word that starts with “W,” it replied, “Awesome,” a two-syllable word that starts with “A.”
OpenAI said it is continually working to improve how its systems handle complex math problems.
Khan Academy, which uses OpenAI's technology in its tutoring software, Khanmigo, did not respond to a request for comment about whether Khan would leave his son alone with ChatGPT as a tutor.
Reasoning
OpenAI also highlighted the new ChatGPT's ability to reason, or use logic to arrive at answers. So I ran ChatGPT through one of my favorite tests: I asked it to generate a “Where's Waldo?” puzzle. When it produced an image of a giant Waldo standing in a crowd, I told it the point was that Waldo should be hard to find.
The bot then generated an even bigger Waldo.
Subbarao Kambhampati, a professor and artificial intelligence researcher at Arizona State University, also put the new chatbot through some tests and said he hadn't seen any noticeable improvement in its reasoning over previous versions.
He gave ChatGPT a puzzle involving blocks:
If block C is on top of block A and block B is separately on the table, how can I create a stack of blocks where block A is on top of block B and block B is on top of block C without moving block C?
The answer is that it's impossible: block A sits under block C, so it can't end up on top of the stack unless C is moved. But like past versions, ChatGPT-4o consistently offered a solution that involved moving block C. In this and other reasoning tests, ChatGPT sometimes got the right answer after receiving feedback, which Kambhampati said runs counter to how artificial intelligence should work.
“You can fix it, but you're using your own intelligence when you do that,” he said.
OpenAI pointed to testing results showing that GPT-4o scored about 2 percentage points higher than the previous version of ChatGPT in answering general knowledge questions, which the company said suggests a slight improvement in reasoning.
Language
OpenAI also said the new ChatGPT is capable of real-time language translation, which could help people converse with speakers of a foreign language.
I tested ChatGPT in Mandarin and Cantonese and found that it had no trouble translating phrases like “I'd like to book a hotel room for next Thursday” and “I'd like a king-size bed.” But its accents were a bit off. (To be fair, my own Chinese is pretty broken, too.) OpenAI said it's still working on improving accents.
ChatGPT-4o also works well as an editor: I pasted in a paragraph I had written, and it quickly and effectively trimmed redundant words and jargon. Given its solid performance at translation, I expect that feature to become even more useful soon.
Conclusion
A key thing OpenAI did right with ChatGPT-4o is to allow people to try out this technology for free. Free is the right price: we're helping to train and improve these AI systems with our data, so we shouldn't have to pay a fee.
The best of AI may be yet to come, and one day AI may become the great math tutor everyone keeps talking about, but we'll have to see it, and hear it, to believe it.