In the fast-paced race to build generative AI systems, the tech industry's mantra is that bigger is better, regardless of price.
Now, tech companies are starting to embrace smaller AI technologies that are less powerful but cost much less. And for many customers, that may be a good trade-off.
Microsoft on Tuesday unveiled three small AI models that are part of a family of technologies the company has named Phi-3. The company said even the smallest of the three performed almost as well as GPT-3.5, the much larger system that underpinned OpenAI's ChatGPT chatbot when it stunned the world upon its release in late 2022.
The smallest model of Phi-3 is small enough to fit on a smartphone, so it can be used without an internet connection. It also runs on the kind of chips found in regular computers, rather than the expensive processors made by Nvidia.
Smaller models require less processing power, allowing large technology providers to charge customers less. They hope this means more customers can apply AI in areas where the large, sophisticated models were previously too expensive to use. Microsoft said using its new models would be "substantially cheaper" than using larger models like GPT-4, but did not offer specifics.
Smaller systems are less powerful, which means they can be less accurate or sound more stilted. But Microsoft and other tech companies are betting that customers will be willing to sacrifice some performance if it means they can finally afford AI.
Customers envision many ways to use AI, but with the largest systems, "it's like, 'Yeah, but it might be a little expensive,'" said Eric Boyd, a Microsoft executive. Smaller models, almost by definition, are cheaper to deploy, he said.
Boyd said some customers, such as doctors and tax offices, could justify the cost of larger, more precise AI systems because their time was so valuable. But many other tasks may not require the same level of precision. Online advertisers, for example, believe AI can help them better target their ads, but the costs need to come down before they can use the systems routinely.
"I want my doctor to get things right," Boyd said. "In other situations, like summarizing online user reviews, if it's a little bit off, it's not the end of the world."
Chatbots are powered by mathematical systems called large language models, or LLMs, which spend weeks analyzing digital books, Wikipedia articles, news stories, chat logs, and other text culled from across the internet. By pinpointing patterns in all that text, they learn to generate text on their own.
But because an LLM stores so much information, retrieving what is needed for each chat takes significant computing power. And that is expensive.
While tech giants and startups like OpenAI and Anthropic have focused on improving the largest AI systems, they are also racing to develop smaller models that come at lower prices. Meta and Google, for example, have released smaller models over the past year.
Meta and Google have also "open sourced" these models, meaning anyone can use and modify them free of charge. This is a common way for companies to get outside help improving their software and to encourage the wider industry to adopt their technologies. Microsoft is open sourcing its new Phi-3 models as well.
(The New York Times sued OpenAI and Microsoft in December for copyright infringement of news content related to AI systems.)
After OpenAI released ChatGPT, the company's chief executive, Sam Altman, said the cost of each chat was "single-digit cents," an enormous expense considering that popular web services like Wikipedia are delivered for a tiny fraction of a penny.
Now, researchers say, these smaller models can at least approach the performance of leading chatbots like ChatGPT and Google Gemini. Essentially, the systems can still analyze large amounts of data, but they store the patterns they identify in smaller packages that can be served with less processing power.
Building these models involves a trade-off between power and size. Sébastien Bubeck, a researcher and vice president at Microsoft, said the company built its new, smaller models by refining the data fed into them, working to ensure the models learned from higher-quality text.
Some of this text was generated by AI itself, a kind of text known as "synthetic data." Human curators then worked to separate the clearest text from the rest.
Microsoft has built three small models: Phi-3-mini, Phi-3-small, and Phi-3-medium. Phi-3-mini, which became available on Tuesday, is the smallest and cheapest, but the least powerful. Phi-3-medium, which is not yet available, is the most powerful, but also the largest and most expensive.
Gil Luria, an analyst at the investment bank DA Davidson, said that if the systems could be made small enough to run directly on a phone or laptop, "it would be much faster and orders of magnitude cheaper."