OpenAI launched its best new AI model in September. It already has challengers, one from China and another from Google.

20.12.2024 19:57

OpenAI's o1 model was hailed as a breakthrough in September.
By November, a Chinese AI lab had released a similar model called DeepSeek.
On Thursday, Google came out with a challenger called Gemini 2.0 Flash Thinking.

In September, OpenAI unveiled a radically new type of AI model called o1. In a matter of months, rivals introduced similar offerings.

On Thursday, Google released Gemini 2.0 Flash Thinking, which uses reasoning techniques that look a lot like o1's.

Even before that, in November, a Chinese company announced DeepSeek, an AI model that breaks challenging questions down into more manageable tasks, as OpenAI's o1 does.

This is the latest example of a crowded AI frontier where pricey innovations are swiftly matched, making it harder to stand out.

"It's amazing how quickly AI model improvements get commoditized," said Rahul Sonwalkar, CEO of startup Julius AI. "Companies spend massive amounts building these new models, and within a few months they become a commodity."

The proliferation of AI models with similar capabilities could make it difficult to justify charging high prices for these tools. The price of accessing AI models has indeed plunged in the past year or so.

That, in turn, could raise questions about whether it's worth spending hundreds of millions of dollars, or even billions, to build the next top AI model.

September is a lifetime ago in the AI industry

When OpenAI previewed its o1 model, back in September, the product was hailed as a breakthrough. It uses a new approach called inference-time compute to answer more challenging questions.

It does this by slicing queries into more digestible tasks and turning each of these stages into a new prompt that the model tackles. Each step requires running a new request, which is known as the inference stage in AI.

This produces a chain of thought, or chain of reasoning, in which each part of the problem is answered in turn. The model finishes one stage before moving on to the next, until it ultimately comes up with a full response.

The model can even backtrack and check its prior steps and correct errors, or try solutions and fail before trying something else. This is akin to how humans spend longer working through complex tasks.
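As a rough illustration of the "one request per step" idea described above, here is a toy sketch. It is not how o1 is built, since the article does not describe its internals. The function `answer_in_steps`, the prompt wording, and `stub_model` are all hypothetical names invented for this example, and the stub only echoes its input rather than reasoning. Backtracking is left out to keep the sketch short.

```python
from typing import Callable, List

def answer_in_steps(question: str, steps: List[str],
                    ask: Callable[[str], str]) -> List[str]:
    """Send each sub-task as its own prompt, carrying earlier answers forward."""
    trace: List[str] = []
    for step in steps:
        context = "".join(f"Earlier result: {t}\n" for t in trace)
        prompt = f"Question: {question}\n{context}Current step: {step}"
        trace.append(ask(prompt))  # one inference call per step
    return trace

prompts_seen: List[str] = []

def stub_model(prompt: str) -> str:
    """Hypothetical stand-in for a model call; it only echoes the step."""
    prompts_seen.append(prompt)
    step = prompt.rsplit("Current step: ", 1)[1]
    return f"[answer to: {step}]"

trace = answer_in_steps(
    "What is (2 + 3) * 4?",
    ["compute 2 + 3", "multiply the result by 4"],
    stub_model,
)
print(len(prompts_seen))  # number of inference calls made
```

The point of the sketch is the cost structure: every extra step is another call to the model, which is why this approach spends more compute at inference time than a single-shot answer.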

DeepSeek rises

In a mere two months, o1 had a rival. On November 20, a Chinese AI company released DeepSeek.

"They were probably the first ones to reproduce o1," said Charlie Snell, an AI researcher at UC Berkeley who co-authored a Google DeepMind paper this year on inference-time compute.

He's tried DeepSeek's AI model and says it performs well on complex math problems that must be solved by thinking for longer and in stages.

He noted that in DeepSeek's DeepThink mode, the model shows users every step of its thought process. With o1, these intermediate steps are hidden from users.

"I've asked people at OpenAI what they think of it," Snell told BI. "They say it looks like the same thing, but they don't know how DeepSeek did this so fast."

OpenAI didn't respond to a request for comment.

Andrej Karpathy, an OpenAI cofounder, praised Google's new "Thinking" model for the same reasoning feature.

"The prominent and pleasant surprise here is that unlike o1 the reasoning traces of the model are shown," he wrote on X. "As a user I personally really like this because the reasoning itself is interesting to see and read — the models actively think through different possibilities, ideas, debate themselves, etc., it's part of the value add."

A DeepSeek demo

Snell shared a multi-step math problem with Business Insider, which we used to test DeepSeek ourselves:

"Find a sequence of +, -, /, * which can be applied to the numbers 7, 3, 11, 5 to get to 24, where each of the given numbers is used exactly once."

BI put that prompt in DeepSeek's chat window on its website. The model responded initially by laying out the challenge ahead.

"Alright, so I've got this problem here: I need to use the numbers 7, 3, 11, and 5, and combine them with the operations of addition, subtraction, multiplication, and division, using each number exactly once, to get to 24," it replied. "At first glance, this seems a bit tricky, but I think with some systematic thinking, I can figure it out."

It then proceeded through multiple steps over roughly 16 pages of discussion that included mathematical calculations and equations. The model sometimes got it wrong, but it spotted this and didn't give up. Instead, it swiftly moved on to another possible solution.

"Almost got close there with 33 / 7 * 5 ≈ 23.57, but not quite 24. Maybe I need to try a different approach," it wrote at one point.

After a few minutes, it found the correct solution, according to Snell.

"You can see it try different ideas and backtrack," Snell explained in an interview on Wednesday. He highlighted this part of DeepSeek's chain of thought as particularly noteworthy.

"This is getting really time-consuming. Maybe I need to consider a different strategy," the AI model wrote. "Instead of combining two numbers at a time, perhaps I should look for a way to group them differently or use operations in a nested manner."
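For comparison, the puzzle itself is small enough to settle by exhaustive search. The sketch below is a conventional brute-force solver, not a reconstruction of what DeepSeek does. It assumes the numbers may be used in any order and grouped with parentheses, which is one reading of the prompt. The names `solve` and `_search` are made up for this example. It repeatedly picks two values, combines them with one operation, and backtracks when a branch cannot reach the target.

```python
from fractions import Fraction
from itertools import combinations

def _search(items, target):
    """items is a list of (exact value, expression string) pairs."""
    if len(items) == 1:
        value, expr = items[0]
        return expr if value == target else None
    for i, j in combinations(range(len(items)), 2):
        (a, ea), (b, eb) = items[i], items[j]
        rest = [items[k] for k in range(len(items)) if k not in (i, j)]
        candidates = [
            (a + b, f"({ea} + {eb})"),
            (a - b, f"({ea} - {eb})"),
            (b - a, f"({eb} - {ea})"),
            (a * b, f"({ea} * {eb})"),
        ]
        if b != 0:
            candidates.append((a / b, f"({ea} / {eb})"))
        if a != 0:
            candidates.append((b / a, f"({eb} / {ea})"))
        for candidate in candidates:
            found = _search(rest + [candidate], target)
            if found is not None:
                return found
        # Every operation on this pair failed: backtrack to the next pair.
    return None

def solve(numbers, target=24):
    """Return one expression using each number exactly once, or None."""
    items = [(Fraction(n), str(n)) for n in numbers]
    return _search(items, Fraction(target))

solution = solve([7, 3, 11, 5])
print(solution)
```

Exact fractions are used instead of floating point so that a near miss such as 33 / 7 * 5 ≈ 23.57 can never be mistaken for 24. One valid answer under this reading is (11 - 5) * (7 - 3) = 24, though the solver may print a different one.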

Then Google appears

Snell said other companies are likely working on AI models that use the same inference-time compute approach as OpenAI.

"DeepSeek does this already, so I assume others are working on this," he added on Wednesday.

The following day, Google released Gemini 2.0 Flash Thinking. Like DeepSeek, this new model shows users each step of its thought process while tackling problems.

Google AI veteran Jeff Dean shared a demo on X that showed the new model solving a physics problem and explaining its reasoning steps.

"This model is trained to use thoughts to strengthen its reasoning," Dean wrote. "We see promising results when we increase inference time computation!"

Read the original article on Business Insider