DeepSeek’s ‘aha moment’ creates new way to build powerful AI with less money

Chinese AI lab DeepSeek adopted innovative techniques to develop an AI model trained with limited human intervention, producing an “aha moment” that could sharply cut the cost for developers of building powerful applications on the technology.

The research paper published on the workings of DeepSeek’s R1 “reasoning” model reveals how the group, led by hedge fund billionaire Liang Wenfeng, has achieved powerful results by removing bottlenecks in AI development. 

The paper shows how DeepSeek adopted a series of more efficient techniques to develop R1, which, like OpenAI’s rival o1 model, generates accurate answers by “thinking” step by step about its responses for longer than most large language models.

DeepSeek’s breakthroughs come from its use of “reinforcement learning” to reduce the human involvement required in producing responses to prompts.

The company has also built smaller models with powerful reasoning capabilities despite having fewer parameters — the number of variables used to train an AI system and shape its output — by tweaking large models trained by competitors such as Meta and Alibaba.

Together, these developments have sent shockwaves throughout Silicon Valley, as R1 outperforms recently released models from OpenAI, Anthropic and Meta on some tasks, at a fraction of the cost to develop.

“I think it’s just the tip of the iceberg in terms of the type of innovation we can expect in these models,” said Neil Lawrence, DeepMind professor of machine learning at Cambridge university. “History shows that big firms struggle to innovate as they scale, and what we’ve seen from many of these big firms is a substitution of compute investment for the intellectual hard work.”

Thumbs-ups lead to ‘aha moment’

Large language models are built in two stages. The first is called “pre-training”, in which developers use massive data sets that help models to predict the next word in a sentence. The second stage is called “post-training”, through which developers teach the model to follow instructions, such as solving maths problems or coding.
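
In code, the pre-training stage boils down to next-word (strictly, next-token) prediction. The sketch below is a minimal, hypothetical illustration of a single pre-training step in PyTorch; `model`, `optimizer` and `token_ids` are stand-ins, not anything from DeepSeek’s codebase:

```python
import torch.nn.functional as F

def pretraining_step(model, optimizer, token_ids):
    """One next-token-prediction step. `token_ids` is a (batch, seq_len)
    tensor of integer token IDs drawn from the training corpus."""
    # Shift by one: the model sees tokens [0..n-1] and predicts [1..n]
    inputs, targets = token_ids[:, :-1], token_ids[:, 1:]
    logits = model(inputs)  # (batch, seq_len - 1, vocab_size)
    loss = F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),  # flatten positions
        targets.reshape(-1),
    )
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```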

One way to get chatbots to generate more useful responses is called “reinforcement learning from human feedback” (RLHF), a technique pioneered by OpenAI to improve ChatGPT.

RLHF works by having human annotators label the AI model’s responses to prompts and pick the best ones. This step is laborious, expensive and time consuming, often requiring a small army of human data labellers.
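
In published RLHF recipes, those human preference labels are typically used to train a “reward model” that scores responses, using a pairwise loss. The sketch below is a minimal illustration of that idea; `reward_model` and the tensor arguments are hypothetical stand-ins:

```python
import torch.nn.functional as F

def reward_model_loss(reward_model, chosen_ids, rejected_ids):
    """Pairwise preference loss: push the score of the human-preferred
    response above the score of the rejected one."""
    r_chosen = reward_model(chosen_ids)      # scalar score per example
    r_rejected = reward_model(rejected_ids)
    # -log sigmoid(r_chosen - r_rejected) is minimised when the model
    # rates the chosen response higher than the rejected one
    return -F.logsigmoid(r_chosen - r_rejected).mean()
```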

DeepSeek’s big innovation is to automate this final step, using a technique called reinforcement learning (RL), in which the AI model is rewarded for doing the right thing. 

DeepSeek first developed a powerful text-predicting model called V3. It then used RL to “reward” the model, for example by giving it the equivalent of a thumbs-up when it generated the right answer.

The Chinese company found that by doing this process enough times, the model managed to spontaneously solve problems without human supervision.
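
The R1 paper describes this automated feedback as rule-based rewards: an accuracy reward that checks the model’s answer against a known solution, plus a format reward for wrapping its reasoning in designated tags. A minimal sketch of what such reward functions could look like, with hypothetical conventions for how answers are marked up:

```python
import re

def accuracy_reward(model_output: str, ground_truth: str) -> float:
    """Rule-based reward: 1.0 if the model's final answer matches the
    known solution, else 0.0 -- no human labeller in the loop."""
    match = re.search(r"\\boxed\{(.+?)\}", model_output)
    answer = match.group(1).strip() if match else ""
    return 1.0 if answer == ground_truth.strip() else 0.0

def format_reward(model_output: str) -> float:
    """Small bonus when the model wraps its reasoning in the expected
    <think>...</think> tags, encouraging legible chains of thought."""
    ok = "<think>" in model_output and "</think>" in model_output
    return 0.5 if ok else 0.0
```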

This technique was also used by Google DeepMind to build AlphaGo, the AI system that beat human players at the ancient board game Go and kick-started the current boom in deep learning computing techniques almost a decade ago. 

DeepSeek said it discovered the model had what the company called an “aha moment” when it re-evaluated its answers and adjusted its processing time to solve different questions. 

“The ‘aha moment’ serves as a powerful reminder of the potential of [RL] to unlock new levels of intelligence in artificial systems, paving the way for more autonomous and adaptive models in the future,” DeepSeek’s creators wrote in their research paper.

Lewis Tunstall, a researcher at Hugging Face, an AI research company, said: “It seems that the secret sauce to make this work is to just have a very, very strong pre-trained model, and then to just have very, very good infrastructure to do this reinforcement learning process at a large scale.”

Small models built using large ones

While OpenAI and Google are investing billions of dollars to build large language models, DeepSeek has also built smaller models that can be run on phones or web browsers by “distilling” the reasoning capabilities of bigger models.

DeepSeek used its R1 model to generate a relatively small set of 800,000 data points and then tweaked the models made by competitors such as Alibaba’s Qwen and Meta’s Llama using that AI-generated data.
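
Mechanically, this kind of distillation is ordinary supervised fine-tuning: the smaller “student” model learns to reproduce reasoning traces generated by the bigger “teacher”. A minimal, hypothetical PyTorch sketch, assuming the prompt and the teacher’s response arrive as token-ID tensors:

```python
import torch
import torch.nn.functional as F

def distillation_step(student, optimizer, prompt_ids, response_ids):
    """Fine-tune a small `student` model on reasoning traces generated
    by a larger teacher model. Both inputs are (batch, length) tensors
    of token IDs; names here are illustrative stand-ins."""
    full = torch.cat([prompt_ids, response_ids], dim=1)
    inputs, targets = full[:, :-1], full[:, 1:]
    logits = student(inputs)  # (batch, len - 1, vocab_size)
    # Compute the loss only on the teacher-generated response tokens,
    # not on the prompt itself
    mask = torch.zeros_like(targets, dtype=torch.bool)
    mask[:, prompt_ids.size(1) - 1:] = True
    loss = F.cross_entropy(logits[mask], targets[mask])
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```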

DeepSeek found these distilled models were especially strong on reasoning benchmarks, in some cases outperforming flagship models such as Anthropic’s Claude. “It can basically solve most of the math problems I did in undergraduate,” said Tunstall.

That development could be a boon for app developers, who have a cheap and efficient way to build products. Teaching AI models to reason during “inference” — when the model is generating answers — is much more efficient than the pre-training process, which requires a lot of computing power, according to Lennart Heim, a researcher at Rand, a think-tank.

This new paradigm could allow competitors to build competitive models with far less computing power and money, he added. However, without money for chips, “they just can’t deploy them at the same scale”, Heim said. 

DeepSeek has not said how much it spent to build R1, but claimed it trained its V3 model, which R1 is based on, for only $5.6mn.

This sum does not include other costs, such as the likely acquisition of thousands of graphics processing units to train the model, or salaries, experiments, training and deployment, said Heim. 

And while DeepSeek has been the first to use its particular techniques, other AI labs are expected to follow suit, with Hugging Face already working on replicating R1.

US AI companies have also worked on packing the capabilities of their big, state-of-the-art models into smaller, more nimble ones. Google last year launched Gemma, a more lightweight model based on its Gemini family.

“The recipe of intelligence is quite simple,” said Thomas Wolf, co-founder and chief science officer at Hugging Face, adding that DeepSeek’s techniques were well understood by others in the field. “And that’s why I expect a lot of teams can redo this.”

Additional reporting by Cristina Criddle in San Francisco and Madhumita Murgia in London 
