This is an audio transcript of the Tech Tonic podcast episode: ‘Tech in 2025 — China’s AI ‘Sputnik moment’’
Murad Ahmed
Some are calling it an ‘AI Sputnik moment’. Last week, a Chinese company called DeepSeek released the details of an AI model that shocked Silicon Valley, and could change the whole narrative around AI development. Since the advent of ChatGPT, it’s been assumed that the US is way ahead of China when it comes to building world-changing artificial intelligence. But this new Chinese model suggests China is not as far behind as once thought.
I’m Murad Ahmed, and welcome to Tech Tonic. In this season of the podcast, we’re looking at some of the biggest tech stories likely to emerge in 2025. In this episode: is China about to catch up in the race? To help me answer this question, on the line from Beijing is the FT’s China technology correspondent Eleanor Olcott. Hi, Eleanor.
Eleanor Olcott
Hi, Murad.
Murad Ahmed
So, you have been writing about DeepSeek for a while now. In fact, you’re the very first person to have called me up and told me that we should be writing about this company. And indeed, you called me last week and insisted we had to write a story about how they had released their R1 model and really shocked the world on what they had managed to do. But before we get into that, let’s get a lay of the land. Since ChatGPT, China had been seen as lagging behind the US in AI. If China was behind, why was that the case?
Eleanor Olcott
Yeah, it’s incredible how quickly these narratives can change, right? So up until 2022, the widely accepted view was that China and the US were fairly evenly matched in AI. China was this land of abundant data and talented engineers. It had leading AI companies like Hikvision and SenseTime, which were really the first to commercialise AI in surveillance technology.
But then ChatGPT came out. This was a huge wake-up call for the Chinese industry. Tech companies here had largely neglected the potential for generative AI because it wasn’t entirely clear at that point that it was a commercially viable end point for this technology. At the same time, you had Washington target China’s AI sector with export controls on the highest-end Nvidia GPUs, the AI silicon needed to train and run these models. So for the Chinese players, the moment they realised that they’d fallen behind was also the time that they realised they’d been banned from buying the hardware to catch up. So what we saw basically between 2022 and 2024 was people kind of wrote off the Chinese AI sector.
They assumed that the US was finally ahead. All of the money, the best talent, the best research was emanating from a few US companies. And so apart from a few journalists and AI researchers in China, no one was really paying that much attention to what was going on here.
This past month has really changed everything. DeepSeek’s release has proved that Chinese AI deserves attention. And that actually over the past two years, they’ve been quietly plugging away, not just DeepSeek, right, but a whole bunch of other players who’ve been quietly plugging away in the background to close the gap with the US.
Murad Ahmed
Well, this week they’re certainly getting that attention. Explain who DeepSeek are and why they are causing such a stir.
Eleanor Olcott
So DeepSeek basically started out as a bit of a side project for this eccentric hedge fund billionaire Liang Wenfeng. And he’s a quant trader by background. And the reason why that’s important is because quant traders basically have to utilise Nvidia GPUs to execute trades super, super quickly. That’s how they win, right. So Liang Wenfeng had this team of incredibly talented engineers who could eke as much power as possible out of these Nvidia chips, and he transferred this team over to DeepSeek. And they started to utilise some of these tricks, some of these learnings, to build LLMs in a very cheap and efficient way. DeepSeek is not the only one to be doing this. We also have been seeing similar things from the likes of MiniMax, from Alibaba and from Moonshot.
Murad Ahmed
And why are they causing such a stir right now?
Eleanor Olcott
So DeepSeek’s done two things over the past month which have caused somewhat of an existential crisis in Silicon Valley. The first thing it did is release its V3 model, which achieved comparable performance to US models from OpenAI and Meta, and which it claims to have built on a fraction of the budget. And then last week, it shocked the world once again with its R1 model. So this is a so-called reasoning model. This is an important new field of research in AI whereby models can tell if they got something wrong and self-correct without the need for human supervision. Now, this is really, really important because if we get to a place where models are capable of human critical thinking, then they’ll be able to complete tasks vastly more complicated, and arguably more useful, than we’ve seen currently on the market. Just to say here, DeepSeek isn’t the first one to develop a so-called reasoning model. OpenAI and DeepMind have been busy on this too. But it is the first company to publish a detailed technical report, a recipe, as it were, on how to build a reasoning model. And the fact that we’re seeing such innovative research coming for the first time from a Chinese lab, especially one that no one had heard of until this month, is nothing short of extraordinary.
And what makes DeepSeek such a formidable competitor in this landscape is unlike companies like OpenAI, their models are open-source, so they share details of them with the wider community. I spoke with Tiezhen Wang, who’s an engineer at Hugging Face, which is an open source AI community platform, and he kind of explained a little bit about this R1 model and what makes it so impressive.
Tiezhen Wang
I feel like it essentially gives out a new blueprint to close the gap between closed-source models and open-source models. The model itself finds a tactic to solve a complicated reasoning task by trying to think, oh, am I doing it wrong? OK, let me think more deeply. So this kind of thing emerges automatically without, like, humans teaching them. So I feel like this is amazing, and the output is, like, surprisingly good: in the benchmarks it outperforms models like OpenAI’s o1. So it’s really amazing. And they give away all the models to the community, which means probably in the next few months you will see all open-source models get upgraded.
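The self-checking loop Wang describes can be made concrete with a toy sketch, in the coding domain where this kind of verification works best: checking an answer (running tests) is easy even when producing it is not. To be clear, this is not DeepSeek’s method or any real model; the hypothetical `attempt_v1`/`attempt_v2` functions merely stand in for an LLM’s successive attempts at a task.

```python
# Toy sketch of "am I doing it wrong? let me think more deeply":
# candidate solutions are checked automatically, with no human in
# the loop, and a failed check triggers another, more careful try.

def attempt_v1(xs):          # hypothetical first try: off-by-one bug
    return sorted(xs)[:-1]

def attempt_v2(xs):          # "thinking more deeply": correct version
    return sorted(xs)

TESTS = [([3, 1, 2], [1, 2, 3]), ([5], [5])]

def passes_tests(candidate):
    # Automatic verification: the "model" can score its own work.
    return all(candidate(list(inp)) == out for inp, out in TESTS)

def solve_with_self_correction(candidates):
    for candidate in candidates:      # keep trying until the check passes
        if passes_tests(candidate):
            return candidate
    return None

best = solve_with_self_correction([attempt_v1, attempt_v2])
print(best.__name__)  # attempt_v2 survives the self-check
```

The key property, as Eleanor notes later in the episode, is that maths and code have cheap, unambiguous checkers, which is why reasoning models shine there first.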
Eleanor Olcott
I think this is really the reason why DeepSeek’s been so interesting as well, is that they released V3 and R1 on this shoestring budget, right. They used only a fraction of the GPUs that their US counterparts train similar-sized models on. So there’s this huge question as to whether or not we actually need to be buying hundreds of thousands of GPUs to build the biggest, baddest model.
Tiezhen Wang
So first, I think compute is very important. But actually what DeepSeek has been doing is telling the world that there are still a lot of ways to use less computing yet build great models. So we do not need that much compute in order to do all these things. One of the reasons why DeepSeek is saving a ton of money is because they’re not using the typical transformer architecture. It’s using something like latent-space magic to do some compression so that one GPU can hold more requests. This is very important.
Most of the large language models for now are based on the transformer architecture designed by Google in 2017. So it’s very powerful, but it requires a ton of compute and the cost grows exponentially. So I also think about: is the transformer a good architecture that will lead us through the next 10 years of advancement, or do we have a good alternative, a new paradigm?
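Wang’s point about compression letting one GPU hold more requests can be illustrated with some back-of-envelope arithmetic on the memory a transformer needs per token of context. All the dimensions below are made-up round numbers, not DeepSeek’s or any real model’s configuration, and the latent size is a purely hypothetical value; the sketch only shows why shrinking what each token stores multiplies how many requests fit in memory.

```python
# Back-of-envelope: per-token memory of a standard attention KV cache
# vs a compressed latent cache. All sizes are illustrative only.

def kv_cache_bytes_per_token(layers, heads, head_dim, bytes_per_val=2):
    # Standard attention keeps a key and a value vector per head, per layer
    # (the trailing *2 is for K and V; 2 bytes assumes fp16 values).
    return layers * heads * head_dim * 2 * bytes_per_val

def latent_cache_bytes_per_token(layers, latent_dim, bytes_per_val=2):
    # A latent-compression scheme keeps one small vector per layer and
    # reconstructs keys/values from it on the fly.
    return layers * latent_dim * bytes_per_val

standard = kv_cache_bytes_per_token(layers=60, heads=32, head_dim=128)
latent = latent_cache_bytes_per_token(layers=60, latent_dim=512)

print(f"standard KV cache: {standard} bytes per token")  # 983040
print(f"latent cache:      {latent} bytes per token")    # 61440
print(f"compression:       {standard / latent:.0f}x")    # 16x
```

With a 16x smaller cache, the same GPU memory holds roughly 16x more concurrent contexts, which is the serving-cost saving Wang is gesturing at.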
Murad Ahmed
So this point about it being a reasoning model is quite interesting. In fact, if you use DeepSeek, you see it in real time. It shows it’s working, how it’s trying to answer a question, solve a problem. It’s fascinating to see it play out. In what ways is it similar or different to the things that OpenAI or Anthropic are doing?
Eleanor Olcott
Yeah. So the first thing to say on reasoning models is that they still haven’t completely proven themselves, right. They’re good at maths and coding tasks because it’s clear to the model when it’s right and when it’s wrong. Whereas they’re not so adept as of yet in more creative tasks like, you know, writing poetry or writing film scripts. But the real difference between DeepSeek and the likes of OpenAI is that this is an open-source model. So currently, if you’re using OpenAI’s reasoning model, you have to access it through OpenAI’s cloud service. You have to pay money to use the model. And right now developers are actually complaining that the company is putting constraints on how much you can use the model. By contrast, if you’re using DeepSeek’s model, you can actually just download the whole thing on to your computer and use it for free. And the developers I’ve been speaking to are saying that R1 is showing comparable performance for certain mathematical and coding tasks. So the big question for OpenAI is: how can it continue to convince customers to pay for this stuff when Chinese players are offering similar kinds of services for free?
Murad Ahmed
So Ellie, reporters of the FT have been digging into DeepSeek’s research paper that explains how this reasoning model works, and what it seems to have done is piggybacked off models from OpenAI and Alibaba to build on top of that, right. And it’s done something called reinforcement learning to improve itself. So could you just take us through how that works?
Eleanor Olcott
Yeah. So basically, once you’ve trained an LLM, that’s not the final step, right. You’ve got to do a lot of post-training and fine-tuning. You’ve got to actually make that LLM useful to you. Now, how do you make that LLM useful to you? There are many different strategies to do that, and what this R1 model does is basically allow you to build on top of existing LLMs and to make them useful for your specific task, but to do so in a way that does not require human supervision. So basically writing very clever code to teach the model to train itself and to arrive at the conclusions itself without, you know, huge numbers of engineers plugging in fine-tuned data to arrive at that same conclusion.
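The idea of reinforcement learning without human supervision can be sketched with a deliberately tiny toy. Nothing here resembles a real LLM or DeepSeek’s actual training code: the “policy” is just a pair of preference weights over two hypothetical answering strategies, and the crucial point is that the reward comes from automatically verifying the answer (as with a maths problem), not from a human labeller.

```python
import random

# Toy sketch of RL-style post-training with a *verifiable* reward.
# Strategies that the automatic checker scores well get reinforced;
# no human ever grades an answer.

def solve(task, strategy):
    a, b = task
    # A deliberately buggy "fast" strategy and a correct "careful" one.
    return a + b if strategy == "careful" else a + b + 1

def reward(task, answer):
    # Automatic verification: compare against a checkable ground truth.
    return 1.0 if answer == sum(task) else 0.0

weights = {"fast": 1.0, "careful": 1.0}
rng = random.Random(0)  # fixed seed for reproducibility

for _ in range(200):
    task = (rng.randint(0, 9), rng.randint(0, 9))
    # Sample a strategy in proportion to its current weight.
    strategy = rng.choices(list(weights), weights=weights.values())[0]
    # Reinforce whatever the verifier rewards.
    weights[strategy] += reward(task, solve(task, strategy))

print(weights)  # "careful" ends up strongly preferred over "fast"
```

Because the verifier can be run millions of times at no labelling cost, this loop scales in a way that engineer-curated fine-tuning data cannot, which is the breakthrough Eleanor is describing.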
Murad Ahmed
Right. And this human supervision is the key point. That’s the big breakthrough. We’ve been talking about them as AI, but anyone who’s working with them realises that humans, engineers, are constantly diving in, helping them adapt. And this model is showing that it is learning in a way that is more intuitive to itself rather than needing human beings to be involved.
Eleanor Olcott
Yes, precisely. Just to emphasise here, I mean, it’s still . . . there are still plenty of sceptics out there on reasoning models. I think they’ve got a lot to prove themselves as a real replacement for the kind of traditional way of doing post-training. But it certainly is a new kind of avenue that all the companies are piling into.
Murad Ahmed
I don’t think anybody at the moment is saying that DeepSeek’s AI model is clearly better than OpenAI’s, but it’s a lot cheaper. And that combination of things has led people like Marc Andreessen, the very famous Silicon Valley venture capitalist, to call DeepSeek’s R1 model an ‘AI Sputnik moment’, a reference to the space race during the cold war, when the Soviet Union suddenly showed that it had a lead in space travel. As a result, we are seeing these big sell-offs in US tech stocks and across the broader market, really. Talk us through why that’s happening.
Eleanor Olcott
So DeepSeek’s proved that it’s possible to do more with less when training AI models. Now, this is really, really significant for Nvidia, because Nvidia’s stock price has been supported by this narrative that’s developed in the US that building the biggest, baddest model requires building larger and larger computing clusters. So actually, right at the same time that DeepSeek released this R1 model, we got the announcement from SoftBank and from OpenAI that they were building a $500bn computing cluster in the US to kind of get to AGI.
Murad Ahmed
And AGI is artificial general intelligence, the idea that computers can match human capability.
Eleanor Olcott
Exactly. And you know, DeepSeek is also hell-bent on getting to that point, but it’s not doing it with a $500bn computing cluster. So what it’s done has really challenged this conventional wisdom that more chips equals better model. That is still true, but it turns out it’s slightly more complicated than that. There are also lots of things that you can be doing on the engineering side to eke out a huge amount of computing power from a more limited number of chips. That said, I asked Tiezhen Wang from Hugging Face about this. He’s impressed by the progress of DeepSeek and others in China, but he says that the race is not over yet.
Tiezhen Wang
I think the open-source community is getting closer to the closed-source community. And for the top, top-tier closed-source models, it definitely pushes them to release better stuff. We know that OpenAI has a lot of, like, technology they’re developing on their own that they’re not really using. I feel that they still have, like, even better models. When open-source models are getting better and better, they are forced to release more powerful models from their side. But because the DeepSeek model is still not as good as their top frontier models, it’s probably not going to be a huge issue for them. They still have much better models.
Murad Ahmed
So maybe China’s still lagging behind on the most cutting-edge models compared to US companies, but you are starting to see Chinese companies like DeepSeek showing that you can build really good models that can compete, without needing these masses of GPUs from the likes of Nvidia to do it.
[MUSIC PLAYING]
Eleanor, we’ve been talking about DeepSeek and Chinese AI companies building impressive AI models without having to buy up masses of AI chips from companies like Nvidia. But Chinese companies are also developing their own AI chips, and they want to replace Nvidia. So as well as working out how to use fewer chips, could China be on the cusp of making its own AI chips?
Eleanor Olcott
So China does have its own AI chips, which are looking increasingly competitive for inference, but not for model training. So these are two different things. Nvidia has a complete stranglehold on chips for model training, ie the chips that train these LLMs. It’s a very complicated and computationally intensive task to get there. But once you’ve trained the model, you can use other chips for running it and actually kind of using it for AI applications like chatbots and that kind of thing. And Huawei, which is a name that we’re all familiar with, right, the US-sanctioned Chinese technology giant famous for building 5G infrastructure around the world, has emerged as the frontrunner here to build AI chips for inference. And they actually have a bit of an edge in China. They are seen by Nvidia as the most serious challenger in China, because they have a huge amount of government support domestically. And basically the government is telling customers, big tech companies, to buy from the national champion and also to help them improve their technology. And by all accounts, that technology is improving fast. But for now, the blunt reality is that the Chinese players are very, very reliant on Nvidia for model training.
Murad Ahmed
I think from what you’re saying, and from our understanding, Nvidia’s most cutting-edge GPUs are still the market leader, definitely on training. And you know, the CEO, Jensen Huang, has said they will also be a leader in inference as well. And there does seem to be a pretty roaring black market in China for Nvidia’s chips, as buyers try to get around the export controls that have been imposed on the country. Is that right? How much is that helping China’s efforts to catch up in the AI race?
Eleanor Olcott
Yeah, I mean, I think the fact that you have this thriving black market for Nvidia GPUs is a testament to the fact that China is still very much reliant on this company, and they’re very much enabling or letting these chips come in and be sold fairly openly (in fact, not just fairly, incredibly openly) in, you know, tech markets and even online. So one of the biggest stories for us over the past few years is how companies have been fighting to secure access to Nvidia chips, both through legal and illegal means.
First, just talking about the legal means, because some of these big tech companies, right, they can’t be seen to be buying thousands of GPUs illegally in China. Some of them are maybe listed in the US. They’ve got US investors. It would be too risky for them to be caught doing this kind of thing. Luckily for them, there’s a huge playbook of ways to access Nvidia chips legally. The export controls that the Biden administration rolled out have had plenty of loopholes in them. So first, you know, they started buying them for use overseas. The US closed that loophole. Then they switched to renting them instead of buying them overseas. The US again closed that loophole. Right now, what we’re seeing is the Chinese players trying to kind of develop clever technical tricks that will enable them to keep accessing the chips overseas while also somehow remaining compliant with those export controls.
But the smaller players, they don’t have these concerns. They can be a little bit more nimble and adaptive to the restrictions. And so that’s where this illicit trade in Nvidia’s silicon comes in. I mean, it’s incredibly obvious being a reporter here in China that this is a thriving market. I was at an AI conference a few months ago and someone came up to me, assumed I was a Russian datacentre procurer, and offered me some smuggled Nvidia GPUs. AI investors that I speak to here say prices have actually come down a lot in the past few months and that it’s fairly easy to buy a big cluster of Nvidia GPUs in China. The problem isn’t really getting the chips; it’s making them work. So it’s not a matter of just, OK, we’ve got a whole bunch of chips and pronto, we can use them for training. You also need the technical talent to make that work.
Murad Ahmed
And just a quick question (Laughs). What happened when you told them that you weren’t a Russian chip smuggler and you’re an FT journalist?
Eleanor Olcott
So he was a little bit disappointed, as you can imagine, when I told him that I’m a foreign reporter (Laughs), not a Russian datacentre procurer. However, I did then see the cogs whirring in his brain, and he said, OK, well, you’re a journalist, you’re probably not earning that much money. And he asked me how much my salary was, and if I wanted to join in on his side hustle, which, as my boss will be happy to know, I politely declined.
Murad Ahmed
We only hire the most ethical of journalists here at the FT. I’m glad you made that decision. So these difficulties that China and Chinese companies are having in securing the best AI chips seem likely to continue into this year with the Trump administration as well. It doesn’t seem like there’s gonna be any let-up to this kind of stranglehold that the US has.
Eleanor Olcott
Yeah, I would be very surprised if the Trump administration suddenly thought, OK, let’s react to this DeepSeek breakthrough by making it easier for Chinese companies to access the most powerful Nvidia GPUs. And something really important to note here is that the next generation of Blackwell chips, that’s the most advanced Nvidia GPUs, are about to come online at some point, maybe this year, maybe next. And the Chinese players will not have immediate access to those. So there is a deep concern amongst the players that I speak to that actually we’re gonna start to see this gap widening again once the US players have access to those golden, golden chips.
Murad Ahmed
One final thought then, many of our listeners will have read a lot and heard a lot about AI in the US, but less about China. So in 2025, what should we expect from AI in China going forward?
Eleanor Olcott
So 2025, I think, will look a lot like how it started, right. We’re gonna continue to see Chinese companies being incredibly innovative and adaptive to these restrictions, pushing out more and more competitive products. What DeepSeek has done here is really create an opening for Chinese players, and actually US players as well; the global community can use the learnings from the R1 paper to pursue their own fields of research.
So I think what we’re gonna see is a rapid proliferation of AI apps that are adopting some of these learnings, particularly around the development of AI agents: for example, whether or not some of these Chinese companies can find truly useful applications that can actually help companies and consumers cut down their workload and work more efficiently, that kind of thing.
Murad Ahmed
And just in this last week, we’ve seen huge disruption to US tech stocks and the biggest US AI companies are having to respond to what DeepSeek are doing and saying. In the year ahead, can we expect even more disruption from China to the US when it comes to AI?
Eleanor Olcott
DeepSeek has already caused a huge amount of disruption, right. They have proven that it’s possible in China to build competitive AI models. I think the really disruptive thing that could happen, and remains an open question, is whether or not Huawei, or one of the other Chinese AI chipmakers, could actually find a long-term sustainable role as an AI chip provider in China. Now, this is really the holy grail, because that means that China could be fully independent. It doesn’t have to rely on the US for this industry into the future. And that really is kind of the trillion-dollar question: whether or not China can develop its own indigenous AI chip hardware ecosystem.
Murad Ahmed
Eleanor Olcott, thank you very much.
Eleanor Olcott
Thank you, Murad.
Murad Ahmed
That’s it for this episode of Tech Tonic. You can keep up with all of Eleanor’s reporting on DeepSeek and China at ft.com, along with all our other coverage of AI. I’ve also put some relevant free-to-read articles in the show notes. That’s it. See you next time.
Tech Tonic is presented by me, Murad Ahmed. Our senior producer is Edward Lane, and our producer is Persis Love. Executive producer is Manuela Saragosa. Sound design by Breen Turner, Sam Giovinco and Joe Salcedo. Music by Metaphor Music. Our global head of audio is Cheryl Brumley.