Does China’s DeepSeek Represent a New—and Much Cheaper—Frontier in AI Technology? | BU Today

While not exactly like the Space Race, China’s bold advancement may herald a reckoning in the United States, a BU computer science professor says

As tech companies in the United States collectively pour billions—soon maybe trillions—of dollars into developing powerful artificial intelligence tools, a small Chinese technology start-up has shown the world that it might be possible to do it for less. A lot less. The claim raises all sorts of questions about the future of AI.

The scrappy Chinese start-up DeepSeek splashed onto the scene and upended US financial markets when it recently revealed that DeepSeek-R1, an AI model that rivals the best technology from domestic companies such as Microsoft and Google, was built for about $6 million—a sliver of what Meta is spending on its latest AI program. 

Some engineers and scientists are questioning DeepSeek’s claim. On Wednesday, officials at OpenAI and its partner, Microsoft, announced that they were investigating whether DeepSeek programmers had obtained proprietary technology without authorization to spur the development of DeepSeek-R1.

Regardless, the advances made by the DeepSeek team are impressive, says Mark Crovella, a Boston University College of Arts & Sciences professor of computer science and chair of academic affairs at the Faculty of Computing & Data Sciences.

DeepSeek engineers laid out their process in a 22-page paper that describes an innovative use of existing methods as a substitute for raw computing horsepower. 

But why forgo powerful computing capabilities? It’s likely that the company had little choice. In 2022, the Biden administration banned the export of cutting-edge computer chips to China in an attempt to maintain US preeminence in the AI race. When the United States throttled the horsepower available to Chinese computer engineers, it seems that they pursued a workaround instead—one that may shake up the entire field of AI.

“It seems that [the DeepSeek engineers] were probably forced to rely on older-generation hardware that doesn’t perform nearly as well,” Crovella says. “And so there’s some potential that one effect of the export controls was actually to force them to figure out how to make this program work smarter, rather than more expensively.”

BU Today spoke with Crovella about the technology and what it means for the AI race.

Q&A

with Mark Crovella

BU Today: How is the DeepSeek technology different from what we’ve seen before?

Crovella: One of the things that’s remarkable about this is it’s not a dramatically new technology, but it is a very intelligent combination of techniques that we already knew about. 

The kinds of improvements that they made come in two categories. To understand the first, think of a large language model [the kind of model that powers AI chatbots and platforms] as architecture. It’s got a collection of pieces that interact internally in a certain way to move data. DeepSeek improved the architecture in a significant but not a fundamentally new way. They’ve figured out how to move data inside the model more quickly. If you’re thinking of the architecture of buildings, it’s akin to discovering a new, more powerful motor for the elevators, so now we can build the building taller and get around faster. So, it’s a definite improvement, but it’s not fundamentally changing the notion of the architecture of the building.
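
One of the architectural tricks DeepSeek’s technical reports describe is compressing the cached record of past tokens so that less data has to move during generation. The Python sketch below is illustrative only, not DeepSeek’s code: the dimensions and weight matrices are made up, and it shows just the core idea of storing a small latent vector in place of the full-size cache.

```python
import numpy as np

# Minimal sketch (illustrative dimensions, not DeepSeek's actual code):
# instead of caching full-size key/value data for every past token,
# cache a small compressed "latent" vector and expand it on demand.
d_model, d_latent, seq_len = 1024, 64, 2048
rng = np.random.default_rng(0)

W_down = rng.normal(size=(d_model, d_latent)) / np.sqrt(d_model)  # compress
W_up = rng.normal(size=(d_latent, d_model)) / np.sqrt(d_latent)   # expand

hidden = rng.normal(size=(seq_len, d_model))  # per-token activations
latent_cache = hidden @ W_down                # what actually gets stored

print(f"cache shrinks by {(seq_len * d_model) / (seq_len * d_latent):.0f}x")
reconstructed = latent_cache @ W_up           # expanded when attention runs
```

With these made-up sizes the cache shrinks 16-fold, which is the sense in which data inside the model "moves faster": there is simply much less of it to shuttle between memory and processors.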

The other improvement that they made is that they’ve adopted a different strategy for training these models, borrowed from a related technique called reinforcement learning. For computer engineers, this is a very well understood concept, but [the DeepSeek engineers] thought about using it in a slightly new way, and it turned out to work extremely well.
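
The DeepSeek-R1 paper calls its training recipe group relative policy optimization (GRPO): the model generates a group of candidate answers to the same prompt, each answer is scored, and answers that beat the group’s average are reinforced. The Python sketch below shows only that scoring step, with made-up rewards; a real training run would feed these values back into the model’s weights.

```python
import numpy as np

def group_relative_advantages(rewards):
    """Score each sampled answer against its own group's average,
    so no separately trained value network is needed."""
    rewards = np.asarray(rewards, dtype=float)
    return (rewards - rewards.mean()) / (rewards.std() + 1e-8)

# Eight candidate answers to one prompt: 1.0 = correct, 0.0 = wrong.
# (Rewards here are invented; in practice they come from checking the
# answers, e.g., whether a math solution verifies.)
print(group_relative_advantages([1.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0]))
```

Correct answers receive positive advantages and are reinforced; wrong ones receive negative advantages and are discouraged, nudging the model toward whatever reasoning tends to produce checkable, correct answers.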

BU Today: How big a deal is the DeepSeek technology? Can you put it into context?

Crovella: There has been a scaling law that engineers have noticed over the past roughly 6 to 10 years. The scaling law says that every time we increase the amount of data and the amount of computation in these models—every time we make a major increase—we see a major increase in performance. And that’s been empirically borne out for quite a few years now. And so, that’s where the motivation comes from for people and companies to spend hundreds of billions of dollars, because they think that that gives them the potential for a corresponding improvement in performance, and that would give them a business advantage. 
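
The scaling law Crovella describes is an empirical power law: loss (a measure of prediction error) falls smoothly as training compute grows, so each tenfold increase in compute buys a roughly constant multiplicative improvement. The toy Python snippet below is not fitted to any real data; the constants are hypothetical and chosen only to show the shape of the curve.

```python
def loss(compute_flops, a=100.0, alpha=0.05):
    """Hypothetical power-law scaling curve: loss = a * C^(-alpha).
    Real constants are fit to large-scale training experiments."""
    return a * compute_flops ** -alpha

for c in [1e20, 1e21, 1e22, 1e23]:  # training budgets 10x apart
    print(f"compute {c:.0e} FLOPs -> loss {loss(c):.2f}")
```

Each tenfold jump in compute lowers the hypothetical loss by the same fixed factor, which is why companies have treated bigger hardware budgets as a reliable route to better models, and why a method that bends the curve, as DeepSeek claims to have done, is so disruptive.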

And so the DeepSeek announcement doesn’t completely negate all of the empirical evidence from the past, but what we’re seeing is that maybe if you want to get a 10 times increase in performance, maybe you don’t have to buy 10 times as much hardware. Maybe there are algorithmic and methodological improvements that could get us there instead.

BU Today: It seems that the sky’s the limit when it comes to investment in tech. Already, we walk around with many times more computing power in our phones than Apollo 11 had when it took us to the moon. Does the DeepSeek development make a case for pushing domestic tech companies to be a little thriftier?

Crovella: You can imagine what the environmental impact of training these models is like. In terms of power, in terms of water, there are just enormous impacts. And so you could make an argument that the industry should be incentivized or induced to work smarter. After all, the scale of investments in AI infrastructure that companies are talking about in the next couple of years is just staggering. Microsoft is on track to spend $80 billion in the next year for AI infrastructure. Meanwhile, the research budget for the National Science Foundation [NSF] is $10 billion a year. So they’re talking about eight NSFs dedicated just to machine learning, just within Microsoft—there are four or five other companies making similar investments. What else could we do with some of that money? These companies are asking themselves the same questions now. 

These technology companies are spending a large fraction of their free cash flow on hardware and software to build infrastructure for machine learning, and that means they can’t do other things [with that money]. After the DeepSeek announcement, I’ve heard from people who are inside these companies that there’s a distinct sense of panic about whether they’ve made really bad bets on infrastructure—whether they’ve decided to spend their money in really wasteful ways over the next few years. 

A month ago, it seemed like a good idea to budget $80 billion on AI hardware. And now it looks like if they were as smart as DeepSeek, they would have only needed to budget $8 billion, right?

BU Today: Tech venture capitalist Marc Andreessen said that DeepSeek is “AI’s Sputnik moment.” Do you agree?

Crovella: I think the thing to recognize is that there’s no real barrier to the flow of ideas. [The United States] tried to put up a barrier to the flow of hardware and prevent China from using our latest [graphics processing unit] hardware, but the ideas flow completely freely across borders, and so you can’t really stop folks in other countries from advancing. So, I don’t see it exactly like the Space Race, but I do see it as making clear that there’s now a critical mass of machine learning expertise in China that’s capable of, at least at times, creating advances that haven’t occurred to anyone in the United States.
