DeepSeek Is Cracking the ‘Black Box’ of Corporate AI Wide Open

  • China-based DeepSeek AI is pulling the rug out from under OpenAI.
  • The DeepSeek algorithm is ‘open weight,’ which is similar to but different from ‘open source.’
  • Hardware limits, like “no Nvidia GPUs,” have always encouraged experimentation and innovation.

For the last week, the internet has buzzed with wave after wave of news about DeepSeek, a Chinese counterpart to artificial intelligence (AI) programs like OpenAI’s ChatGPT. These systems use machine learning algorithms and oceans of training data (often with sketchy intellectual property provenance) to grow into incredibly powerful models. DeepSeek made news predominantly for its reportedly low cost and for having been built with more common processors than the most cutting-edge (and extremely costly) Nvidia GPU hardware. (Nvidia has had a chokehold on AI development for years, and on bitcoin mining before that, because its powerful hardware can crunch thousands more pieces of data at a time than that of its competitors, or of its own older models.)

But neither of those factors may be DeepSeek’s most exciting legacy within the AI field. For people outside of massive corporations, DeepSeek is making news because its venture capital owners have chosen to make their model what’s called “open weight,” which is related to, but not the same as, “open source.” That’s surprising, to say the least, for a company originating in Hangzhou (a city of 13 million people with an economy reportedly larger than those of entire countries like Argentina) and based in Beijing (an even larger economy).

Popular Mechanics spoke with Luke Steuber, an AI programmer who works primarily with open source tools. Steuber explained that open source and open weight are different, but often conflated. “Open weight means you get the trained model parameters, but it doesn’t mean you can do whatever you want with it. Open source means real licensing freedom—modifications, redistribution, full community control. Open-weight models are useful for testing and deployment, but they come with handcuffs.”

In a way, it’s like finding a useful Google doc marked “Read Only.” If the document is open weight, you can make a copy to fill out and then print, but you can’t make any changes to it or share it freely. If it’s open source, you can make a copy, delete what you don’t need, add your own extra things, then post your new version for others to download.
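The distinction can be sketched in code. This is a hypothetical toy, not DeepSeek’s actual release format: the model, weight names, and values below are invented purely to show what “you get the parameters, but not the rest” means in practice.

```python
# Hypothetical sketch of the "open weight" idea: the published artifact
# is just the trained parameters (numbers), not the training data or
# the training pipeline. All names and values here are invented.

# A tiny "released" model: weights for y = w*x + b, trained elsewhere
# by the (fictional) original lab.
released_weights = {"w": 2.0, "b": 1.0}

def run_model(weights, x):
    """Inference is possible with the weights alone."""
    return weights["w"] * x + weights["b"]

print(run_model(released_weights, 3.0))  # -> 7.0

# What an open-weight release does NOT include: the training corpus
# and training code. Without those, you can deploy or fine-tune the
# numbers you were given, but you cannot retrain from scratch or
# freely relicense the result the way a true open source license allows.
```

With only the weights, anyone can run or adapt the model, which is exactly why researchers can probe DeepSeek; the legal and practical freedom to rebuild it is a separate, open source question.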

That comparison may not make ‘open weight’ sound too great, but it’s remarkable compared with the accessibility of other programs in the field. “The industry is in this weird half-open state right now, where you can use the tools but not really shape them unless you’ve got the means to retrain from scratch,” Steuber said. “Everything I build is based on [French AI] Mistral, because it’s entirely open. I wish that were true for others. But of course the dirty secret is that revealing open source, for most of them, would entail revealing that the ‘source’ was copyrighted.”

Indeed, the internet has enjoyed the irony that OpenAI, whose closed model was allegedly trained on a variety of copyrighted texts, is now accusing DeepSeek of plagiarizing it, something we can only examine because DeepSeek chose to be open weight. In a climate where U.S. exports of costly hardware were restricted to keep China from becoming competitive in AI, this could very well be an intentionally thumbed nose in our direction. OpenAI contends that DeepSeek’s achievements can only be explained by secretly training on OpenAI’s outputs.

Steuber joins entire sectors of research scientists in celebrating DeepSeek’s open weights. In Nature, Elizabeth Gibney talks with researchers from the Max Planck Institute for the Science of Light in Germany, the University of Edinburgh in Scotland, and the University of Cambridge, all of whom welcome a new paradigm to test and play with. They value the openness of both the algorithm and the stepwise way it shows its “thinking” in progress. Coders do something similar when they print out a variable’s value after each step of their code, because it makes it much easier to see where something is going right or wrong. This is the opposite of traditional “black box” AI startup development.
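The coding habit the comparison points at looks something like this. A minimal sketch, with an invented function name, of printing an intermediate value after each step of a computation, the visible trail that makes debugging (and, by analogy, a model’s stepwise “thinking”) easy to inspect:

```python
# Illustrative sketch (not DeepSeek's actual code): print an
# intermediate value after each step so you can see exactly where
# a computation goes right or wrong.

def stepwise_sqrt(n, steps=5):
    """Approximate sqrt(n) with Newton's method, tracing each step."""
    x = n / 2.0  # initial guess
    for i in range(steps):
        x = (x + n / x) / 2.0
        print(f"step {i}: x = {x:.6f}")  # the visible intermediate state
    return x

result = stepwise_sqrt(2.0)
```

Each printed line is an intermediate state a reader can check against expectation, just as an open, stepwise model exposes its reasoning trace rather than hiding it in a black box.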

Steuber explains that DeepSeek’s hardware efficiency—which he believes is likely true and represents important progress—is far more than a political or even financial gesture. “Here’s the raw bottom line,” Steuber said. “OpenAI [and] Anthropic optimized their models to do anything, at the cost of ‘everything’ (max resource use). That’s a good way to build a demo for a press release. It’s not the way people use things, and it’s not the way they should be used. In fact, I think it’s hurt them quite a lot.”

“Where we go from here shouldn’t be about how much money gets thrown at Nvidia data centers,” Steuber concluded. “It should be about the clever ways people use what we have to improve the lived experience of those using it. For my part, I’d rather have had ‘to-do’ reminders two years before stupid generative pictures of Elon [Musk] in a banana suit. The fact that it was the other way around is proof that the ‘real’ use for these things is yet to come.”

Caroline Delbert is a writer, avid reader, and contributing editor at Pop Mech. She’s also an enthusiast of just about everything. Her favorite topics include nuclear energy, cosmology, math of everyday things, and the philosophy of it all. 
