The hard lessons learned from the DeepSeek models may ultimately help U.S. AI companies and speed progress toward human-level AI.

The Chinese company DeepSeek sent shockwaves through the AI and investment communities this week as people learned that it created state-of-the-art AI models using far less computing power and capital than anyone thought possible. The company then showed its work in published research papers and by making its models available to other developers. This raised two burning questions: Has the U.S. lost its edge in the AI race? And will we really need as many expensive AI chips as we’ve been told? 

“Being resource limited forces you to come up with new innovative efficient methods,” Nishihara says. “That’s why grad students come up with a lot of interesting stuff with far less resources—it’s just a different mindset.” 

How much computing power did DeepSeek really use? 

DeepSeek claimed it trained its most recent model for about $5.6 million, and without the most powerful AI chips (the U.S. barred Nvidia from selling its powerful H100 graphics processing units in China, so DeepSeek made do with 2,048 H800s). But the information it provided in research papers about its costs and methods is incomplete. “The $5 million refers to the final training run of the system,” points out Oregon State University AI/robotics professor Alan Fern in a statement to Fast Company. “In order to experiment with and identify a system configuration and mix of tricks that would result in a $5M training run, they very likely spent orders of magnitude more.” He adds that, based on the available information, it’s impossible to replicate DeepSeek’s $5.6 million training run.

How exactly did DeepSeek do so much with so little?

DeepSeek appears to have pulled off some legitimate engineering innovations to make its models less expensive to train and run. But the techniques it used, such as a mixture-of-experts architecture and chain-of-thought reasoning, are well known in the AI world and widely used by the major AI research labs.

Read the complete Fast Company article by Mark Sullivan: https://www.fastcompany.com/91264706/after-a-week-of-deepseek-freakout-doubts-and-mysteries-remain
