DeepSeek has been in the news, and for good reason. The controversial claim that the Chinese startup used just $6 million within two months to build an AI model shocked developers across the globe.
It made stakeholders question the billions of dollars expended on similar projects by U.S. tech firms.
However, new facts are beginning to emerge about the actual cost of training an AI model such as DeepSeek.
The Billion-Dollar Reality Behind DeepSeek’s AI Training
In an update shared with his 1.2 million followers on X, David Sacks, the White House AI and Crypto Czar, pushed back on the claim.
Sacks described it as misleading that DeepSeek’s AI computing cost a mere $6 million.
Sacks referenced a report by Dylan Patel, a semiconductor analyst renowned for his work with SemiAnalysis.
Patel claims that DeepSeek would have spent over $1 billion on its compute cluster, far above the widely reported $6 million figure.
The White House AI and Crypto Czar maintained that the $6 million figure accounts only for the final training run.
This amount does not factor in critical aspects of DeepSeek’s major expenditure.
Notable expenses excluded from the reported amount include capital expenditure, that is, the cost of buying and setting up DeepSeek's hardware. The figure also ignores Research and Development (R&D) costs: all expenses for researching, developing, and optimizing the model were allegedly left out of the $6 million.
Sacks implies that the cost of building DeepSeek’s AI model falls within the billion-dollar range.
He dismissed the $6 million reports as untrue and misleading, suggesting the figure may have been put forward simply to score points.
SemiAnalysis Provides a More Expensive Breakdown
Interestingly, SemiAnalysis provided a breakdown of DeepSeek’s alleged misleading claims.
It insisted that the $6 million only accounts for the Graphics Processing Unit (GPU) pre-training.
It pegs the actual total cost, covering infrastructure, R&D, and server capital expenditure, at around $1.3 billion.
Another notable claim it debunked relates to the chips. SemiAnalysis noted that while DeepSeek operates 50,000 Hopper GPUs, they are not all top-tier H100s.
Rather, it is a mix of H100s, H800s, and H20s. The H20s are downgraded variants built for the Chinese market to comply with U.S. export restrictions.
Per performance insights, DeepSeek’s R1 model is comparable to OpenAI’s o1 regarding task reasoning.
However, it is not a clear leader across all benchmarks. Meanwhile, Google's Gemini Flash 2.0 offers similar capability and is arguably cheaper for API access.
Regarding innovation efficiency, Multi-Head Latent Attention (MLA) greatly reduces cost by cutting KV cache usage by 93.3%.
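To see how a latent-compressed cache can cut memory on that scale, here is a back-of-the-envelope sketch. All model dimensions below are hypothetical placeholders, not DeepSeek's actual configuration, and the latent size is deliberately chosen so the saving lands near the reported 93.3%.

```python
# Illustrative per-token KV cache comparison: standard multi-head attention
# (which stores full per-head keys and values) vs. an MLA-style scheme that
# stores one compressed latent vector per layer. Parameter values are
# hypothetical, picked only to reproduce a ~93.3% reduction.

def mha_kv_bytes_per_token(layers, heads, head_dim, bytes_per_elem=2):
    # Full attention caches a key AND a value vector per head, per layer.
    return layers * heads * head_dim * 2 * bytes_per_elem

def mla_kv_bytes_per_token(layers, latent_dim, bytes_per_elem=2):
    # Latent compression caches a single shared vector per layer.
    return layers * latent_dim * bytes_per_elem

layers, heads, head_dim = 60, 128, 128      # hypothetical model shape
latent_dim = 2 * heads * head_dim // 15     # chosen to give ~93.3% saving

mha = mha_kv_bytes_per_token(layers, heads, head_dim)
mla = mla_kv_bytes_per_token(layers, latent_dim)
reduction = 1 - mla / mha
print(f"MHA: {mha} B/token, MLA: {mla} B/token, saved {reduction:.1%}")
```

Since the KV cache grows linearly with context length and batch size, a ~93% per-token saving translates directly into serving far more concurrent requests on the same GPUs, which is where the claimed cost advantage comes from.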
Overall, DeepSeek’s cost could drop by about 5x by year-end, favoring the startup. Notably, this drop in cost may allow DeepSeek to scale faster than other players in the space.
SemiAnalysis, however, highlighted U.S. export restrictions as a potential hurdle to DeepSeek’s expansion ambitions.
Impact on the Crypto Sector
Analysts have opined that regardless of the true cost of DeepSeek’s AI model training, it could prove a game changer in the crypto sector.
Blockchains and crypto projects will likely demand more efficiency and value for money from developers, since DeepSeek's example suggests more can be done for less.
Some have suggested that the crypto sector might witness new upgrades that aim for scalability and efficiency.
Following the update, AI tokens rebounded: Internet Computer jumped 2.80% to $9.9336, Injective rallied 1.82% to $20.40, and Near Protocol climbed 2.2% to $4.616.
The post DeepSeek Factcheck: Here’s The True Cost of AI Model Training appeared first on The Coin Republic.