Asian American Supersite

Liang Wenfeng Develops Low-Cost AI Models that Outperform US Giants
By Tom Kagy | 27 Jan, 2025

China-based DeepSeek unleashes stunning efficiency innovations likely to disrupt global AI industry cost structures.

Rejecting the notion that only the US can produce AI advances, Liang Wenfeng has led DeepSeek to develop two large language models that rival the performance of AI models from Amazon-backed Anthropic and Elon Musk's xAI, among others. 

Remarkably, DeepSeek has developed architectural innovations that require far fewer GPUs, and GPUs of lower cost and sophistication — the only kind legally available to Chinese firms under stringent US export restrictions calculated to retard China's AI development.  

For example, Liang's DeepSeek spent only $5.6 million to train its most recent AI model. By contrast, Anthropic spent between $100 million and $1 billion to build its models, according to estimates by Anthropic co-founder Dario Amodei.

This stunning cost efficiency flows from DeepSeek's novel MLA (multi-head latent attention) architecture, which cuts memory usage to 5-13% of that of conventional AI architectures. Add the fact that DeepSeek's novel Mixture-of-Experts (MoE) 16B architecture cuts computations by 60%, and the cost savings are enough to upend assumptions about the number and type of Nvidia GPUs needed to train and operate AI systems.
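To put those numbers side by side, here is a minimal back-of-the-envelope sketch in Python. It only restates the figures quoted above, which are not independently verified. The function names and the baseline values in the usage note are hypothetical, chosen purely for illustration, and do not describe any real DeepSeek or Anthropic workload.

```python
# Figures as quoted in the article (not independently verified).
DEEPSEEK_TRAINING_COST_USD = 5.6e6          # reported cost of the latest model
ANTHROPIC_COST_RANGE_USD = (100e6, 1e9)     # range attributed to Dario Amodei
MLA_MEMORY_FRACTION_RANGE = (0.05, 0.13)    # memory use vs. conventional design
MOE_COMPUTE_REDUCTION = 0.60                # share of computation saved


def cost_multiple(other_cost_usd, deepseek_cost_usd=DEEPSEEK_TRAINING_COST_USD):
    """Return how many times larger another training budget is."""
    return other_cost_usd / deepseek_cost_usd


def scaled_requirements(baseline_memory_gb, baseline_compute):
    """Apply the quoted MLA and MoE savings to a hypothetical baseline."""
    low, high = MLA_MEMORY_FRACTION_RANGE
    return {
        "memory_gb_range": (baseline_memory_gb * low, baseline_memory_gb * high),
        "compute": baseline_compute * (1 - MOE_COMPUTE_REDUCTION),
    }


if __name__ == "__main__":
    low_cost, high_cost = ANTHROPIC_COST_RANGE_USD
    print(f"Cost multiple: {cost_multiple(low_cost):.1f}x to "
          f"{cost_multiple(high_cost):.1f}x")
    # Hypothetical baseline: 100 GB of memory and 1,000 units of compute.
    print(scaled_requirements(100, 1000))
```

On the quoted figures, the Anthropic range works out to roughly 18 to 179 times DeepSeek's reported training spend. A hypothetical workload needing 100 GB of memory and 1,000 units of compute would shrink to about 5-13 GB and 400 units if both savings applied in full, which is an assumption rather than something the reported figures establish.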

Even operating within this dramatically lowered cost regime, DeepSeek's R1 and V3 AI models posted reasoning benchmark scores deemed superior to those of Anthropic's Claude and xAI's Grok in the January 25 update of Chatbot Arena, UC Berkeley's student-run industry-standard AI performance rating chart.

The fact that Google's Gemini came out on top in that rating takes nothing away from DeepSeek's achievement given the likelihood that Gemini cost multiple billions of dollars to train and operate.  DeepSeek's breakthrough is all the more impressive because, until a year ago, it was merely the research arm that provided AI-based trading tips for Liang's High-Flyer quant hedge fund. 

During that year Liang has become recognized by Beijing as China's quiet AI messiah, someone who can lead the nation's tech development toward global leadership.  And yet Liang, who presents himself as more of a super coding nerd than a business tycoon, insists that DeepSeek's mission is an altruistic one, dedicated to achieving the AI holy grail of artificial general intelligence (AGI) rather than making profits.  

"We believe the most important thing now is to participate in the global innovation wave," Liang told an interviewer in November, according to Chinatalk.  "For many years, Chinese companies are used to others doing technological innovation, while we focused on application monetization — but this isn’t inevitable. In this wave, our starting point is not to take advantage of the opportunity to make a quick profit, but rather to reach the technical frontier and drive the development of the entire ecosystem."

Liang's sincerity is backed up by the fact that DeepSeek has always kept its AI models open source. That means the architectural shortcuts that underlie its disruptive efficiencies are available free to anyone interested, including American AI leaders. One consequence of DeepSeek's breakthroughs is that AI developers may not need Nvidia's most advanced GPU, the H100, or may at least need fewer of them. Presumably, as a Chinese company, DeepSeek has only had access to a dumbed-down version of Nvidia's last-generation A100 GPU.

One measure of the relative value of those Nvidia chips to the AI industry is price: the A100 now costs under $10,000, while H100s, if you can get your hands on them, are going for well over $24,000. Those prices may change in the wake of DeepSeek's superior results using bargain-basement hardware.

Another fact that makes Liang both unique and invaluable to China's tech sector and economy is his conviction that the sheer energy and enthusiasm of China's young homegrown engineering talent can produce more dramatic open-ended AI development than "returnees" who have absorbed western AI experience. Liang has walked the talk: the DeepSeek team is made up of young Chinese engineers without self-imposed limitations in their thinking. Foremost among them is Liang himself, who spends more time in the weeds of code than jetting around striking business deals. He credits DeepSeek's success to his hands-on knowledge of AI and the kind of talent needed to build it.

Liang was born in 1985 in the city of Zhanjiang on the extreme southern tip of the southeastern province of Guangdong. His father was a grade-school teacher. Liang earned a BS in electrical information engineering in 2007 from Zhejiang University, an elite public university focused on engineering. He remained there for a master's degree in communication engineering, which he completed in 2010.

Liang was working on his master's degree in the wake of the global market crash of 2007-2008. He became interested in using financial market data and machine learning to do quantitative trading. After earning his master's, Liang moved to Chengdu, the ancient city known for its great beauty, and explored how AI might be applied to various fields. He concluded that the only way to use AI in a non-loss-making way was to apply it to investments. This isn't surprising given AI's starkest limitation, especially in those days: its agency is confined to tasks performed in exclusively digital domains.

After co-founding a couple of AI-driven investment firms, in 2016 Liang and a fellow Zhejiang alumnus launched Ningbo High-Flyer Quantitative Investment Management. An offshoot that followed in 2019 was High-Flyer AI, which mainly served to provide trading tips for High-Flyer's investments. Even as High-Flyer Investment's assets achieved steady growth, Liang's primary interest was pursuing AI development. In 2021 he was earning enough from High-Flyer to afford thousands of Nvidia GPUs. He decided he could outdo China's wealthy AI giants like ByteDance and Alibaba. Unfortunately, his nerdy, tongue-tied presentations convinced venture capitalists that he didn't have the makings of an AI tycoon.

This left Liang no choice but to turn to his own High-Flyer for the investment needed to buy 10,000 Nvidia A100 GPUs before more stringent US restrictions on AI chip exports to China took effect. Thus DeepSeek was born.

With only about $8 billion in assets currently under management, his High-Flyer hedge fund, while respected in China, is hardly capable of throwing off the kind of profits that companies like Google, Microsoft and Amazon can pump into development of their AI models. Yet Liang has enough confidence in the power of his own talent and curiosity to pursue AGI, or human-level intelligence, a goal no one has yet come close to achieving. In fact, most AI observers consider artificial general intelligence to be decades away, or perhaps even a century or more in the future.