DeepSeek is an AI chatbot that works much like ChatGPT. It helps users with tasks such as coding, reasoning, and solving math problems. Powered by the R1 model, it has a whopping 671 billion parameters, making it one of the largest open-source language models as of January 28, 2025.

DeepSeek has two main models: V3 and R1. The R1 model shines in reasoning. It generates responses in steps, mimicking how humans work through problems. Combined with architectural techniques that cut memory usage, this makes it cheaper to run than many competitors. In fact, developing DeepSeek reportedly cost only about $6 million, a small fraction of the more than $100 million spent on OpenAI’s GPT-4.

The exact methods used to create DeepSeek aren’t fully clear. However, reports suggest that its founder stockpiled Nvidia A100 chips, which have been restricted for export to China since 2022. This stockpile, which might exceed 50,000 units, combined with less advanced but more affordable H800 chips, is said to have enabled a powerful yet budget-friendly AI model.

DeepSeek activates only a subset of its model’s parameters for any given query, and its training costs are much lower than those of industry giants. This gives DeepSeek an edge over competitors like ChatGPT, Google Gemini, Grok AI, and Claude AI.

DeepSeek R1's code and model weights are open-source, though the training data remains proprietary. This openness allows others to verify the company’s performance claims. Plus, the model’s efficiency means faster and cheaper AI research, opening doors for more exploration. This accessibility might also lead to deeper investigations into how large language models (LLMs) work.

DeepSeek-V2 introduced several key innovations that carry over to V3 and R1, including a unique Mixture-of-Experts (MoE) architecture and a Multi-head Latent Attention (MLA) mechanism.

  • Mixture-of-experts (MoE) architecture: This architecture activates only a portion of the model’s parameters for any given query, reducing the computational resources needed to process it. Instead of one massive neural network, the model is made up of many smaller “expert” networks, each focusing on different aspects of the data. A lightweight routing layer picks just a few of these experts for each query, making the computation far more efficient (see the sketch after this list).
  • Multi-head latent attention (MLA): MLA is an attention mechanism that significantly reduces the model’s memory usage. Traditional attention requires storing large key and value tensors for every processed token, which is costly. MLA compresses this information into a much smaller “latent” representation, allowing for more efficient processing (also sketched below).
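
As a rough illustration of the MoE idea, here is a toy top-k gating layer in Python with NumPy. The sizes, weights, and routing here are stand-ins invented for this sketch; production models use far more experts and learned parameters.

```python
import numpy as np

rng = np.random.default_rng(0)
HIDDEN, N_EXPERTS, TOP_K = 16, 8, 2   # toy sizes, not DeepSeek's real ones

# Each "expert" is a small feed-forward network (here, just one matrix).
experts = [rng.normal(size=(HIDDEN, HIDDEN)) for _ in range(N_EXPERTS)]
router = rng.normal(size=(HIDDEN, N_EXPERTS))   # learned gating weights

def moe_layer(x):
    """Route one token vector to its top-k experts and mix their outputs."""
    logits = x @ router                  # score every expert for this token
    top = np.argsort(logits)[-TOP_K:]    # keep only the k highest-scoring
    gates = np.exp(logits[top])
    gates /= gates.sum()                 # softmax over the chosen experts
    # Only TOP_K of N_EXPERTS experts actually run; the rest stay idle,
    # which is why most parameters cost nothing for a given query.
    return sum(g * (x @ experts[i]) for g, i in zip(gates, top))

print(moe_layer(rng.normal(size=HIDDEN)).shape)   # (16,)
```

The MLA sketch below carries the same caveat: it shows only the compress-then-reconstruct idea behind the smaller “latent” cache, not DeepSeek’s actual implementation, which folds these projections into the attention computation itself.

```python
import numpy as np

rng = np.random.default_rng(0)
D_MODEL, D_LATENT, SEQ = 16, 4, 10   # latent dim much smaller than model dim

W_down = rng.normal(size=(D_MODEL, D_LATENT))   # compress hidden states
W_up_k = rng.normal(size=(D_LATENT, D_MODEL))   # expand latents back to keys
W_up_v = rng.normal(size=(D_LATENT, D_MODEL))   # ... and back to values

hidden = rng.normal(size=(SEQ, D_MODEL))        # one hidden state per token

# The cache keeps only the compressed latents (SEQ x D_LATENT numbers)
# instead of full keys and values (SEQ x 2 x D_MODEL numbers).
latent_cache = hidden @ W_down
keys = latent_cache @ W_up_k       # reconstructed on the fly at attention time
values = latent_cache @ W_up_v
print(latent_cache.size, "cached numbers vs", keys.size + values.size)
```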

These AI models also improve their performance through reinforcement learning, a trial-and-error approach similar to how humans learn.

DeepSeek balances advanced AI features with cost-effective development. This strategy may shape the future of large language models. Marc Andreessen described the release of DeepSeek R1 as a “Sputnik moment” for U.S. AI, indicating a serious challenge to American dominance in the field.

A “Sputnik moment” refers to an event that suddenly reveals a technological gap between countries, prompting renewed focus on research and innovation.

Did you know? AI expert Tom Goldstein, a professor at the University of Maryland, estimates that ChatGPT costs around $100,000 daily and a staggering $3 million monthly to operate. These figures are based on expenses associated with Azure Cloud, which provides the necessary server infrastructure.

DeepSeek was founded in 2023 by Liang Wenfeng, who launched the company’s first large language models within a year. Liang, a graduate of Zhejiang University with degrees in electronic information engineering and computer science, has emerged as a prominent figure in the global AI industry.

Unlike many Silicon Valley entrepreneurs, Liang has a strong finance background. He is the CEO of High-Flyer, a hedge fund focused on quantitative trading that uses AI to analyze financial data and make investment decisions. In 2019, High-Flyer became China’s first quant hedge fund to raise over 100 billion yuan (about $13 billion).

Liang established DeepSeek as a separate entity from High-Flyer, though the hedge fund remains a significant investor. DeepSeek focuses on developing and deploying advanced AI models, especially LLMs.

Liang, often called the “Sam Altman of China,” emphasizes the need for China to innovate rather than imitate in AI. In 2019, he stressed the importance of advancing China's quantitative trading sector to compete with the U.S. He believes the true challenge for Chinese AI is moving from imitation to innovation, requiring original thinking.

The significance of DeepSeek lies in its potential to transform the tech and financial landscape of AI. While tech leaders in the U.S. were investing in nuclear energy to power their data centers, DeepSeek achieved comparable results with far fewer resources.

AI development is resource-intensive. Meta’s planned investment of $65 billion in AI infrastructure is a prime example. OpenAI CEO Sam Altman has noted that the AI industry needs trillions of dollars to develop the advanced chips behind its energy-hungry data centers, which are critical to training and running these models.

DeepSeek shows that comparable AI capabilities can be achieved at significantly lower costs and with less sophisticated hardware. This breakthrough challenges the common belief that developing AI models requires massive investments.

The availability of AI models at a lower cost and with simpler chips could dramatically increase their use across industries, boost productivity, and foster unprecedented innovation.

Did you know? Microsoft has heavily invested in OpenAI, initially putting in $1 billion and later adding another $10 billion. This strategic move seems to be paying off, as Bing has seen a 15% increase in daily traffic since integrating ChatGPT.

ChatGPT and DeepSeek are both advanced AI tools, but they serve different purposes. DeepSeek is designed for problem-solving in tech, making it ideal for users who need an efficient tool for specific tasks. ChatGPT, on the other hand, is a versatile AI known for its user-friendliness and creativity, suitable for everything from casual conversations to content creation.

In terms of architecture, DeepSeek R1 uses a resource-efficient MoE framework, while ChatGPT employs a versatile transformer-based approach. Transformers revolutionized natural language processing by using attention mechanisms to weigh the importance of different parts of the input sequence when processing information.
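
To make that attention idea concrete, here is a minimal scaled dot-product attention in Python with NumPy. Every dimension and matrix is a toy value for illustration, not the configuration of either model.

```python
import numpy as np

rng = np.random.default_rng(0)
seq_len, d_k = 5, 8                  # toy sequence length and head size

Q = rng.normal(size=(seq_len, d_k))  # queries: what each token is looking for
K = rng.normal(size=(seq_len, d_k))  # keys: what each token offers
V = rng.normal(size=(seq_len, d_k))  # values: the information to be mixed

scores = Q @ K.T / np.sqrt(d_k)      # how relevant is token j to token i?
weights = np.exp(scores)
weights /= weights.sum(axis=-1, keepdims=True)  # softmax: each row sums to 1
output = weights @ V                 # each token gets a weighted mix of values
print(output.shape)                  # (5, 8)
```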

The MoE architecture uses 671 billion parameters but activates only about 37 billion per token, enhancing efficiency. ChatGPT, reportedly built on a roughly 1.8 trillion-parameter design, is suited to diverse language generation and creative tasks.
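
A quick back-of-the-envelope calculation shows what that sparsity buys. The 2-FLOPs-per-active-parameter rule of thumb is a standard rough estimate for transformer inference, not a published DeepSeek figure.

```python
total_params = 671e9    # DeepSeek R1's total parameter count
active_params = 37e9    # parameters activated for each token

# Only ~5.5% of the network does work on any given token.
print(f"active fraction: {active_params / total_params:.1%}")

# Rough rule of thumb: a forward pass costs ~2 FLOPs per active parameter,
# so per-token compute tracks the 37B active params, not the full 671B.
print(f"approx FLOPs per token: {2 * active_params:.2e}")
```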

Reinforcement learning (RL) in DeepSeek achieves human-like “chain-of-thought” problem-solving without heavy reliance on supervised datasets. ChatGPT (o1 model) is optimized for multi-step reasoning, especially in STEM fields like math and coding.
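
DeepSeek’s published R1 report describes simple rule-based rewards, scoring answer correctness and output format, as the signal driving this trial-and-error loop. The sketch below illustrates that idea in simplified form; the tag names and scoring weights are assumptions for this example, not the actual training code.

```python
import re

def reward(completion: str, reference_answer: str) -> float:
    """Toy rule-based reward: score output format plus answer correctness."""
    score = 0.0
    # Format reward: the model is prompted to reason inside <think> tags.
    if re.search(r"<think>.*</think>", completion, re.DOTALL):
        score += 0.5
    # Accuracy reward: exact match against a known-correct answer, which
    # works in verifiable domains like math and coding.
    match = re.search(r"<answer>(.*?)</answer>", completion, re.DOTALL)
    if match and match.group(1).strip() == reference_answer:
        score += 1.0
    return score

sample = "<think>2 + 2 makes 4</think><answer>4</answer>"
print(reward(sample, "4"))  # 1.5
```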

DeepSeek is built to handle complex queries efficiently, offering precise solutions quickly and affordably. While ChatGPT is powerful, its main strength lies in general content generation rather than technical problem-solving. ChatGPT excels in creative tasks, helping users generate ideas, write stories, craft poems, and produce marketing content.

Cost is another key difference. DeepSeek offers a more affordable pricing model, especially for users needing AI assistance for technical tasks. ChatGPT, with its broader range of applications, comes at a higher cost for those seeking premium features or enterprise solutions. While ChatGPT offers a free tier with paid upgrades, DeepSeek is completely free to use, except for API access, which is priced lower than ChatGPT’s.

DeepSeek R1 was reportedly trained in about 55 days on 2,048 Nvidia H800 GPUs for roughly $5.5 million, less than one-tenth of ChatGPT’s estimated training cost of around $100 million.
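
Those reported figures are roughly self-consistent, as a quick back-of-the-envelope check shows. The $2-per-GPU-hour rental rate is the assumption DeepSeek’s own technical report uses; none of these numbers is an actual invoice.

```python
gpus = 2048     # Nvidia H800s, per DeepSeek's reported setup
days = 55       # reported training duration
rate = 2.0      # assumed rental cost in USD per GPU-hour

gpu_hours = gpus * days * 24
cost = gpu_hours * rate
print(f"{gpu_hours:,} GPU-hours -> ${cost / 1e6:.1f}M")  # 2,703,360 -> $5.4M
```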

DeepSeek, like other Chinese AI models such as Baidu’s Ernie and ByteDance’s Doubao, is programmed to avoid politically sensitive topics. When asked about events like the 1989 Tiananmen Square incident, DeepSeek refuses to respond, stating that it is designed to provide only “helpful and harmless” answers. This built-in censorship may limit its appeal outside of China.

Security concerns have also been raised regarding DeepSeek. Australia’s science minister, Ed Husic, expressed caution about the app, emphasizing the need to scrutinize data privacy, content quality, and consumer preferences. He advised careful evaluation of these issues before widespread adoption.

In contrast, OpenAI is transparent about data collection and usage, with a stronger emphasis on user privacy, data security, and anonymization before using data for AI training.

While DeepSeek offers advanced AI capabilities at a lower cost, this affordability brings both opportunities and risks. The availability of advanced AI may make it accessible to malicious actors on both state and non-state levels, potentially compromising global security. Balancing innovation with potential geopolitical and security concerns is crucial.