Five days ago, the Chinese company DeepSeek launched its new GenAI model, DeepSeek-R1. The model has quickly taken the GenAI world by storm, as it has been shown to compete with OpenAI’s latest models, including the “thinking” model OpenAI o1, at a fraction of the price.
While it “only” has 671 billion parameters, rumors suggest that OpenAI o1 has more than double that, making DeepSeek-R1 a marked improvement in efficiency over the GPT models. Since it takes up considerably less space, hardware requirements are also lower. The model is open-source, meaning you can download it and run it locally on your own hardware using tools like Ollama, or use it as a hosted service from cloud providers.
Ollama has already made several versions of the model available under the name deepseek-r1, making it easy to run on your own hardware.
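As an illustration, here is a minimal sketch of calling a locally pulled DeepSeek-R1 model via Ollama’s Python client. It assumes the `ollama` package is installed and the model has been pulled first (e.g., `ollama pull deepseek-r1:7b`; the 7b tag is just one of the available sizes):

```python
# Minimal local inference against an Ollama-served DeepSeek-R1 model.
# Requires: `pip install ollama` and a prior `ollama pull deepseek-r1:7b`.
import ollama

response = ollama.chat(
    model="deepseek-r1:7b",
    messages=[{"role": "user", "content": "Explain quantization in one sentence."}],
)
print(response["message"]["content"])
```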
It’s also worth noting that DeepSeek is not the first open-source LLM from China. Several models have been launched over the past year, including Qwen from Alibaba Cloud. Just yesterday, Qwen launched a new version, Qwen 2.5-Turbo, which supports up to 1 million tokens (~750,000 words) and includes a new image model.
DeepSeek costs about 1/10th of the price of GPT-4o from OpenAI and scores similarly on many benchmarks, but…
- Image processing occurs via OCR rather than a neural network (so it’s not multimodal).
- It is primarily trained in English and Chinese, meaning it performs very poorly on Nordic languages. Many published benchmarks have been run only in Chinese.
- It only supports 64,000 tokens (~48,000 words), while GPT supports 128,000 tokens.
- It is not good at function calling, making it harder to integrate with third-party services or use as a basis for, e.g., virtual assistants (see the sketch after this list).
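For context on the last point: function calling means the model returns a structured request to run external code instead of plain text. A minimal sketch using the OpenAI Python client’s tools schema – the `get_weather` function and its parameters are hypothetical, and GPT-4o is used here only because it supports the mechanism well:

```python
import json
from openai import OpenAI  # requires OPENAI_API_KEY in the environment

client = OpenAI()

# Hypothetical tool definition: a weather lookup the model may "call"
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What's the weather in Oslo?"}],
    tools=tools,
)

# Instead of free text, the model returns a structured call our code can execute
call = response.choices[0].message.tool_calls[0]
print(call.function.name, json.loads(call.function.arguments))
```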
Like many other models, DeepSeek can run on your own infrastructure. However, to achieve functionality on par with GPT-4o, it requires about 1.3 TB of GPU memory, equivalent to 15x NVIDIA H100 – an investment of around 20 million NOK. There are also smaller versions of the model that can run on your own machine, and you can use a lower quantization to reduce the memory footprint quite significantly.
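The 1.3 TB figure follows directly from the parameter count: 671 billion weights at 16 bits (2 bytes) each. A quick back-of-the-envelope sketch of how quantization shrinks that footprint (ignoring activations and the KV cache, which add real overhead):

```python
# Back-of-the-envelope GPU memory for 671B parameters at different precisions.
# Real deployments need extra room for activations and the KV cache,
# so treat these as lower bounds.
params = 671e9
for name, bits in [("FP16", 16), ("8-bit", 8), ("4-bit", 4)]:
    terabytes = params * bits / 8 / 1e12
    print(f"{name}: ~{terabytes:.2f} TB")  # FP16 ≈ 1.34 TB, 4-bit ≈ 0.34 TB
```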
DeepSeek-R1 is one of the first large-scale models to use Multi-Token Prediction (MTP). Instead of predicting only the next token, the model learns to predict the next two tokens at once in a given context, which can increase both the speed and the precision of generated responses.
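To make the idea concrete, here is a toy sketch of multi-token prediction: one shared hidden state feeding two output heads, one per future token. This is purely illustrative and not DeepSeek-R1’s actual MTP architecture:

```python
import torch
import torch.nn as nn

class ToyMTPHeads(nn.Module):
    """Two output heads on a shared hidden state: one scores the next
    token (t+1), the other the token after that (t+2)."""
    def __init__(self, hidden_dim: int, vocab_size: int):
        super().__init__()
        self.next_token = nn.Linear(hidden_dim, vocab_size)
        self.token_after = nn.Linear(hidden_dim, vocab_size)

    def forward(self, hidden: torch.Tensor):
        # hidden: (batch, seq_len, hidden_dim) from a transformer trunk
        return self.next_token(hidden), self.token_after(hidden)

# Stand-in for real trunk output: batch of 2, 16 positions, 64 dims
heads = ToyMTPHeads(hidden_dim=64, vocab_size=1000)
hidden = torch.randn(2, 16, 64)
logits_t1, logits_t2 = heads(hidden)
print(logits_t1.shape, logits_t2.shape)  # both (2, 16, 1000)
```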
The model was trained on significantly less hardware than other vendors like Meta have used. DeepSeek also had to program 20 of the 132 processing units (streaming multiprocessors) on each H800 GPU – a capped edition of the NVIDIA H100, created to comply with US export restrictions – specifically to handle communication between the chips. To make this work, DeepSeek’s engineers had to go down to PTX, a low-level instruction set for NVIDIA GPUs that essentially functions as assembly.
Even though NVIDIA’s stock dropped significantly, it’s important to know that almost all AI training and inference (the generation of output) rely on NVIDIA’s software architecture, CUDA. This means NVIDIA still holds a dominant position in the market going forward, although new solutions, including from AMD, are emerging.
When ChatGPT was launched in November 2022, it cost $120 to generate about 750,000 words (~1 million tokens). With GPT-4o, the cost was reduced to about $2.50, a reduction of roughly 98%.
Now, with DeepSeek-R1, we’re down to an even lower cost – a roughly 99.9% reduction compared to the original ChatGPT. In other words, models keep getting cheaper to train and run even as they improve.
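The percentages are simple arithmetic on the figures above (illustrative only – real API prices differ between input and output tokens):

```python
# Cost to generate ~750,000 words (~1 million tokens), using the figures above
chatgpt_2022 = 120.00
gpt4o = 2.50
print(f"GPT-4o vs ChatGPT: {1 - gpt4o / chatgpt_2022:.1%} cheaper")  # ~97.9%
# A 99.9% reduction from the 2022 price corresponds to roughly:
print(f"${chatgpt_2022 * (1 - 0.999):.2f} per ~750,000 words")       # $0.12
```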
Of course, it’s also important to be aware that this model has filters that block content its developers don’t want revealed. The model is still new, so we don’t yet have a full picture of how it handles all types of information.
Like other models, this one also has a set of security filters to prevent misuse, such as answering questions like “how to hack a website” or “how to make a bomb.”