Some interesting numbers about GPT and LLMs

  • ~$25 million is the cost to train a 70 billion parameter model on 1.4 trillion tokens.
  • The cost ratio of fine-tuning vs training from scratch (pre-training) is less than 0.001.
  • The cost ratio of GPT-4 to GPT-3.5 Turbo is approximately 25.
  • On average, there are 1.3 tokens per word.
  • The average human reading speed is between 3 and 5 words per second.
  • GPU memory capacities for common NVIDIA GPUs: the V100 has 16 GB, the A10G has 24 GB, the A100 has 40/80 GB, and the L40 has 48 GB.
  • The typical GPU memory requirement for serving an LLM is about twice the parameter count in GB, at 16-bit precision. (For instance, a 7 billion parameter model requires approx. 14 GB of VRAM in total; see the VRAM sketch after this list.)
  • ChatGPT (GPT-3.5) can emit about 40 tokens per second; GPT-4 can emit about 15 tokens per second (see the throughput sketch after this list).
  • An LLM with 13 billion parameters running llama.cpp on a high-end modern CPU can emit about 10 tokens per second.
  • An LLM with 13 billion parameters on a newer GPU with CUDA can emit about 85 tokens per second.
  • LLaMa2 with 70 Billion parameters used about 1720320 hours of GPU hours to do the pre-training
  • In LLaMa2 close to 90% of the dataset used to do pretraining is English, the remaining parts is programming languages and other languages.
  • A LLM with 70 Billion parameters using a newer GPU with CUDA can issue about 15 tokens per second
  • ChatGPT costs an estimated $700,000/day to operate.
  • ChatGPT has over 180 million users.
  • GPT-3 (which preceded GPT-4) had 175 billion parameters.
    • This was a significant increase from GPT-2, which had 1.5 billion parameters.
  • Training GPT-3 was estimated to cost tens of millions of dollars in computing resources.
  • GPT models have a “knowledge cutoff” date.
    • For GPT-3.5, training data goes up to September 2021.
    • For GPT-4, training data goes up to January 2022.
  • For the GPT-3 model, a single forward pass (producing one token of output) takes about 3.14 milliseconds on an NVIDIA V100 GPU.
  • The main hardware factors for running LLMs locally are VRAM size, memory bandwidth, and FP32 performance.
  • Processing the entire “Lord of the Rings” trilogy (576,458 words) page by page to produce a summary would cost about $90 using GPT-4 (see the cost sketch after this list).
  • Processing the Lord of the Rings books more than 75 times with GPT-4 would cost about as much as an NVIDIA A100.
  • Claude 2 can fit far larger chunks of the book into its context window at once.
    • Claude 2 supports 100,000 tokens per prompt, and processing the full text would cost about $40.
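
To make the VRAM rule of thumb above concrete, here is a minimal Python sketch. It assumes 16-bit weights (2 bytes per parameter) and ignores the extra memory needed for the KV cache and activations, so real deployments need headroom beyond these numbers:

```python
# Rough VRAM estimate for serving an LLM: at 16-bit precision the
# weights alone take ~2 bytes per parameter, i.e. about twice the
# parameter count in GB (KV cache and activations not included).

def serving_vram_gb(params_billions: float, bytes_per_param: float = 2.0) -> float:
    """Approximate VRAM (GB) needed just to hold the model weights."""
    return params_billions * bytes_per_param

# GPU memory sizes from the list above (GB).
gpus = {"V100": 16, "A10G": 24, "L40": 48, "A100": 80}

for params in (7, 13, 70):
    need = serving_vram_gb(params)
    fits = [name for name, mem in gpus.items() if mem >= need]
    print(f"{params}B model: ~{need:.0f} GB -> fits on {', '.join(fits) or 'no single GPU here'}")
```

This reproduces the 7B-needs-~14-GB figure, and shows why a 70B model (~140 GB at 16-bit) has to be sharded across several GPUs or quantized.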
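
The throughput bullets are easier to interpret next to human reading speed: at ~1.3 tokens per word, reading 3-5 words per second works out to roughly 4-6.5 tokens per second. A small sketch comparing the figures from the list (all numbers are the rough estimates quoted above):

```python
# Compare LLM generation throughput against human reading speed,
# using the ~1.3 tokens-per-word average from the list above.

TOKENS_PER_WORD = 1.3
FAST_READER_TPS = 5 * TOKENS_PER_WORD  # 5 words/s -> ~6.5 tokens/s

throughputs_tps = {  # tokens per second, from the bullets above
    "ChatGPT (GPT-3.5)": 40,
    "GPT-4": 15,
    "13B model, high-end CPU (llama.cpp)": 10,
    "13B model, newer CUDA GPU": 85,
    "70B model, newer CUDA GPU": 15,
}

for name, tps in throughputs_tps.items():
    print(f"{name}: {tps} tokens/s (~{tps / FAST_READER_TPS:.1f}x a fast reader)")
```

Anything above ~6.5 tokens per second already streams faster than most people read, which is why even the CPU-only llama.cpp figure is usable interactively.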
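
The “Lord of the Rings” figures can be sanity-checked from the word count and the tokens-per-word average. The per-1K-token prices below are assumptions standing in for 2023 list prices, and the sketch only counts input tokens; summary output (and any chunk overlap) is what pushes the real bill up toward the quoted $90 and $40:

```python
# Back-of-the-envelope token count and input cost for feeding
# "The Lord of the Rings" through an API, using figures from the list.
# The per-1K-token prices are assumptions; check current pricing.

TOKENS_PER_WORD = 1.3
LOTR_WORDS = 576_458

tokens = LOTR_WORDS * TOKENS_PER_WORD
print(f"~{tokens:,.0f} tokens")  # ~749,395 tokens

def input_cost_usd(num_tokens: float, usd_per_1k_tokens: float) -> float:
    """Cost of sending num_tokens as prompt input at a given per-1K price."""
    return num_tokens / 1000 * usd_per_1k_tokens

print(f"GPT-4 input at an assumed $0.06/1K:     ~${input_cost_usd(tokens, 0.06):.0f}")
print(f"Claude 2 input at an assumed $0.011/1K: ~${input_cost_usd(tokens, 0.011):.0f}")
```

Note also that ~749,000 tokens is several times Claude 2's 100,000-token window, so even with the larger context the text still has to be processed in chunks.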
