- Training a 70-billion-parameter model on 1.4 trillion tokens costs roughly $25 million.
- Fine-tuning costs less than 0.1% of pre-training from scratch (a cost ratio below 0.001).
- The cost ratio of GPT-4 to GPT-3.5 Turbo is approximately 25.
- On average, English text tokenizes to about 1.3 tokens per word.
- The average human reading speed is 3–5 words per second.
- GPU memory capacities for common NVIDIA GPUs: V100 has 16 GB, A10G has 24 GB, A100 has 40/80 GB, L40 has 48 GB.
- Serving an LLM typically requires about twice the parameter count in GB of VRAM (two bytes per parameter at 16-bit precision). For instance, a 7-billion-parameter model needs roughly 14 GB of VRAM in total.
- ChatGPT (GPT-3.5) generates about 40 tokens per second; GPT-4 generates about 15 tokens per second.
- An LLM with 13 billion parameters running via llama.cpp on a high-end modern CPU generates about 10 tokens per second.
- An LLM with 13 billion parameters on a recent GPU with CUDA generates about 85 tokens per second.
- Pre-training Llama 2 with 70 billion parameters took about 1,720,320 GPU-hours.
- Close to 90% of the Llama 2 pre-training dataset is English; the remainder is code and other natural languages.
- An LLM with 70 billion parameters on a recent GPU with CUDA generates about 15 tokens per second.
- ChatGPT is estimated to cost about $700,000 per day to operate.
- ChatGPT has over 180 million users.
- GPT-3 (which preceded GPT-4) had 175 billion parameters.
- This was a significant increase from GPT-2, which had 1.5 billion parameters.
- Training GPT-3 was estimated to cost tens of millions of dollars in computing resources.
- GPT models have a “knowledge cutoff” date.
- GPT-3.5's training data goes up to September 2021.
- GPT-4's training data goes up to January 2022.
- For the GPT-3 model, a typical forward pass (one token-generation step) takes about 3.14 milliseconds on an NVIDIA V100 GPU.
- The main hardware factors for running LLMs locally are VRAM size, memory bandwidth, and FP32 performance.
- Processing the entire “Lord of the Rings” trilogy page by page to make a summary (576,458 words) would cost about $90 using GPT-4.
- Processing the Lord of the Rings books just over 75 times with GPT-4 would cost as much as an NVIDIA A100.
- Claude 2 supports a 100,000-token context window and could fit an entire book into context; processing the books would cost about $40.
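As a sanity check on the “Lord of the Rings” figures, the token count falls out of the word count and the ~1.3 tokens-per-word ratio. The GPT-4 price used here ($0.03 per 1K input tokens, the 8K-context rate at the time) is an assumption, and output tokens for the summaries push the total higher:

```python
WORDS = 576_458         # trilogy word count from the list above
TOKENS_PER_WORD = 1.3   # average English tokenization ratio

input_tokens = WORDS * TOKENS_PER_WORD
print(round(input_tokens))  # → 749395 tokens, far beyond any single context window

# Assumed rate: $0.03 per 1K input tokens (GPT-4, 8K context).
# Summary output tokens and chunk overlap add to this, moving the
# total toward the ~$90 figure quoted above.
input_cost = input_tokens / 1000 * 0.03
print(f"${input_cost:.2f}")  # → $22.48 for the input tokens alone
```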
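The serving-memory rule of thumb (roughly two bytes per parameter at 16-bit precision) is easy to encode as a quick estimator. The byte widths are standard, but note this sketch ignores KV cache, activations, and framework overhead:

```python
def serving_vram_gb(params_billion: float, bytes_per_param: float = 2) -> float:
    """Rough VRAM in GB needed just to hold model weights for inference.

    bytes_per_param: 2 for FP16/BF16, 4 for FP32, ~0.5 for 4-bit quantization.
    Ignores KV cache, activations, and framework overhead.
    """
    return params_billion * bytes_per_param

print(serving_vram_gb(7))    # → 14.0 GB, matching the 7B example above
print(serving_vram_gb(70))   # → 140.0 GB: why 70B models need multi-GPU or quantization
print(serving_vram_gb(13, bytes_per_param=0.5))  # → 6.5 GB for a 4-bit 13B model
```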
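Combining the throughput numbers with the ~1.3 tokens-per-word ratio shows how generation speed compares with the 3–5 words-per-second human reading speed; the token rates are the ones quoted above:

```python
TOKENS_PER_WORD = 1.3
READING_WPS = 5  # upper end of human reading speed, words per second

def words_per_second(tokens_per_second: float) -> float:
    """Convert a model's token throughput into words per second."""
    return tokens_per_second / TOKENS_PER_WORD

for name, tps in [("GPT-3.5", 40), ("GPT-4", 15), ("13B on CPU", 10)]:
    wps = words_per_second(tps)
    faster = "faster" if wps > READING_WPS else "slower"
    print(f"{name}: {wps:.1f} words/s ({faster} than reading speed)")
# All three outpace a human reader, so streamed output feels responsive.
```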
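The Llama 2 70B GPU-hour figure converts to a rough dollar estimate given a cloud price per GPU-hour; the $1.50/hour rate below is purely an illustrative assumption (real A100 rates vary widely by provider and commitment):

```python
GPU_HOURS = 1_720_320   # reported GPU-hours for Llama 2 70B pre-training
RATE_PER_HOUR = 1.50    # assumed $/GPU-hour, for illustration only

print(f"${GPU_HOURS * RATE_PER_HOUR:,.0f}")  # → $2,580,480 at the assumed rate
```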