GPT-4: Pushing the Boundaries of AI Language Models

GPT-4 had a goal to improve ability to understand and generate natural language text in complex and nuanced scenarios. To test its capabilities, GPT-4 was evaluated on various exams originally designed for humans. The results showed that GPT-4 often outperforms most human test takers, achieving scores that fall in the top 10% in simulated bar exams. It also outperforms previous large language models and most state-of-the-art systems in traditional NLP benchmarks. GPT-4 also demonstrated impressive performance in multiple-choice questions covering fifty-seven subjects in English and other languages.

Starting today, GPT-4 has been launched, and it is exclusively available via the WEB UI for users with a ChatGPT Plus subscription by accessing a specific URL https://chat.openai.com/chat?model=gpt-4. To access the GPT-4 API, which uses the same ChatCompletions API as gpt-3.5-turbo, sign up for the waitlist. The API waiting list signup can be found at this link: https://openai.com/waitlist/gpt-4-api.

Once granted access, users can make text-only requests to the gpt-4 model (image inputs remain in limited alpha). The recommended stable model will be automatically updated, but users can pin the current version (gpt-4-0314) until June 14. Pricing is $0.03 per 1k prompt tokens and $0.06 per 1k completion tokens, with default rate limits of 40k tokens per minute and 200 requests per minute.

GPT-4 has an 8,192-token context length. Limited access to the 32,768-context (about 50 pages of text) version, gpt-4-32k, is also available. The current version (gpt-4-32k-0314) is supported until June 14. Pricing for this version is $0.06 per 1K prompt tokens and $0.12 per 1k completion tokens.

GPT-4’s performance in numerous languages surpasses that of previous English-focused models on the MMLU task. If you want the more technical details about the GPT-4 model, there is a research paper here –> https://cdn.openai.com/papers/gpt-4.pdf

GPT-4 considerably improves user intent comprehension compared to earlier models. Analyzing a dataset of 5,214 prompts from ChatGPT and the OpenAI API, users favored GPT-4-generated responses over those created by GPT-3.5 in 70% of instances. OpenAI gathered user prompts submitted via ChatGPT and the OpenAI API. These prompts and responses were sent to human labelers, who were asked to assess if the response matched the user’s intent based on the prompt.

GPT-4 supports prompts that include both images and text, allowing users to define any vision or language task. The model can produce text outputs based on inputs that feature text and images in any combination. Across various domains, such as documents containing text and photos, diagrams, or screenshots, GPT-4 demonstrates capabilities comparable to those seen in text-only inputs.

GPT-4 generally has limited knowledge of events occurring after September 2021, as its pre-training data mostly ends at this point. The model does not learn from its interactions and may occasionally make basic reasoning mistakes that seem inconsistent with its proficiency in various domains. Additionally, it can be overly credulous, accepting blatantly false statements from users.

Share this: