GPT-4 represents a major leap forward in large language model capabilities. Developed by OpenAI, it builds on the architecture and strengths of GPT-3 while achieving new levels of scale and performance.

This article summarizes the key details about GPT-4 based on currently available public information. Note that OpenAI has not officially disclosed GPT-4's architecture or size, so several of the details below come from unofficial reports.

Model Stats

OpenAI has not published official statistics for GPT-4. The widely cited figure of roughly 1.8 trillion total parameters comes from unofficial reports rather than from OpenAI itself.

Model Card

The GPT-4 system card and technical report provide transparency into the model's intended uses, capabilities, limitations, and safety mitigations, though they deliberately withhold details about the architecture, hardware, and training data.

Model Architecture

GPT-4 is widely reported to use a mixture-of-experts (MoE) architecture, in which separate expert networks specialize in certain tasks or data types.

Because only a few experts are activated for each token, the model can scale its total parameter count while keeping inference costs practical. The specialized experts can also develop distinct capabilities.
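To make the routing idea concrete, here is a minimal, illustrative MoE forward pass in NumPy. The function name `moe_layer` and all shapes are hypothetical; a softmax router scores the experts, and each token is processed only by its top-k experts, weighted by the router's probabilities.

```python
import numpy as np

def moe_layer(tokens, gate_w, experts, top_k=2):
    """Minimal mixture-of-experts forward pass (illustrative sketch).

    tokens:  (n_tokens, d_model) activations
    gate_w:  (d_model, n_experts) router weights
    experts: list of (d_model, d_model) weight matrices, one per expert
    """
    logits = tokens @ gate_w                         # (n_tokens, n_experts)
    probs = np.exp(logits - logits.max(axis=-1, keepdims=True))
    probs /= probs.sum(axis=-1, keepdims=True)       # softmax over experts
    out = np.zeros_like(tokens)
    for i, tok in enumerate(tokens):
        top = np.argsort(probs[i])[-top_k:]          # indices of top-k experts
        weights = probs[i][top] / probs[i][top].sum()
        for w, e in zip(weights, top):
            out[i] += w * (tok @ experts[e])         # weighted expert outputs
    return out
```

The key cost property: each token touches only `top_k` of the experts, so compute per token stays roughly constant even as the number of experts (and total parameters) grows.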

Training Process

Training a model reported at roughly 1.8 trillion parameters required extensive computational resources, reportedly on the order of tens of thousands of GPUs running for months.

Reaching this scale requires combining multiple parallelism techniques, typically data, tensor, and pipeline parallelism, to spread the model and its training batches across many accelerators.
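As a rough illustration of one such technique, the sketch below shows a column-sharded (tensor-parallel) matrix multiply, with NumPy array slices standing in for separate devices. The function name and shard count are hypothetical; on real hardware, each shard would live on its own GPU and the final concatenation would be an all-gather.

```python
import numpy as np

def tensor_parallel_matmul(x, w, n_shards=2):
    """Column-parallel matrix multiply, the core of tensor parallelism.

    The weight matrix w is split column-wise across n_shards "devices";
    each shard computes its slice of the output independently, and the
    slices are concatenated (an all-gather on real hardware).
    """
    shards = np.array_split(w, n_shards, axis=1)   # one weight slice per device
    partial = [x @ s for s in shards]              # computed in parallel
    return np.concatenate(partial, axis=1)
```

Because matrix multiplication distributes over column blocks, the sharded result is identical to the unsharded `x @ w`, but each device holds only `1/n_shards` of the weights.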

Inference Serving

Deploying GPT-4 also requires specialized serving infrastructure.

Dense inference clusters, with many user requests batched into each forward pass, keep per-query costs affordable at scale.

Token Dropping

The MoE routing mechanism can lead to token dropping: each expert can process only a fixed number of tokens per batch (its capacity), and tokens routed to a full expert are skipped by that expert, typically passing through the layer via the residual connection instead.
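The capacity mechanism can be sketched in a few lines. This is an illustrative toy, not GPT-4's actual router: tokens are admitted to their chosen expert first-come-first-served until that expert's capacity is exhausted, and any overflow tokens are dropped.

```python
import numpy as np

def route_with_capacity(expert_ids, n_experts, capacity):
    """Assign tokens to experts, dropping overflow beyond each expert's capacity.

    expert_ids: per-token expert index chosen by the router
    Returns (kept, dropped) lists of token indices.
    """
    load = np.zeros(n_experts, dtype=int)
    kept, dropped = [], []
    for tok, e in enumerate(expert_ids):
        if load[e] < capacity:
            load[e] += 1
            kept.append(tok)
        else:
            dropped.append(tok)   # expert full: token bypasses it
    return kept, dropped
```

Raising the capacity reduces dropping but wastes compute on padding when routing is unbalanced, which is why MoE training typically adds a load-balancing loss to spread tokens across experts.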

Future Directions

While impressive, GPT-4 remains primarily a text model: it can accept image inputs, but it generates only text. Likely future research directions include richer multimodality, longer context windows, and more efficient training and inference.

GPT-4 demonstrates the rapid pace of progress in language models. While we are still far from general intelligence, OpenAI continues pushing towards this goal with each new iteration. Exciting capabilities likely lie ahead.