What is Llama 4 Maverick?
Llama 4 Maverick is Meta's flagship open-weight multimodal model, released April 5, 2025. It's the highest-performance generally available model in Meta's Llama 4 family, designed for enterprise-grade applications requiring strong multimodal reasoning, coding, and multilingual support. It uses a mixture-of-experts (MoE) architecture with 128 experts: 17 billion parameters are active per token out of 400 billion total. It delivers frontier-class performance at open-weight pricing, approximately $0.22 per million input tokens and $0.85 per million output tokens via API providers (exact rates vary by provider).
Llama 4 Maverick Performance
At launch, Meta reported that Maverick outperformed GPT-4o and Gemini 2.0 Flash across a broad range of benchmarks, including vision, coding (LiveCodeBench), and knowledge tasks (MMLU Pro, GPQA Diamond). Meta used Llama 4 Behemoth, a teacher model with roughly 2 trillion total parameters, to distill knowledge into Maverick, giving it capabilities that exceed what its active parameter count would suggest.
An experimental chat version of Maverick reached an Elo score of 1417 on Chatbot Arena at one point, which made it briefly competitive with top closed models. The production version scores well on vision-language tasks and on multilingual benchmarks across its 12 supported languages.
Llama 4 Maverick Context Window and Specs
Maverick supports a 1 million token context window and a maximum of 16,384 output tokens. For self-hosted deployment, the FP8-quantized weights fit on a single H100 DGX host (8 GPUs); larger deployments can use distributed inference across hosts. Because of the MoE architecture, only 17B parameters are active for each token, which keeps latency practical despite the 400B total parameter count. All 400B parameters still have to be held in memory. Knowledge cutoff is August 2024.
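The sizing claims above can be checked with some quick arithmetic. The sketch below is a rough estimate, not a deployment guide: the parameter counts come from the text, while the 80 GB per H100 figure and the bytes-per-parameter values are assumptions, and the helper names are made up for illustration. It counts weights only and ignores KV cache and activations, so real memory needs are higher.

```python
# Back-of-the-envelope sizing for a mixture-of-experts model.
# Parameter counts are taken from the text; everything else is an assumption.

TOTAL_PARAMS = 400e9    # total parameters (from the text)
ACTIVE_PARAMS = 17e9    # parameters active per token (from the text)
GPUS_PER_HOST = 8       # single H100 DGX host (from the text)
GPU_MEMORY_GB = 80      # assumed: 80 GB of memory per H100

# Assumed storage cost per parameter for two common weight formats.
BYTES_PER_PARAM = {"bf16": 2, "fp8": 1}


def active_fraction(active_params, total_params):
    """Share of the model's parameters used for any single token."""
    return active_params / total_params


def weight_memory_gb(total_params, bytes_per_param):
    """Memory needed to hold the weights alone, in GB (10^9 bytes)."""
    return total_params * bytes_per_param / 1e9


def fits_on_host(weights_gb, gpus=GPUS_PER_HOST, gpu_gb=GPU_MEMORY_GB):
    """True if the weights fit in the host's combined GPU memory."""
    return weights_gb <= gpus * gpu_gb


if __name__ == "__main__":
    share = active_fraction(ACTIVE_PARAMS, TOTAL_PARAMS)
    print(f"active per token: {share:.2%} of all parameters")
    for precision, nbytes in BYTES_PER_PARAM.items():
        needed = weight_memory_gb(TOTAL_PARAMS, nbytes)
        verdict = "fits" if fits_on_host(needed) else "does not fit"
        print(f"{precision}: {needed:.0f} GB of weights, {verdict} on one host")
```

Under these assumptions, about 4% of the parameters do work for a given token, yet the full 400B must be resident. That is why precision matters for single-host deployment: 1 byte per parameter fits within 8 x 80 GB, while 2 bytes per parameter does not.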
Llama 4 Maverick vs Closed Models
Against GPT-5 ($1.25 input / $10 output per million tokens), Maverick costs roughly 82% less on input and more than 90% less on output, at comparable or slightly lower capability. Against Claude Sonnet 4.5 ($3/$15), Maverick is more than 90% cheaper on both input and output while performing competitively on most tasks. The main advantages of closed models are reliability, support, and ecosystem maturity. For teams comfortable with deploying open-weight models, Maverick is one of the best value options in the frontier model tier.
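How much cheaper Maverick is depends on the mix of input and output tokens, since the two are priced differently. The sketch below works through that arithmetic using the per-million-token prices quoted in this article. The dictionary keys are illustrative labels rather than real API model identifiers, the function names are hypothetical, and provider prices change, so treat the numbers as a snapshot.

```python
# Prices in USD per million tokens as (input, output), copied from the article.
# Keys are illustrative labels, not official API model names.
PRICES = {
    "llama-4-maverick": (0.22, 0.85),
    "gpt-5": (1.25, 10.00),
    "claude-sonnet-4.5": (3.00, 15.00),
}


def workload_cost(model, input_tokens, output_tokens):
    """Cost in USD of a workload with the given token counts."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


def savings_vs(baseline, input_tokens, output_tokens, model="llama-4-maverick"):
    """Fraction saved by running the workload on `model` instead of `baseline`."""
    baseline_cost = workload_cost(baseline, input_tokens, output_tokens)
    model_cost = workload_cost(model, input_tokens, output_tokens)
    return 1 - model_cost / baseline_cost


if __name__ == "__main__":
    # Hypothetical workload: 3M input tokens and 1M output tokens.
    input_tokens, output_tokens = 3_000_000, 1_000_000
    for baseline in ("gpt-5", "claude-sonnet-4.5"):
        saved = savings_vs(baseline, input_tokens, output_tokens)
        print(f"vs {baseline}: {saved:.1%} cheaper")
```

With these prices, an input-only workload saves about 82% against GPT-5 and an output-only workload about 91%, so any real mix lands between the two.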
Frequently Asked Questions
What's the difference between Llama 4 Scout and Maverick?
Scout has 16 experts and a 10M-token context window, and is optimised for long documents on single-GPU deployments. Maverick has 128 experts and a 1M-token context window, and is optimised for maximum reasoning quality on an 8-GPU host. Maverick is the better general-purpose model; Scout is for long-context specialisation.
Is Llama 4 Maverick free to use commercially?
Yes, under Meta's Llama 4 Community License, for businesses with fewer than 700 million monthly active users. Above that threshold, a separate license must be requested from Meta. EU-based companies should check Meta's license for current geographic restrictions.