
Llama 4 Maverick

LLM (MoE)

Meta's flagship multimodal model powering Facebook, Instagram, and WhatsApp.

Developer

Meta

Release Date

April 05, 2025

Pricing

Free (open weights; hosted API access is billed per token)

Key Features

Multimodal (text+image)
1M Token Context
17B Active Parameters
128 Experts
400B Total Parameters
Vision Understanding

Use Cases

Multilingual Chat

Perfect for multilingual chat applications

Generative Content

Perfect for generative content applications

Enterprise Applications

Perfect for enterprise applications

Social Media AI

Perfect for social media AI applications

General Purpose

Perfect for general purpose applications

What is Llama 4 Maverick?

Llama 4 Maverick is Meta's flagship open-weight multimodal model, released April 5, 2025. It is the highest-performing publicly released model in Meta's Llama 4 family, designed for enterprise-grade applications that require strong multimodal reasoning, coding, and multilingual support. It uses a mixture-of-experts (MoE) architecture with 128 experts and 400 billion total parameters, of which 17 billion are active for each token. It delivers frontier-class performance at low cost: approximately $0.22 per million input tokens and $0.85 per million output tokens via third-party API providers (rates vary by provider).
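To illustrate how an MoE model can hold 400B parameters while activating only 17B per token, here is a toy routing sketch. It assumes the layout Meta described at launch (each token goes to one always-on shared expert plus a single top-1 pick among 128 routed experts); the hidden size, random weights, and the helpers `make_expert` and `moe_layer` are hypothetical stand-ins, not Meta's implementation.

```python
import math
import random

NUM_EXPERTS = 128  # routed experts per MoE layer (from the spec above)
HIDDEN = 8         # toy hidden size; the real model is vastly larger

calls = [0] * NUM_EXPERTS  # how many times each routed expert actually ran


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def make_expert(index, scale):
    """Hypothetical toy expert: scales its input and records that it ran."""
    def expert(x):
        calls[index] += 1
        return [scale * v for v in x]
    return expert


def shared_expert(x):
    """Hypothetical always-on shared expert (toy: halves its input)."""
    return [0.5 * v for v in x]


def moe_layer(token, router_weights, experts):
    # Router: one logit per routed expert, turned into gate probabilities.
    logits = [sum(w * x for w, x in zip(row, token)) for row in router_weights]
    gates = softmax(logits)
    chosen = max(range(NUM_EXPERTS), key=gates.__getitem__)  # top-1 routing
    # Only the shared expert and the single chosen routed expert execute.
    routed = experts[chosen](token)
    shared = shared_expert(token)
    out = [s + gates[chosen] * r for s, r in zip(shared, routed)]
    return out, chosen


rng = random.Random(0)
router_weights = [[rng.gauss(0, 1) for _ in range(HIDDEN)] for _ in range(NUM_EXPERTS)]
experts = [make_expert(i, rng.uniform(0.5, 1.5)) for i in range(NUM_EXPERTS)]

token = [rng.gauss(0, 1) for _ in range(HIDDEN)]
out, chosen = moe_layer(token, router_weights, experts)
print(f"routed to expert {chosen}; routed experts run: {sum(calls)} of {NUM_EXPERTS}")
print(f"active/total parameter ratio: {17e9 / 400e9:.2%}")
```

Because the other 127 routed experts never execute for this token, per-token compute tracks the active parameter count rather than the total.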

Llama 4 Maverick Performance

At launch, Meta reported that Maverick outperformed GPT-4o and Gemini 2.0 Flash across a broad range of benchmarks, including image understanding, coding (LiveCodeBench), and knowledge and reasoning tasks (MMLU Pro, GPQA Diamond). Meta distilled knowledge into Maverick from Llama 4 Behemoth, a teacher model with roughly 2 trillion parameters, giving Maverick capabilities beyond what its active parameter count would suggest.
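For readers unfamiliar with distillation, the sketch below shows the textbook fixed-weight knowledge-distillation loss: the student matches the teacher's temperature-softened output distribution while also fitting the ground-truth token. It is a generic illustration, not Meta's recipe for distilling from Behemoth; the logits, `alpha`, and `temperature` values are illustrative assumptions.

```python
import math


def softmax(logits, temperature=1.0):
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, hard_label,
                      alpha=0.5, temperature=2.0):
    """Classic fixed-weight knowledge-distillation loss (not Meta's variant).

    alpha weights the soft (teacher) term; 1 - alpha weights the hard label.
    """
    # Soft targets: KL(teacher || student) on temperature-softened distributions,
    # multiplied by T^2 so gradient magnitudes stay comparable across temperatures.
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(pt * math.log(pt / ps)
             for pt, ps in zip(p_teacher, p_student) if pt > 0)
    soft = kl * temperature ** 2
    # Hard target: ordinary cross-entropy against the ground-truth token (T = 1).
    hard = -math.log(softmax(student_logits)[hard_label])
    return alpha * soft + (1 - alpha) * hard


teacher = [3.0, 1.0, -1.0]   # hypothetical teacher logits over a 3-token vocabulary
student = [1.5, 1.2, 0.1]    # hypothetical student logits
print(distillation_loss(student, teacher, hard_label=0))
```

The temperature spreads probability mass over the teacher's non-top tokens, which is where much of the "extra" signal for the student comes from.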

An experimental chat-optimized version of Maverick briefly reached an Elo rating of 1417 on Chatbot Arena, making it competitive with top closed models at the time. The publicly released version performs well on vision-language tasks and on multilingual benchmarks across its 12 supported languages.
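To put an Elo-style rating gap in perspective, the sketch below applies the standard Elo expected-score formula (base 10, 400-point scale). It assumes Arena scores can be read with that convention; the opponent rating of 1380 is purely hypothetical, not any real model's score.

```python
def expected_score(rating_a, rating_b):
    """Expected score (win probability, counting ties as half) of A vs B under
    the standard Elo logistic curve: base 10, 400-point scale."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


MAVERICK_EXPERIMENTAL = 1417  # Arena rating quoted above
HYPOTHETICAL_RIVAL = 1380     # illustrative opponent rating, not a real model's score

p = expected_score(MAVERICK_EXPERIMENTAL, HYPOTHETICAL_RIVAL)
print(f"expected preference rate vs rival: {p:.3f}")
```

Under this convention a 37-point lead corresponds to only about a 55% expected preference rate, so small leaderboard gaps mean near coin-flip head-to-head results.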

Llama 4 Maverick Context Window and Specs

Maverick supports a 1 million token context window and up to 16,384 output tokens. With FP8 weights it can be self-hosted on a single H100 DGX host (8 GPUs); larger deployments can use distributed inference. Because of the MoE architecture, only 17B parameters are active per token, which keeps latency practical despite the 400B total parameter count. Its knowledge cutoff is August 2024.
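A rough back-of-the-envelope check shows why fitting on a single 8-GPU host depends on weight precision. This sketch assumes the 80 GB H100 variant and counts raw weights only, ignoring KV cache, activations, and framework overhead; the precision table is illustrative.

```python
TOTAL_PARAMS = 400e9      # total parameters (from the spec above)
GPU_MEMORY_GB = 80        # assumption: H100 80 GB variant
GPUS_PER_HOST = 8         # one DGX H100 host

# Bytes needed to store one parameter at common precisions.
BYTES_PER_PARAM = {"bf16": 2.0, "fp8": 1.0, "int4": 0.5}


def weight_memory_gb(num_params, dtype):
    """Memory for the raw weights only, in decimal gigabytes."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


host_capacity_gb = GPU_MEMORY_GB * GPUS_PER_HOST
for dtype in BYTES_PER_PARAM:
    need = weight_memory_gb(TOTAL_PARAMS, dtype)
    headroom = host_capacity_gb - need
    verdict = "fits" if headroom > 0 else "does not fit"
    print(f"{dtype}: {need:.0f} GB of weights vs {host_capacity_gb} GB -> {verdict} "
          f"({headroom:+.0f} GB left for KV cache and activations)")
```

All 400B parameters must stay resident even though only 17B are active per token, because the router may pick any expert for the next token; the active count saves compute, not memory.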

Llama 4 Maverick vs Closed Models

Against GPT-5 ($1.25/$10 per million input/output tokens), Maverick's quoted rates are about 82% lower for input and about 91% lower for output, at comparable or slightly lower capability. Against Claude Sonnet 4.5 ($3/$15), Maverick is dramatically cheaper while performing competitively on most tasks. The main advantages of closed models are reliability, vendor support, and ecosystem maturity. For teams comfortable with open-weight deployment, Maverick is one of the best-value options in the frontier model tier.
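The savings depend on your mix of input and output tokens. The sketch below uses the per-million-token prices quoted in this article (real rates vary by provider and change over time) and a hypothetical workload of 10M input and 2M output tokens per month.

```python
# USD per million tokens (input, output), as quoted in this article.
PRICES = {
    "Llama 4 Maverick": (0.22, 0.85),
    "GPT-5": (1.25, 10.00),
    "Claude Sonnet 4.5": (3.00, 15.00),
}


def monthly_cost(prices, input_tokens, output_tokens):
    """Cost in USD for a month's token volume at per-million-token rates."""
    in_price, out_price = prices
    return input_tokens / 1e6 * in_price + output_tokens / 1e6 * out_price


# Hypothetical workload: 10M input tokens and 2M output tokens per month.
INPUT_TOKENS, OUTPUT_TOKENS = 10_000_000, 2_000_000

baseline = monthly_cost(PRICES["Llama 4 Maverick"], INPUT_TOKENS, OUTPUT_TOKENS)
for model, prices in PRICES.items():
    cost = monthly_cost(prices, INPUT_TOKENS, OUTPUT_TOKENS)
    line = f"{model}: ${cost:,.2f}/month"
    if cost > baseline:
        line += f" (Maverick is {1 - baseline / cost:.0%} cheaper)"
    print(line)
```

Output-heavy workloads see the largest savings, because the output-price gap is wider than the input-price gap.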

Frequently Asked Questions

What's the difference between Llama 4 Scout and Maverick?

Scout has 16 experts and a 10M-token context window, and is optimised for long documents on single-GPU deployments (one H100 with Int4 quantization). Maverick has 128 experts and a 1M-token context window, and is optimised for maximum reasoning quality, typically running on an 8-GPU host. Maverick is the better general-purpose model; Scout is the long-context specialist.

Is Llama 4 Maverick free to use commercially?

Yes, under the Llama 4 Community License, for organisations whose products have fewer than 700 million monthly active users. Above that threshold, you must request a separate license from Meta. EU-based companies should check Meta's license terms for current geographic restrictions.


Similar AI Models

Llama 4 Scout

Meta's lightweight multimodal model with a 10M token context window.

Llama 4 Behemoth

Meta's ultra-large model with about 2T parameters (still in training and not yet released).