ChaoBro

Llama 4 Scout: Meta's Last Open-Weight MoE, 10M Token Context at Just $0.08/M Input

Bottom Line

Meta has released Llama 4 Scout, a 16-expert MoE model with 17B active / 109B total parameters, a 10M-token context window, and input pricing of $0.08 per million tokens. This is Meta's last open-weight model before Muse Spark goes closed-source, so if you skip Scout, you may be waiting a long time for the next open-weight Meta model.

What Happened

Llama 4 Scout Core Specs

| Dimension | Specs |
|---|---|
| Architecture | 16-expert MoE |
| Total Parameters | 109B |
| Active Parameters | 17B |
| Context Window | 10M tokens |
| Input Price | $0.08/M tokens |
| Open Weights | ✅ (last open generation) |
| API Compatibility | OpenAI-compatible format |

Key Features

10M Token Context:

  • Fit a 300-page document without chunking
  • 78x the capacity of GPT-5.5's 128K context
  • Game-changing for RAG, legal document analysis, codebase understanding
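
To check the "without chunking" claim against your own documents, a rough estimate from character length is usually enough for a first pass. The sketch below assumes about 4 characters per token and reserves 4,096 tokens for output; both numbers are illustrative assumptions, not properties of Llama 4 Scout, and an exact count would need the model's tokenizer.

```python
# Rough check of whether a text fits in a model's context window.
# ASSUMPTION: ~4 characters per token is a rule of thumb for English text;
# real counts depend on the model's tokenizer.
CHARS_PER_TOKEN = 4

# Context window as stated in this article.
SCOUT_CONTEXT_TOKENS = 10_000_000


def estimate_tokens(text: str, chars_per_token: int = CHARS_PER_TOKEN) -> int:
    """Estimate the token count of `text` via ceiling division on its length."""
    return -(-len(text) // chars_per_token)


def fits_in_context(text: str,
                    context_tokens: int = SCOUT_CONTEXT_TOKENS,
                    reserve_for_output: int = 4_096) -> bool:
    """True if the estimated prompt plus an output reserve fits the window."""
    return estimate_tokens(text, CHARS_PER_TOKEN) + reserve_for_output <= context_tokens


sample = "word " * 1_000_000          # 5,000,000 characters of filler text
print(estimate_tokens(sample))        # 1250000
print(fits_in_context(sample))        # True
```

Under these assumptions, even several million characters of text stay well inside a 10M-token window, while the same input would need chunking for a 128K window.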

Extremely Low Input Price:

  • $0.08/M input, an order of magnitude cheaper than most competitors
  • 187-375x cheaper than GPT-5.5 input
  • For large-context tasks (document analysis, code review), cost advantage is significant
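
To make the cost advantage concrete, the sketch below computes the input-side bill for a batch job at the prices quoted in this article. The 2M-token job size is a hypothetical figure chosen for illustration, and output-token pricing is not covered because the article does not state it.

```python
from decimal import Decimal

# Input prices in USD per million tokens, as quoted in this article.
SCOUT_INPUT_PRICE = Decimal("0.08")
GPT55_INPUT_PRICE_LOW = Decimal("15")
GPT55_INPUT_PRICE_HIGH = Decimal("30")


def input_cost(tokens: int, price_per_million: Decimal) -> Decimal:
    """Cost in USD of sending `tokens` input tokens at the given price."""
    return Decimal(tokens) / Decimal(1_000_000) * price_per_million


# HYPOTHETICAL: total input tokens across one batch job.
job_tokens = 2_000_000

print(input_cost(job_tokens, SCOUT_INPUT_PRICE))       # 0.16
print(input_cost(job_tokens, GPT55_INPUT_PRICE_LOW))   # 30
print(input_cost(job_tokens, GPT55_INPUT_PRICE_HIGH))  # 60
```

`Decimal` is used instead of floats so that small per-token prices do not pick up binary rounding noise when summed over many requests.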

Last Open Weights:

  • Meta Muse Spark has shifted to closed-source
  • Scout may be the last downloadable, fine-tunable, deployable Meta open-weight model for a while

Why It Matters

1. Price War in Long Context

| Model | Context | Input Price ($/M tokens) | Architecture |
|---|---|---|---|
| Llama 4 Scout | 10M | $0.08 | 16-expert MoE |
| GPT-5.5 | 128K | $15-30 | Dense |
| Claude Opus 4.7 | 200K | $15 | Dense |
| Gemini 3.1 Pro | 1M | $3.50 | MoE |
| DeepSeek V4 | 1M | $0.14-0.55 | MoE |

At $0.08/M against $15-30/M, Scout's input price is 187-375x lower than GPT-5.5's, and its 10M-token window is roughly 78x larger than GPT-5.5's 128K.
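
The 187-375x and 78x figures follow directly from the numbers in the comparison. A minimal check, reading "128K" as 128,000 tokens (an assumption; reading it as 131,072 would give a slightly smaller ratio):

```python
from fractions import Fraction

# Figures taken from the comparison in this article.
SCOUT_PRICE = Fraction("0.08")               # USD per million input tokens
GPT55_PRICES = (Fraction(15), Fraction(30))  # low and high end of the range
SCOUT_CONTEXT = 10_000_000                   # tokens
GPT55_CONTEXT = 128_000                      # tokens; "128K" read as 128,000

# How many times higher GPT-5.5's input price is than Scout's.
price_ratios = [float(p / SCOUT_PRICE) for p in GPT55_PRICES]

# How many times larger Scout's context window is.
context_ratio = SCOUT_CONTEXT / GPT55_CONTEXT

print(price_ratios)   # [187.5, 375.0]
print(context_ratio)  # 78.125
```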

Actionable Advice

Who Should Pay Attention

  • Long document processing: Legal, finance, academic document analysis
  • Codebase understanding: Feed entire projects without chunking
  • Cost control teams: Large-scale text processing on limited budget
  • Open model dependents: Need open weights for fine-tuning or private deployment
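
For teams considering private deployment, a back-of-the-envelope estimate of weight storage helps with hardware sizing. The sketch below uses the parameter counts from this article; the bytes-per-parameter values for each precision are assumptions, and the estimate ignores KV cache, activations, and runtime overhead, which can be substantial at long context lengths. In a typical MoE deployment all experts stay loaded, so the total parameter count, not the active count, drives weight memory.

```python
# Parameter counts as stated in this article.
TOTAL_PARAMS = 109_000_000_000   # all experts
ACTIVE_PARAMS = 17_000_000_000   # parameters used per token

# ASSUMPTION: storage cost per parameter for common precisions.
BYTES_PER_PARAM = {"bf16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_gigabytes(params: int, precision: str) -> float:
    """Approximate weight storage in GB (10**9 bytes) at a given precision."""
    return params * BYTES_PER_PARAM[precision] / 1e9


for precision in BYTES_PER_PARAM:
    print(precision, weight_gigabytes(TOTAL_PARAMS, precision))

# Share of parameters active per token.
active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS
print(round(active_fraction, 3))  # 0.156
```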

How to Get Started

# Via aggregator API (OpenAI-compatible)
# The model ID is as given in this article; check the provider's model list for the exact identifier.
curl https://api.together.ai/v1/chat/completions \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "meta-llama/Llama-4-Scout",
    "messages": [{"role": "user", "content": "Analyze this 200-page contract..."}]
  }'

Where to get it:

  • Hugging Face: huggingface.co/meta-llama
  • Aggregators: Together AI, Groq, OpenRouter
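
The curl call above can also be sketched in Python using only the standard library. The endpoint URL and model ID are copied from the curl example and may differ per provider; `build_request` and `send` are hypothetical helper names, and the response parsing assumes the usual OpenAI-style chat completion shape (`choices[0].message.content`).

```python
import json
import urllib.request

# Endpoint and model ID copied from the curl example in this article;
# the exact model identifier may differ per provider.
API_URL = "https://api.together.ai/v1/chat/completions"
MODEL_ID = "meta-llama/Llama-4-Scout"


def build_request(prompt: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style chat completion request."""
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request: urllib.request.Request) -> str:
    """Send the request and return the first choice's message text.

    ASSUMPTION: the provider returns the standard OpenAI-style response body.
    """
    with urllib.request.urlopen(request, timeout=60) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

With an `API_KEY` environment variable set, `send(build_request("Analyze this 200-page contract...", os.environ["API_KEY"]))` (after `import os`) would perform the same call as the curl example. Keeping request construction separate from sending makes the payload easy to inspect before spending tokens on a very large prompt.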