Meta Open-Sources Llama 4 Scout: 17B/109B MoE Architecture, 10M Token Context for Just $0.08/M Tokens

Key Takeaway

Meta released Llama 4 Scout in April 2025: a Mixture of Experts (MoE) model with 17B active parameters out of 109B total, routed across 16 experts.

  • 10M token context window: process a 300-page document (roughly 150K tokens) in a single pass, without chunking
  • $0.08/M input tokens: accessible through OpenAI-compatible APIs via aggregators (see the sketch after this list)
  • Open weights: the last open Meta model tier before Muse Spark goes closed-source
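
To make the aggregator route concrete, here is a minimal sketch of querying Scout through an OpenAI-compatible endpoint with the official `openai` Python SDK. The base URL, environment variable, and model identifier are assumptions for illustration; substitute your provider's actual values.

```python
# Minimal sketch: Llama 4 Scout via an OpenAI-compatible aggregator.
# The endpoint URL, env var, and model ID below are hypothetical --
# check your aggregator's documentation for the real values.
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://api.example-aggregator.com/v1",  # hypothetical endpoint
    api_key=os.environ["AGGREGATOR_API_KEY"],          # hypothetical env var
)

response = client.chat.completions.create(
    model="meta-llama/llama-4-scout",  # assumed model identifier
    messages=[
        {"role": "user", "content": "Summarize the key risks in this contract: ..."},
    ],
)
print(response.choices[0].message.content)
```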

Architecture

MoE Design

| Parameter | Value | Meaning |
| --- | --- | --- |
| Total params | 109B | Total knowledge capacity |
| Active params | 17B | Actually used per inference |
| Experts | 16 | Routable sub-networks |

The MoE advantage: 109B parameters of knowledge at the inference cost of a 17B model.
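
To see why, consider a toy sketch of top-1 expert routing, the mechanism MoE layers use to keep per-token compute low. This is an illustrative simplification, not Llama 4's actual implementation: a gate scores all 16 experts for each token, but only the winning expert's weights are applied, so the cost per token tracks active parameters rather than the full 109B.

```python
# Toy top-1 MoE routing (illustrative simplification, not Meta's code).
import numpy as np

rng = np.random.default_rng(0)
d_model, n_experts, n_tokens = 64, 16, 8

gate_w = rng.normal(size=(d_model, n_experts))            # router weights
experts = rng.normal(size=(n_experts, d_model, d_model))  # one toy FFN matrix per expert
tokens = rng.normal(size=(n_tokens, d_model))

logits = tokens @ gate_w                                   # (n_tokens, n_experts)
probs = np.exp(logits) / np.exp(logits).sum(-1, keepdims=True)
choice = probs.argmax(-1)                                  # winning expert per token

out = np.empty_like(tokens)
for i, e in enumerate(choice):
    # Only 1 of 16 experts runs per token, so expert FLOPs scale with
    # active parameters, not total parameters.
    out[i] = probs[i, e] * (tokens[i] @ experts[e])
```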

Price Comparison

| Model | Input Price ($/M tokens) | Context | Open |
| --- | --- | --- | --- |
| Llama 4 Scout | $0.08 | 10M | Yes |
| GPT-5.5 | $2.50 | 1M | No |
| Claude Opus 4.7 | $15.00 | 200K | No |
| DeepSeek-V4-Flash | $0.14 | 1M | Yes |
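
A quick back-of-envelope makes the gap tangible. Using the table's input prices for a single 300-page document (assumed here to be roughly 150K tokens; output-token pricing varies by provider and is ignored):

```python
# Input-only cost for one ~150K-token document at the table's prices.
PRICE_PER_M_TOKENS = {
    "Llama 4 Scout": 0.08,
    "GPT-5.5": 2.50,
    "Claude Opus 4.7": 15.00,
    "DeepSeek-V4-Flash": 0.14,
}
doc_tokens = 150_000  # ~300 pages, an assumed estimate

for model, price in PRICE_PER_M_TOKENS.items():
    print(f"{model}: ${doc_tokens / 1_000_000 * price:.3f}")
# Llama 4 Scout: $0.012 ... Claude Opus 4.7: $2.250
```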

Best Use Cases

  1. Long document QA: legal, financial, and medical documents answered in one pass, without chunking
  2. Codebase understanding: feed an entire codebase at once
  3. Batch document processing: summarization, classification, and extraction at scale (see the sketch after this list)
  4. Cost-sensitive inference: high-volume workloads where Scout's accuracy, though below frontier level, is acceptable
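
As an illustration of use case 3, here is a minimal batch-summarization loop over a folder of text files. It reuses the hypothetical endpoint and model ID from the earlier sketch; with a 10M-token window, even very long files can be sent whole.

```python
# Minimal batch summarization (use case 3). Endpoint, env var, and
# model ID are the same hypothetical values as in the earlier sketch.
import os
from pathlib import Path

from openai import OpenAI

client = OpenAI(
    base_url="https://api.example-aggregator.com/v1",  # hypothetical endpoint
    api_key=os.environ["AGGREGATOR_API_KEY"],          # hypothetical env var
)

for doc in Path("docs").glob("*.txt"):
    # No chunking: the 10M-token window holds even very long files whole.
    resp = client.chat.completions.create(
        model="meta-llama/llama-4-scout",  # assumed model identifier
        messages=[
            {"role": "system", "content": "Summarize the document in five bullet points."},
            {"role": "user", "content": doc.read_text()},
        ],
    )
    print(f"--- {doc.name} ---\n{resp.choices[0].message.content}")
```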

Summary

Llama 4 Scout isn’t the strongest model, but it’s the most practical — solving a real pain point (document chunking) at extremely low cost. With Meta’s open-source strategy potentially tightening, now is the time to use it.


Meta’s open-source strategy is shifting from “open everything” to “tiered openness.” Llama 4 Scout may be the last train.