Poolside Laguna XS.2: 33B-Parameter MoE Coding Model, an Agent-Level Code Agent That Runs on Your Mac

A project worth noting for agent developers has appeared on both GitHub Trending and Hugging Face leaderboards — Poolside Laguna XS.2. This isn't another "bigger parameters, better rankings" story, but a completely different technical path: fitting an agent-level coding model into a consumer-grade Mac.

Laguna XS.2: 33B Total, 3B Activated

Laguna XS.2 comes from Poolside (a company focused on AI coding assistants) and is a 33B total parameter / 3B activated parameter MoE model. With 256 experts + 1 shared expert, each inference activates only about 3B parameters.

Key metrics:

Dimension	Laguna XS.2	Compared To
SWE-bench Verified	68.2%	Surpasses Gemma 4 31B IT (52.0%)
SWE-bench Multilingual	62.4%	Surpasses Devstral Small 2 (55.7%)
SWE-bench Pro	44.5%	Surpasses Gemma 4 31B IT (35.7%)
Terminal-Bench 2.0	30.1%	Surpasses Devstral Small 2 (22.5%)

Note the benchmark — Gemma 4 31B IT is Google's flagship open-source coding model, and Devstral Small 2 is Mistral's coding-specialized model. Laguna XS.2 comprehensively outperforms both on the SWE-bench series.

Architecture Highlights: Sliding Window Attention + Interleaved Thinking

Several engineering decisions in Laguna XS.2's architecture deserve attention:

Sliding Window Attention (SWA): 30 out of 40 layers use sliding window attention (window size 512 tokens), with only 10 layers using global attention. The 3:1 ratio is achieved through sigmoid gating and per-layer rotation scaling. This means a significantly reduced KV cache — substantially lowering memory pressure in long-context scenarios.

Interleaved Thinking: The model supports "thinking" between tool calls, and this can be toggled on or off per request. This addresses a core pain point for coding agents: not every step requires deep reasoning; sometimes fast execution is more efficient than deep thinking.

Muon Optimizer: Training uses the Muon optimizer — the very same optimizer open-sourced by the Kimi team and adopted by DeepSeek V4's training pipeline. The influence of Chinese open-source technology is once again confirmed.

FP8 KV Cache: KV cache quantized to FP8, further reducing memory footprint.

Local Deployment: One Mac Is All You Need

This is Laguna XS.2's biggest selling point. 33B total parameters sounds substantial, but because only 3B are activated at a time, combined with the sparsity of the MoE architecture, a 36GB RAM Mac (M2/M3 Pro) can run it.

# Ollama one-click deployment
ollama run poolside/laguna-xs2

Available on Ollama means:

No GPU clusters needed, no cloud service costs
Code data stays local, privacy guaranteed
Works offline, functional even without internet

For an agent-oriented coding model, local deployment means you can integrate Laguna XS.2 into frameworks like Claude Code, OpenClaw, and Hermes Agent as a local code generation backend.

Training Pipeline: Data Automixing + Async Off-Policy Agent RL

Poolside revealed training details in their release blog:

Pre-training Phase: Mixed corpus of code and natural language
Post-training Phase: Instruction tuning and preference optimization
Reinforcement Learning Phase: Async off-policy agent RL

The third step is particularly noteworthy. Agent RL performs reinforcement learning directly on the agent workflow, rather than doing SFT on static datasets. This means the model "learned" during training how to properly use tools, how to plan multi-step tasks, and how to think between tool calls.

Data automixing is another highlight — no need for manual annotation of data ratios; the model automatically learns the optimal mixing strategy from different data sources.

Comparison with Chinese Models

Placing Laguna XS.2 within the current landscape of Chinese coding models:

Model	Activated Parameters	SWE-bench Verified	Deployment
Laguna XS.2	3B	68.2%	Local Mac
Qwen3.6-35B-A3B	3B	~65%	Local / Cloud
DeepSeek V4 Flash	18B	~60%	Primarily Cloud
Kimi K2.6	~50B	~70%	Primarily Cloud

Laguna XS.2 is close to Qwen3.6-35B-A3B on SWE-bench Verified, but the latter has advantages in Chinese-language scenarios and multimodal capabilities. Kimi K2.6 scores highest but requires cloud deployment.

Differentiated Positioning: Laguna XS.2's advantage isn't the highest absolute score, but rather the highest score among locally deployable coding models. If you need data to stay local or lack cloud API budget, this is currently the best option.

Three Judgments

Signal: The combination of 33B/3B MoE + SWA + interleaved thinking is truly first-class among local coding models. The 68.2% SWE-bench Verified score has no rival in this parameter range. Apache 2.0 licensing means unrestricted commercial use.

Increment: Laguna XS.2 is not a simple iteration of Laguna XS.1. Async off-policy agent RL training, interleaved thinking mechanism, FP8 KV cache — these are not minor tweaks but represent an evolution in coding model training methodology.

Noise: The 30.1% score on Terminal-Bench 2.0 isn't high, indicating room for improvement in terminal operation scenarios. The model is optimized for English; Chinese support needs verification. The community ecosystem is still young, and toolchain maturity lags behind Qwen or DeepSeek.

How to Use

# Option 1: Ollama (Recommended)
ollama run poolside/laguna-xs2

# Option 2: vLLM
pip install vllm
python -m vllm.entrypoints.openai.api_server \
    --model poolside/Laguna-XS.2 \
    --tensor-parallel-size 1

# Option 3: Transformers
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "poolside/Laguna-XS.2",
    device_map="auto",
    torch_dtype="auto"
)
tokenizer = AutoTokenizer.from_pretrained("poolside/Laguna-XS.2")

Suitable for: Local code agent backend, privacy-sensitive code review, offline coding assistance, AI coding education in classroom environments.

Not suitable for: Deep Chinese language optimization, multimodal understanding, ultra-large-scale concurrent serving.

Summary

Laguna XS.2 represents a clear trend: coding models are moving from "cloud-based large models" to "local intelligent agents". When a 36GB Mac can run an agent-level coding model scoring 68% on SWE-bench, developers need to rethink the deployment architecture of "AI coding assistants."

It won't replace Qwen or DeepSeek — but it gives developers who need local deployment, data privacy, and offline capabilities a genuinely usable option. On the open-source coding model map, Laguna XS.2 fills the "local high-performance" gap.

Source: poolside/Laguna-XS.2 | Poolside Release Blog

Laguna XS.2: 33B Total, 3B Activated

Architecture Highlights: Sliding Window Attention + Interleaved Thinking

Local Deployment: One Mac Is All You Need

Training Pipeline: Data Automixing + Async Off-Policy Agent RL

Comparison with Chinese Models

Three Judgments

How to Use

Summary

Related

9Router: Route Claude Code, Cursor, Codex to 40+ Free Model Sources, RTK Saves 40% Tokens, Auto-Fallback Never Stops

AiToEarn: An Open Source Framework for Making Money with AI, But Don't Be Fooled by the Name

bolt.diy: Open Source Bolt.new, Bringing AI Full-Stack Dev from Cloud to Local