Agent Code Search Saves 98% Tokens vs grep: How Semble Does It

When an AI Agent searches for a function in a codebase, the traditional approach goes like this:

Agent sends a request: "Please read src/auth/login.py" → It gets back the entire file, 500 lines → Agent says "not this one, read src/auth/handler.py" → Another 800 lines → Repeat five or six times, and the Agent finally finds the 3 lines of code it needed.

You paid tokens for 5,000 lines of code, but only used 3.

This is the problem Semble exists to solve.

Semble's Approach

Semble's name comes from "semantic" + "assemble." Its core logic: instead of feeding entire files to the Agent, first use a semantic index to find relevant code snippets, then return only what the Agent actually needs.

How does it work?

Hybrid search strategy. Semble uses neither pure semantic vector search nor pure keyword matching, but a combination of the two: keyword-based coarse filtering first, then fine ranking by semantic similarity. This way, it won't miss exact code matches, and it still captures semantic-level relationships like "this function relates to that function."
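To make the two-stage idea concrete, here is a minimal Python sketch. It is not Semble's implementation: the function names are hypothetical, and character-trigram overlap stands in for a real embedding model so the example runs on the standard library alone.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Split text into lowercase word-like tokens (underscores act as separators)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def trigram_vector(text):
    """Character-trigram counts: a crude stand-in for a real embedding (assumption)."""
    joined = " ".join(tokenize(text))
    return Counter(joined[i:i + 3] for i in range(len(joined) - 2))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def hybrid_search(query, snippets, top_k=2):
    """Hypothetical two-stage search: keyword filter, then similarity ranking."""
    query_terms = set(tokenize(query))
    # Stage 1 (coarse): keep only snippets sharing at least one keyword with the query.
    candidates = [s for s in snippets if query_terms & set(tokenize(s))]
    # Stage 2 (fine): rank the survivors by similarity to the query.
    query_vec = trigram_vector(query)
    ranked = sorted(
        candidates,
        key=lambda s: cosine(query_vec, trigram_vector(s)),
        reverse=True,
    )
    return ranked[:top_k]


# Made-up snippets for illustration only.
SNIPPETS = [
    "def login_user(name, password): return check_password(name, password)",
    "def render_footer(page): return page.footer",
    "def logout_user(session): session.clear()",
]
results = hybrid_search("login user password", SNIPPETS)
```

The keyword stage drops render_footer outright because it shares no term with the query, and the ranking stage puts login_user ahead of logout_user. A real system would swap the trigram vectors for learned embeddings, but the division of labor stays the same.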

Ignore file mechanism. Like .gitignore, Semble supports .sembleignore, automatically skipping directories like node_modules, vendor, and dist that Agents don't care about.
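A rough sketch of how an ignore file like this could be applied while walking a repository. The pattern handling below is an assumption: it covers only simple name and glob patterns, not full .gitignore semantics, and the exact .sembleignore syntax may differ. All function names are hypothetical.

```python
import fnmatch
import os


def load_ignore_patterns(text):
    """Parse ignore-file text: one pattern per line; blank lines and '#' comments are skipped."""
    patterns = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            patterns.append(line.rstrip("/"))  # treat "dist/" and "dist" alike
    return patterns


def is_ignored(path, patterns):
    """True if any component of the path matches any pattern."""
    parts = path.replace("\\", "/").split("/")
    return any(
        fnmatch.fnmatchcase(part, pattern)
        for part in parts
        for pattern in patterns
    )


def iter_indexable_files(root, patterns):
    """Yield file paths under root, pruning ignored directories before descending."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Editing dirnames in place stops os.walk from entering ignored directories.
        dirnames[:] = [d for d in dirnames if not is_ignored(d, patterns)]
        for name in filenames:
            if not is_ignored(name, patterns):
                yield os.path.join(dirpath, name)


# Example ignore-file contents (illustrative).
PATTERNS = load_ignore_patterns("# dependencies and build output\nnode_modules/\nvendor\ndist\n*.min.js\n")
```

Pruning directories before descending matters more than filtering afterwards: a large node_modules tree is never even listed, so indexing time and index size both stay small.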

Quantified token savings. The authors don't just claim that Semble "saves tokens": they ran benchmarks comparing token consumption between Semble and grep+read across codebases of different sizes. Publishing benchmark data right in the README of an 858-star project invites scrutiny, which lends the numbers some credibility, though star count alone proves nothing.
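For a sense of how such a comparison can be computed, here is a small sketch. It is not Semble's benchmark methodology: it assumes roughly 4 characters per token (real tokenizers only approximate this), and the files are synthetic stand-ins for the 500- and 800-line files in the opening scenario.

```python
def estimate_tokens(text, chars_per_token=4):
    """Rough token estimate; the 4-characters-per-token ratio is an assumption."""
    return len(text) // chars_per_token


def token_savings(full_file_reads, returned_snippets):
    """Fraction of tokens saved by returning snippets instead of whole files."""
    baseline = sum(estimate_tokens(text) for text in full_file_reads)
    if baseline == 0:
        raise ValueError("baseline reads contain no tokens")
    targeted = sum(estimate_tokens(text) for text in returned_snippets)
    return 1 - targeted / baseline


# Synthetic data: every line is the same 24-character statement.
LINE = "result = compute(value)\n"
FILES = [LINE * 500, LINE * 800]  # two whole-file reads
SNIPPET = LINE * 3                # the three lines actually needed

savings = token_savings(FILES, [SNIPPET])
print(f"estimated savings: {savings:.1%}")
```

On toy data like this the ratio is dominated by file size, so it says nothing about retrieval quality. A real benchmark also has to check that the returned snippet is the right one.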

Why This Matters

You might think 98% is a marketing number. But the trend behind it is real.

The biggest cost bottleneck for AI coding agents isn't inference speed; it's the context window. In a medium-sized codebase, an agent reading just a few files can burn through tens of thousands of tokens. GPT-4o input tokens cost $2.50 per million, and Claude Sonnet input tokens cost $3 per million. A complex multi-file task can burn several dollars just on context reading.

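When an Agent executes thousands of tasks daily, this cost multiplies: it grows in direct proportion to the number of tasks, so even modest per-task waste becomes a large bill.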
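To put rough numbers on this, here is a back-of-the-envelope calculation. The prices are the ones quoted above. The 50,000 tokens per task and 1,000 tasks per day are assumptions chosen for illustration, and the 98% reduction is the headline figure applied at face value.

```python
# Input-token prices in USD per million tokens, as quoted in the article.
PRICE_PER_MILLION = {"GPT-4o": 2.50, "Claude Sonnet": 3.00}


def daily_context_cost(tokens_per_task, tasks_per_day, price_per_million, reduction=0.0):
    """Daily spend on context reading, in USD.

    reduction is the fraction of context tokens avoided (0.98 means 98% fewer).
    """
    tokens = tokens_per_task * tasks_per_day * (1 - reduction)
    return tokens / 1_000_000 * price_per_million


# Assumed workload: 50,000 context tokens per task, 1,000 tasks per day.
for model, price in PRICE_PER_MILLION.items():
    before = daily_context_cost(50_000, 1_000, price)
    after = daily_context_cost(50_000, 1_000, price, reduction=0.98)
    print(f"{model}: ${before:.2f}/day before, ${after:.2f}/day after")
```

Under these assumptions the GPT-4o bill for context reading drops from about $125 a day to about $2.50. Doubling the number of tasks doubles both figures.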

Semble addresses exactly this "invisible cost leak."

Comparison with Existing Solutions

This space has no shortage of players:

grep/ripgrep: Fast, but returns whole matching lines with imprecise surrounding context
Sourcegraph: Enterprise-grade, powerful but heavy
GitHub Code Search: Tied to the GitHub platform
Various vector database solutions: Pure semantic search, with precision problems on exact matches

Semble's positioning is clear: lightweight code search optimized for AI Agents. It's not trying to replace Sourcegraph, nor compete with ripgrep on speed. Its sole goal: let Agents find needed code at the lowest possible token cost.

Points to Watch

This project is still young: at the time of writing it has 76 commits and 6 issues, and the latest commit landed 8 hours ago. Its author, stephantul, is an active contributor who updates it frequently.

More noteworthy is its benchmark directory: beyond the token consumption comparison, it also contains performance tests across codebases of different scales. This "let the data speak" approach is uncommon in the open-source community.

My Take

Semble represents an increasingly clear trend: the AI Agent tool ecosystem is moving from "usable" to "efficiently usable."

Early Agent tools just needed to get the job done. Now everyone is optimizing: fewer tokens, lower latency, higher accuracy. Semble is a typical case in this optimization wave.

If your team uses AI coding agents on large codebases, it's worth spending 15 minutes trying Semble. The tokens saved might be far more than you'd intuitively expect.

Primary sources:

MinishLab/semble on GitHub — 858 stars, 67 forks, 76 commits
Project benchmarks directory: token efficiency comparison data
Show HN post: 12 points, 12 comments
