Consumer GPU hits 95% SimpleQA: local AI deep research is becoming reality

A year ago, "running deep research locally" sounded like science fiction. You needed cloud models, paid APIs, and had to accept response latency and privacy concerns.

Now, one RTX 3090, one 27B parameter open-source model, and you're hitting ~95% on SimpleQA.

The local-deep-research project quietly grew to 7,098 stars on GitHub, adding 2,483 this week. 6,415 commits, 440 branches, 155 tags — this isn't a toy project, it's a seriously maintained tool.

What it does

In one sentence: give it a question, it works like a researcher.

Searches across multiple engines (10+ sources including arXiv, PubMed)
Reads, filters, and cross-validates information
Synthesizes into a cited research report
Everything runs locally, data stored encrypted

Compared to cloud-based deep research products, the core difference is one: data never leaves your machine.

Qwen3.6-27B running on a 3090 — that signal matters on its own

A 27B parameter model, 4-bit quantized, is about 15GB VRAM. The RTX 3090 has 24GB — it fits, and it's not scraping the bottom. What does this mean?

Two years ago, this level of inference required an A100. One year ago, a 4090. Now, a used 3090 does it.

This isn't linear progress. It's a跳水 on the cost curve.

How to understand the 95% SimpleQA number

SimpleQA is OpenAI's QA benchmark, testing "can the model give concise, accurate factual answers." 95% is high, but note a few things:

This is a community-reported number, not from an official benchmark run. The README says "~95%" — that tilde matters
SimpleQA tests factual QA, not reasoning, not writing, not code
High score ≠ usable in all research scenarios

Even so, 95% on SimpleQA means: for most fact-checking tasks, local models are already sufficient.

Use cases

Academic paper research: arXiv and PubMed integration, search papers directly, generate summaries
Technology selection research: Compare options, produce analysis reports
Privacy-sensitive research: Medical data, internal documents, trade secrets — data stays local

Not suitable for

Ultra-large-scale knowledge retrieval (cloud model parameter counts still crush local)
Deep reasoning chain requirements (27B reasoning capability still trails 400B+)
Real-time web search for latest events (local models have training data cutoff dates)

A turning point for local AI research

local-deep-research isn't the only local research tool, but it might be the most mature one right now. 6,415 commits, 186 open PRs, 79 open issues — these numbers show a community seriously contributing.

When a consumer GPU can deliver deep research capability approaching cloud models, one more reason to "must use cloud" disappears.

Sources:

LearningCircuit/local-deep-research — Official repository
GitHub Trending weekly

What it does

Qwen3.6-27B running on a 3090 — that signal matters on its own

How to understand the 95% SimpleQA number

Use cases

Not suitable for

A turning point for local AI research

Related

Presenton Is Not "Just Another AI PPT": It Turns Presentations into a Deployable Generation Workflow

The Real Appeal of Midscene: UI Automation Can Finally Ditch Fragile Selectors

A New Closed Loop for Frontend Debugging: Chrome DevTools MCP Reduces Guesswork for Coding Agents