Anthropic Releases BioMysteryBench: Claude Mythos Solves 30% of Biology Problems That Stumped Human Experts

Key Conclusion

On April 29, Anthropic open-sourced BioMysteryBench, a new bioinformatics evaluation benchmark, on Hugging Face. The benchmark contains 99 open-ended questions based on real research data, covering DNA/RNA sequencing, proteomics, metabolomics, and more. Domain experts were unable to answer 23 of these questions.

Claude Mythos solved approximately 30% of these “impossible” questions, along with most of the remaining 76 questions that experts could answer. This marks a landmark result for AI in scientific research.

BioMysteryBench Design Logic

The biggest difference from traditional benchmarks is how answers are checked: a BioMysteryBench answer is verified against objective properties of the data itself, not against whichever analysis method the question author happened to choose.

Dimension	Traditional Benchmark	BioMysteryBench
Question Source	Artificially constructed / historical datasets	Real research datasets
Answer Verification	Compared to standard answers	Verifiable from data’s objective properties
Question Type	Closed-ended	Open-ended research questions
Expert Involvement	Not involved in question design	Domain experts design questions and annotate human solvability
Data Coverage	Single modality	DNA/RNA sequencing, proteomics, metabolomics
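Because experts annotate whether each question is solvable by humans, results can be reported separately for the two subsets. The following is a minimal sketch of that bookkeeping, not Anthropic's scoring code. The field names `human_solvable` and `model_correct` are hypothetical, and the records are toy values invented for illustration.

```python
from collections import defaultdict


def solve_rate_by_subset(results):
    """Return the fraction of questions solved in each solvability subset.

    Each record is a dict with two hypothetical boolean fields:
    'human_solvable' (the expert annotation) and 'model_correct'.
    """
    totals = defaultdict(int)
    solved = defaultdict(int)
    for record in results:
        key = "human_solvable" if record["human_solvable"] else "human_unsolved"
        totals[key] += 1
        solved[key] += int(record["model_correct"])
    return {key: solved[key] / totals[key] for key in totals}


# Toy records, invented for illustration only (not benchmark data).
toy_results = [
    {"human_solvable": True, "model_correct": True},
    {"human_solvable": True, "model_correct": False},
    {"human_solvable": False, "model_correct": True},
    {"human_solvable": False, "model_correct": False},
    {"human_solvable": False, "model_correct": False},
    {"human_solvable": False, "model_correct": False},
]

rates = solve_rate_by_subset(toy_results)
print(rates)
```

Keeping the two subsets apart is what makes a figure like "30% of expert-stumping questions" meaningful, since a single pooled accuracy would hide it.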

What 30% Means

Of the 23 “expert-stumping” questions, Mythos solved about 7. This needs to be understood in context:

These questions were designed by domain experts based on real, unsolved research problems
Answers are not “known facts” but patterns and correlations that need to be discovered from complex data
A 30% solve rate on questions of this kind is unprecedented: within this benchmark, the model independently resolved roughly one-third of the problems that human experts could not
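The arithmetic behind these figures can be checked in a few lines. This sketch uses only the numbers quoted in the article (99 questions, 23 expert-unsolved, roughly 30% solved). The variable names are illustrative, and the exact solved count is an approximation because the article reports only "about 30%".

```python
TOTAL_QUESTIONS = 99   # benchmark size quoted in the article
EXPERT_UNSOLVED = 23   # questions domain experts could not answer
REPORTED_RATE = 0.30   # "approximately 30%", as reported

# 23 * 0.30 is about 6.9, which rounds to the "about 7" in the text.
approx_solved = round(EXPERT_UNSOLVED * REPORTED_RATE)

# Questions that experts could answer.
human_solvable = TOTAL_QUESTIONS - EXPERT_UNSOLVED

# Share of the whole benchmark made up of expert-unsolved questions.
share_of_benchmark = EXPERT_UNSOLVED / TOTAL_QUESTIONS

print(approx_solved, human_solvable, f"{share_of_benchmark:.1%}")
```

Note that the 30% applies to the 23-question subset, which is just under a quarter of the benchmark, not to all 99 questions.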

As Anthropic puts it: “AI is no longer just assisting biologists. It’s starting to out-think them.”

Action Recommendations

Bioinformatics researchers: BioMysteryBench is open-sourced on Hugging Face — you can test Claude’s analytical capabilities with your own data
AI application developers: This is a new vertical opportunity — encapsulating Mythos’s biological reasoning into research assistant tools
Investors: Anthropic’s acceleration in scientific AI aligns with their CEO’s earlier prediction that “Claude could do most of our work in 6-12 months”
