Anthropic Releases BioMysteryBench: Claude Mythos Solves 30% of Biology Problems That Stumped Human Experts

Key Conclusion

On April 29, Anthropic open-sourced BioMysteryBench, a new bioinformatics evaluation benchmark, on Hugging Face. The benchmark contains 99 open-ended questions based on real research data, covering DNA/RNA sequencing, proteomics, metabolomics, and more. Of these, 23 could not be answered even by domain experts.

Claude Mythos solved approximately 30% of these 23 “impossible” questions, as well as most of the remaining 76 that experts could answer. The result is a notable milestone for AI in scientific research.

BioMysteryBench Design Logic

The biggest difference from traditional benchmarks is how answers are checked: a BioMysteryBench answer is verified against objective properties of the data itself, not against the analysis method the question author happened to choose.

Dimension | Traditional Benchmark | BioMysteryBench
Question Source | Artificially constructed / historical datasets | Real research datasets
Answer Verification | Compared to standard answers | Verifiable from data’s objective properties
Question Type | Closed-ended | Open-ended research questions
Expert Involvement | Not involved in question design | Domain experts design questions and annotate human solvability
Data Coverage | Single modality | DNA/RNA sequencing, proteomics, metabolomics
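
To make the verification idea concrete, here is a minimal sketch of outcome-based grading: only the final answer is compared with a fact fixed by the data, and the analysis route is never inspected. Everything in it is an assumption made for illustration. The `Question` fields, the helper names, and the toy answers are hypothetical and do not reflect the released benchmark's schema or Anthropic's actual grader.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Question:
    """Hypothetical record; field names are illustrative, not the released schema."""
    qid: str
    ground_truth: str     # a fact fixed by the dataset itself
    human_solvable: bool  # expert annotation: could human experts answer it?


def normalize(answer: str) -> str:
    """Lowercase and collapse whitespace so formatting differences are ignored."""
    return " ".join(answer.lower().split())


def is_correct(question: Question, submitted: str) -> bool:
    """Grade only the final answer; the method used to reach it is not examined."""
    return normalize(submitted) == normalize(question.ground_truth)


def solve_rate(questions, answers, human_solvable):
    """Fraction of questions in one solvability group that were answered correctly."""
    group = [q for q in questions if q.human_solvable == human_solvable]
    if not group:
        return 0.0
    solved = sum(is_correct(q, answers.get(q.qid, "")) for q in group)
    return solved / len(group)


# Made-up toy data, purely to exercise the functions above.
questions = [
    Question("q1", "Gene A", True),
    Question("q2", "Tissue B", False),
    Question("q3", "Species C", False),
]
answers = {"q1": "gene a", "q2": "  tissue   b ", "q3": "species d"}

print(solve_rate(questions, answers, human_solvable=True))   # 1.0
print(solve_rate(questions, answers, human_solvable=False))  # 0.5
```

Splitting the solve rate by the `human_solvable` flag mirrors how the article reports results separately for expert-solvable and expert-stumping questions.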

What 30% Means

Of the 23 “expert-stumping” questions, Mythos solved about 7. This needs to be understood in context:

  • These questions were designed by domain experts based on real, unsolved research problems
  • Answers are not “known facts” but patterns and correlations that need to be discovered from complex data
  • A roughly 30% solve rate on questions that stumped domain experts is striking: on this benchmark, the model answered about one in three questions that human experts could not
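
The "about 7" figure follows directly from the numbers quoted above. The short check below uses only those numbers (99 questions, 23 expert-stumping, roughly 30% solved); the variable names are illustrative, and the exact count of solved questions is not stated in the article.

```python
TOTAL_QUESTIONS = 99    # questions in the benchmark
EXPERT_STUMPING = 23    # questions domain experts could not answer
REPORTED_RATE = 0.30    # approximate solve rate on those questions

# Questions that experts could answer.
expert_solvable = TOTAL_QUESTIONS - EXPERT_STUMPING

# 23 * 0.30 is about 6.9, which rounds to 7 questions.
solved_hard = round(EXPERT_STUMPING * REPORTED_RATE)

# Going the other way, 7 of 23 is about 30.4%.
implied_rate = solved_hard / EXPERT_STUMPING

print(expert_solvable)         # 76
print(solved_hard)             # 7
print(f"{implied_rate:.1%}")   # 30.4%
```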

As Anthropic puts it: “AI is no longer just assisting biologists. It’s starting to out-think them.”

Action Recommendations

  • Bioinformatics researchers: BioMysteryBench is open-sourced on Hugging Face — you can test Claude’s analytical capabilities with your own data
  • AI application developers: Packaging Mythos’s biological reasoning into research-assistant tools is a new vertical opportunity
  • Investors: Anthropic’s acceleration in scientific AI is consistent with its CEO’s earlier prediction that “Claude could do most of our work in 6-12 months”