Key Conclusion
On April 29, Anthropic open-sourced BioMysteryBench, a new bioinformatics evaluation benchmark, on Hugging Face. The benchmark contains 99 open-ended questions based on real research data, covering DNA/RNA sequencing, proteomics, metabolomics, and more. Of these, 23 questions could not be answered even by domain experts.
Claude Mythos solved approximately 30% of these “impossible” questions, along with most of the remaining 76. This marks a landmark breakthrough for AI in scientific research.
BioMysteryBench Design Logic
The biggest difference from traditional benchmarks is that BioMysteryBench answers can be verified against objective properties of the data itself, rather than by checking whether the solver used the same analysis method the question author chose.
| Dimension | Traditional Benchmark | BioMysteryBench |
|---|---|---|
| Question Source | Artificially constructed / historical datasets | Real research datasets |
| Answer Verification | Compared to standard answers | Verifiable from data’s objective properties |
| Question Type | Closed-ended | Open-ended research questions |
| Expert Involvement | Not involved in question design | Domain experts design questions and annotate human solvability |
| Data Coverage | Single modality | DNA/RNA sequencing, proteomics, metabolomics |
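The verification idea in the table can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `Question` and `Submission` types, the `grade_by_property` and `grade_by_method` functions, and the example question are hypothetical and are not taken from the BioMysteryBench release. The point is only that a property-based grader accepts a correct answer regardless of how it was derived, while a method-matching grader would reject it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Question:
    """Hypothetical question record (not the real benchmark schema)."""
    prompt: str
    ground_truth: str      # objective property of the dataset itself
    reference_method: str  # analysis route the question author happened to use


@dataclass(frozen=True)
class Submission:
    """Hypothetical solver output."""
    answer: str
    method_used: str


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting does not matter."""
    return " ".join(text.lower().split())


def grade_by_property(question: Question, submission: Submission) -> bool:
    """Accept any answer that matches the data's objective property."""
    return normalize(submission.answer) == normalize(question.ground_truth)


def grade_by_method(question: Question, submission: Submission) -> bool:
    """Contrast case: accept only if the solver followed the author's method."""
    return normalize(submission.method_used) == normalize(question.reference_method)


# Illustrative data only; the question and answer are invented.
question = Question(
    prompt="Which organism were these sequencing reads sampled from?",
    ground_truth="Escherichia coli",
    reference_method="alignment to reference genomes",
)
submission = Submission(
    answer="escherichia  coli",
    method_used="k-mer profile classification",
)

print(grade_by_property(question, submission))  # correct answer, different route
print(grade_by_method(question, submission))    # rejected for using another route
```

Real grading of open-ended answers would need something more forgiving than exact string matching; the sketch uses it only to keep the contrast between the two graders visible.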
What 30% Means
Of the 23 “expert-stumping” questions, Mythos solved about 7. This needs to be understood in context:
- These questions were designed by domain experts based on real, unsolved research problems
- Answers are not “known facts” but patterns and correlations that need to be discovered from complex data
- A 30% solve rate on questions of this kind is unprecedented: it is roughly equivalent to an AI assistant independently solving close to one-third of the problems that stumped the domain experts
As Anthropic puts it: “AI is no longer just assisting biologists. It’s starting to out-think them.”
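The figures quoted in this section can be cross-checked with a few lines of arithmetic. This is a sketch using only the numbers stated in the article (99 questions, 23 that experts could not answer, a solve rate of approximately 30%); the variable and function names are illustrative, and the exact number of questions solved is not given in the source beyond "about 7".

```python
TOTAL_QUESTIONS = 99        # questions in the benchmark, per the article
EXPERT_UNSOLVED = 23        # questions domain experts could not answer
REPORTED_SOLVE_RATE = 0.30  # "approximately 30%", per the article

# Questions that experts could answer.
expert_solvable = TOTAL_QUESTIONS - EXPERT_UNSOLVED

# Number of expert-stumping questions implied by the reported rate.
implied_solved = REPORTED_SOLVE_RATE * EXPERT_UNSOLVED


def solve_rate(solved: int, total: int) -> float:
    """Fraction of questions solved."""
    return solved / total


print(expert_solvable)                            # 76
print(round(implied_solved))                      # 7
print(f"{solve_rate(7, EXPERT_UNSOLVED):.1%}")    # 30.4%
```

Seven of 23 is about 30.4%, which is consistent with the "approximately 30%" and "about 7" figures above.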
Action Recommendations
- Bioinformatics researchers: BioMysteryBench is open-sourced on Hugging Face — you can test Claude’s analytical capabilities with your own data
- AI application developers: This is a new vertical opportunity — encapsulating Mythos’s biological reasoning into research assistant tools
- Investors: Anthropic’s acceleration in scientific AI is consistent with its CEO’s earlier prediction that “Claude could do most of our work in 6-12 months”