Most AI voice conversation systems layer speech-to-text and text-to-speech on top of a language model. Thinking Machines chose a different path: train interactivity directly into the model.
The claim: when interactivity is native, scaling the model makes it both smarter and a better collaborator simultaneously — one goal, not two separate optimization targets.
Main sources: