Amazon has added a new model to its Nova AI family: Nova Sonic, a real-time speech generation model designed to hold human-like conversations. Unlike traditional text-to-speech tools, Nova Sonic processes live voice input and responds by voice in real time, making it well suited to AI chatbots and agent-based applications.
Available via Amazon Bedrock, Nova Sonic combines speech recognition and speech generation in a single model, which reduces latency and makes conversations flow more naturally. It supports contextual understanding, detects pauses, mumbling, and background noise, and adjusts its tone and style to match the user’s speech.
The model currently supports only English, though it can recognize a range of accents, speech patterns, and vocal tones. It handles up to 32,000 audio tokens, and each session is capped at 8 minutes. Amazon plans to expand language support soon.
Developers can access Nova Sonic through a bidirectional streaming API on Amazon Bedrock.
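To make the streaming model concrete, here is a minimal Python sketch of the client side of such a stream: it slices raw audio into small chunks, wraps each one in a JSON event, and stops at the 8-minute session cap mentioned above. It uses only the standard library and does not call Bedrock. The 16 kHz, 16-bit mono input format, the 100 ms chunk size, and the `audioInput` event envelope are assumptions made for illustration, so check the Bedrock documentation for the actual event schema before relying on any of them.

```python
import base64
import json
from typing import Iterator

# Assumptions (not from the article): 16 kHz, 16-bit, mono PCM input
# and a hypothetical "audioInput" event envelope.
SAMPLE_RATE_HZ = 16_000
BYTES_PER_SAMPLE = 2
SESSION_CAP_SECONDS = 8 * 60  # the 8-minute session cap from the article


def audio_events(pcm: bytes, chunk_ms: int = 100) -> Iterator[str]:
    """Yield one JSON event per audio chunk, stopping at the session cap."""
    bytes_per_second = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE
    chunk_bytes = bytes_per_second * chunk_ms // 1000
    cap_bytes = bytes_per_second * SESSION_CAP_SECONDS

    # Drop any audio that would run past the session cap.
    capped = pcm[:cap_bytes]
    for start in range(0, len(capped), chunk_bytes):
        chunk = capped[start:start + chunk_bytes]
        yield json.dumps({
            "event": {
                "audioInput": {
                    # Binary audio is base64-encoded so it fits in a JSON event.
                    "content": base64.b64encode(chunk).decode("ascii"),
                }
            }
        })


if __name__ == "__main__":
    one_second_of_silence = bytes(SAMPLE_RATE_HZ * BYTES_PER_SAMPLE)
    events = list(audio_events(one_second_of_silence))
    print(len(events))  # prints 10 (ten 100 ms chunks)
```

In a real client these events would be written to the outgoing half of the stream while model responses are read from the incoming half at the same time. A live session would also need to track elapsed wall-clock time, since this sketch approximates the cap by audio length only.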