Microsoft acquires Semantic Machines to make spoken AI more human-like
While Google has been working on a life-like version of its spoken Artificial Intelligence (AI) technology, Microsoft has joined the race by acquiring US-based AI developer “Semantic Machines” to bring the technology closer to how humans speak.
“With this acquisition, Microsoft plans to establish a conversational AI centre of excellence in Berkeley, California, to experiment and integrate ‘natural language processing (NLP) technology’ in its products like Cortana,” David Ku, Vice President and Chief Technology Officer of AI and Research at Microsoft, wrote in a blog post.
“For rich and effective communication, intelligent assistants need to be able to have a natural dialogue instead of just responding to commands,” said Ku.
Recently, Microsoft became the first to add “full-duplex” voice sense to a conversational AI system, allowing users to carry on a conversation naturally with its chatbot XiaoIce and AI-powered assistant Cortana.
“Full-duplex” is a technique for communicating in both directions simultaneously, much like a telephone call, with AI-based technology conversing on one side.
The core product of “Semantic Machines”, its “conversation engine”, extracts its responses from natural voice or text input and then generates a self-updating learning framework for managing dialog context and user goals.
“Today’s commercial natural language systems like Siri, Cortana, and Google Now only understand commands, not conversations,” said “Semantic Machines” in a post.
“With our conversational AI, we aim to develop technology that goes beyond understanding commands, to understanding conversations,” the company added.
Earlier in May, Google CEO Sundar Pichai introduced Duplex at Google I/O and demonstrated how the AI system could book an appointment at a salon and a table at a restaurant, with the Google Assistant sounding like a human.
It used Google DeepMind’s new “WaveNet” audio-generation technique and other advances in Natural Language Processing (NLP) to replicate human speech patterns.