The AchieVer

Move over Siri, Alexa: Google's offline voice recognition breakthrough cuts response lag

Google paves the way for voice assistants like Siri and Alexa to work without the internet.




If you're one of the few people who own a Google Pixel phone, you'll soon be able to experience voice recognition without the internet. 


Google has announced the rollout of "an end-to-end, all-neural, on-device speech recognizer to power speech input in Gboard", the company's keyboard with Google Search baked in. 


The technology could give Google an edge over Siri and Alexa in convincing people to talk to machines. Phones and home speakers that recognize speech locally can answer faster, because they avoid the latency of sending a request from the device to a remote server and waiting for a response.


The company has enabled on-device voice recognition by shrinking a machine-learning model so that it can do the task on the phone itself rather than handing off the job to a server in the cloud.


Google researchers detailed the on-device technique in a paper published on arXiv.org in November called 'Streaming End-to-end Speech Recognition For Mobile Devices'. 


According to Google researchers, the model works at the character level: as the user speaks a word, the recognizer outputs it one character at a time, much as a human transcriber would type it out in real time.


Beyond very low-latency speech recognition, Google wanted its system to exploit "on-device user context", such as the user's list of contacts, the song names known to their music apps, and their location.


To achieve the on-device intelligence, Google employed a recurrent neural network transducer (RNN-T), which builds on a technique called 'connectionist temporal classification' (CTC) that's used for training neural networks on sequences such as speech. The technique gives machines a more efficient way to interpret speech.
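As a rough illustration of the CTC idea mentioned above, the sketch below shows the standard CTC output rule: the network emits one label per audio frame, consecutive repeats are merged, and a special blank label is dropped. This is not Google's RNN-T decoder, only a minimal example of the underlying technique. The blank symbol "_" and the function names are assumptions chosen for the example.

```python
from itertools import groupby

BLANK = "_"  # hypothetical blank symbol; real systems use a reserved label index


def ctc_collapse(frame_labels):
    """Apply the CTC rule: merge consecutive repeats, then drop blanks."""
    merged = [label for label, _ in groupby(frame_labels)]
    return "".join(label for label in merged if label != BLANK)


def stream_partials(frame_labels):
    """Yield the growing transcript each time a new character is emitted,
    mimicking character-by-character output as the audio frames arrive."""
    previous = BLANK
    text = ""
    for label in frame_labels:
        # A character is new if it is not blank and differs from the last frame.
        if label != BLANK and label != previous:
            text += label
            yield text
        previous = label


if __name__ == "__main__":
    frames = list("hh_e_ll_lo_")  # made-up per-frame labels for the word "hello"
    print(ctc_collapse(frames))
    print(list(stream_partials(frames)))
```

Note that the blank between the two "l" frames is what lets the double letter in "hello" survive the merging step.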


Google explains that the speech-recognition engine would normally depend on a search graph that can be 2GB in size, which would be onerous if stored on a device. 


Instead, it trained a neural network that is just 450MB in size yet provides the same accuracy as a client-server setup. Not happy with that, the Google researchers shrank the model to just 80MB.
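To put those sizes in perspective, here is a small back-of-the-envelope calculation using only the figures quoted in the article. It assumes 1GB = 1024MB and treats the 2GB search graph as exactly 2GB, so the results are approximate; the function name is made up for the example.

```python
SEARCH_GRAPH_MB = 2 * 1024   # "can be 2GB" per the article; 1GB = 1024MB is an assumption
NEURAL_MODEL_MB = 450        # first on-device neural model
COMPRESSED_MODEL_MB = 80     # model after further shrinking


def reduction_factor(before_mb, after_mb):
    """Return how many times smaller the second size is than the first."""
    return before_mb / after_mb


if __name__ == "__main__":
    print(f"450MB -> 80MB: {reduction_factor(NEURAL_MODEL_MB, COMPRESSED_MODEL_MB):.1f}x smaller")
    print(f"2GB -> 80MB:   {reduction_factor(SEARCH_GRAPH_MB, COMPRESSED_MODEL_MB):.1f}x smaller")
```

In other words, the final model is roughly five to six times smaller than the first neural model and about 25 times smaller than the search graph it replaces.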


"Our new all-neural, on-device Gboard speech recognizer is initially being launched to all Pixel phones in American English only," Google researchers said.

"Given the trends in the industry, with the convergence of specialized hardware and algorithmic improvements, we are hopeful that the techniques presented here can soon be adopted in more languages and across broader domains of application."


Google compares a server-side speech recognizer, left, with the on-device recognizer, right, when recognizing the same spoken sentence.




