
Ever wonder how Voice Technology works?  When done well, Voice Technology makes it sound like you're talking with another intelligent being.  This short video explains the steps that happen when you talk to a voice-enabled device like a Smart Speaker or Smart Phone.

 

 

 

Voice Technology: Title Slide

Hi, I'm Chip Edwards from Create My Voice.  In this video, I'd like to walk through a simple diagram to show how Voice Technology works.  You can connect with me on Twitter at ChipEdwards4 and at CreateMyVoice.

 

Voice is for more than just Smart Speakers

The most common voice devices are currently Smart Speakers and Smart Phones.  Let's walk through what happens each time you talk with one of these devices.  Let's use a Google Home as an example.

 

Google Home

Currently, when you speak to one of these devices, you first need to let it know you are talking to it.  Google Assistant does this by listening for the wake phrase "OK Google" or "Hey Google".

 

Google Home with a Voice Mic icon

As soon as it hears the wake phrase of "Hey Google", it starts recording until it thinks you are done speaking. 
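
To make the wake-phrase step concrete, here is a minimal Python sketch.  It works on text that has already been transcribed, which is a simplification: a real device listens for the wake phrase in the audio itself.  The function name `extract_request` and the `WAKE_PHRASES` tuple are made up for illustration and are not part of any Google API.

```python
# Hypothetical illustration only: real devices match the wake phrase in audio,
# not in text. The names below are assumptions, not a Google API.
WAKE_PHRASES = ("ok google", "hey google")


def extract_request(heard: str):
    """Return the words spoken after a wake phrase, or None if none was heard."""
    lowered = heard.lower()
    for phrase in WAKE_PHRASES:
        index = lowered.find(phrase)
        if index != -1:
            # Everything after the wake phrase stands in for the "recording".
            request = heard[index + len(phrase):]
            return request.strip(" ,.!?")
    # No wake phrase: the device keeps ignoring what it hears.
    return None
```

Under these assumptions, speech without a wake phrase returns `None`, which mirrors the device staying idle until it hears "OK Google" or "Hey Google".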

Google Home Sends Audio to the Cloud 

Once the Google Home device thinks you are done speaking, it sends the audio up to the Google Cloud for processing.

 

Natural Language Processing

Google starts by performing a process called Natural Language Processing (or NLP for short).  Natural Language Processing converts your speech into words, and then figures out your intention, or what you meant.

 

Turning Intentions into Actions

Once Google decides what you want, it then has to figure out how to accomplish it.  There are four different ways that Google can resolve your request.

 

Answering Questions and Providing Information

If you ask a question, Google uses its own knowledge base to provide the best answer.  You can see this best answer by doing a search in your browser.  If Google returns a ‘featured snippet’ in position zero, that is the answer that will be used.

 

Using the Word PLAY

If you indicate that you want to hear a song or podcast by using the word “Play”, Google will look in YouTube or its Google Podcasts library and play the song or the latest episode.

 

IoT Devices in your Home

If you reference an IoT device, Google will send your request to the device.

 

Google Action for Your Brand

The final way that Google resolves a request is with an Action.  Actions are developed by Brands to provide a tailored experience for their audience.  An Action is how a company can ensure that its clients get the Brand's content, and how the Brand controls the experience for its users.
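
The four ways described above can be pictured as a simple routing step.  The Python sketch below is only an illustration: the keyword rules, the device names, the Action name, and the function `choose_resolver` are all assumptions made for the example, and Google's real decision process is far more sophisticated than matching words.

```python
# Hypothetical sketch: the routing rules and names are assumptions made for
# illustration, not how Google actually decides.
IOT_DEVICES = {"lights", "thermostat"}   # made-up device names
BRAND_ACTIONS = {"create my voice"}      # made-up Action name


def choose_resolver(request: str) -> str:
    """Pick one of the four ways described above to resolve a request."""
    words = request.lower()
    if any(action in words for action in BRAND_ACTIONS):
        return "action"
    if words.startswith("play "):
        return "media"
    if any(device in words.split() for device in IOT_DEVICES):
        return "iot"
    # Anything else is treated as a question for the knowledge base.
    return "knowledge_base"
```

For example, under these made-up rules "Play the latest episode" is routed to media, "Turn on the lights" goes to an IoT device, and a plain question falls through to the knowledge base.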

 

Sending the Response Back to the User

Google can handle two types of responses from these actions, either an audio file (like an mp3) or text.  If the response is text, Google performs a text-to-speech process to turn it into audio.   The audio is then sent back to the smart speaker to play to you.  Next, let’s take a quick look at any differences with Amazon devices.
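
Before moving on, the response step described above can be sketched in Python.  This is a rough illustration, not Google's implementation: the `Response` class, `to_audio`, and the placeholder `text_to_speech` function are hypothetical names, and the placeholder simply encodes the text as bytes where a real system would call a speech engine.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Response:
    """Hypothetical response: either ready-made audio or text to be spoken."""
    text: Optional[str] = None
    audio: Optional[bytes] = None


def text_to_speech(text: str) -> bytes:
    # Placeholder assumption: stands in for a real text-to-speech engine.
    return text.encode("utf-8")


def to_audio(response: Response) -> bytes:
    """Return audio for the speaker, converting text only when needed."""
    if response.audio is not None:
        # Audio files (like an mp3) are passed through unchanged.
        return response.audio
    if response.text is not None:
        return text_to_speech(response.text)
    raise ValueError("response has neither audio nor text")
```

Either way, what goes back to the smart speaker is audio; text responses just take one extra step.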

 

Amazon Alexa Wake Words

Amazon is similar to Google, with a few differences.  The first difference is the wake word.  Amazon defaults to the wake word “Alexa”, but you can change it to “Echo” or “Computer”.

 

Sources of Information for Amazon

The next difference is with questions.  Instead of Google’s Featured Snippets, it appears that Amazon uses Bing, Wikipedia, and its own knowledge graph to answer questions.  While both Google's and Amazon's knowledge graphs are getting better, in head-to-head tests Google’s knowledge base has generally performed better than Amazon's or Apple's.

 

Amazon Music and Podcasts

The next difference between Google Home and Amazon Alexa is the source for music and podcasts.  Amazon uses Amazon Music or TuneIn as its default podcast source.

 

Amazon Alexa Voice Applications are called Skills

The last difference is mostly semantic.  Google calls custom Voice Applications "Actions", while Amazon calls them "Skills".  Google may be better at answering questions, but Amazon has 10 times more Custom Branded Voice Apps than Google.  The last number I saw was that Amazon had around 100,000 Skills and Google had fewer than 10,000.

 

Thank You

If you would like to know more about Branded Google Actions or Amazon Skills, check out our website at CreateMyVoice.com.  Thanks for watching.  Until next time, this is Chip Edwards from Create My Voice.  Feel free to connect with me on Twitter or LinkedIn.  My Twitter handle is ChipEdwards4.  My LinkedIn profile is at LinkedIn.com/in/C Edwards.