What Makes a Speaker, Smart

[NOTE: Amazon Alexa devices are capable of dynamically processing multiple voices, so if you listen to this blog post on an Amazon Alexa device, you will hear the voices change. If you are listening on a Google device, the post will be read by a single voice. This post is an example of how a blog post could be turned into a dialog. There are four voices in the following post. The default voice introduces and wraps up the post. Voice 2 is Diya, a female voice from India. Voice 3 is George, a male voice from the UK. And Voice 4 is Nancy, a female voice from the US, reading for Pam from Create My Voice. You can hear this blog post by saying: "Alexa, Ask Create My Voice to Read Blog Post 18"]

The steps that a smart speaker goes through to pretend to be "smart" are quite interesting. Diya from our India office will start us off with the section titled:

Traditional speakers aren't dull, they just aren't Smart

Hi, my name is Diya. Let's talk about what makes a smart speaker different from an ordinary speaker. A traditional speaker can play the same audio that a smart speaker does; the main difference is that you have to do all of the work. You have to determine what you want to hear. You have to find the audio version, whether it's an MP3, a podcast, or a radio station. You then have to load the audio into a device to play it through the speakers. This may be as simple as pushing a button on your car sound system or as complicated as searching a library and procuring the audio. Once the content is finished, you have to do that work all over again. So, let's talk about letting Smart Speakers do the work.

Letting Smart speakers do the work

[Image: Books, Apple, and Smart Speaker]

With Smart Speakers, you let computers do all of the work for you. The smart speaker determines what you want, finds it, and sends it to the speaker to play. All you have to do is ask for what you want.

To discuss the details of how a smart speaker actually does that work for you, let me introduce you to George from our UK office.

Thanks, Diya! Let's get right into the details of the steps necessary to make a smart speaker appear to be "smart". When you talk to a smart speaker, the smart speaker performs five steps to give you what you want.

1) Turn Audio into Words

The first step is to turn the audio of your voice into words. This step is called Automatic Speech Recognition, or ASR for short; it is the front end to the Natural Language Processing (NLP) that follows. Computers have gotten significantly better at this step in recent years, especially in challenging situations like noisy environments and understanding people with heavy accents. But getting all (or most) of the words correct is only the first step to giving you what you want.

2) Turn Words into Intentions

The next step is to take a series of words and determine what you want. Pam from Create My Voice says it this way: "Of course what I say is what I mean!" You can hear more of Pam's thoughts in her blog post titled "So, What Exactly is a Smart Speaker". I don't know about you, but I find that communicating with another human has its challenges. I always know what I mean, but even with clues from my body language, intonation, and environmental context, sometimes the other person misunderstands me. It's still early days for computers to collect all that information outside of our words, so I think it's impressive how well Amazon, Google, and Apple do at turning our words into our intentions.
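To make that a little more concrete, here is a minimal sketch in Python of turning words into an intention. It is only an illustration, not how Amazon, Google, or Apple actually do it: real assistants rely on trained language models rather than a handful of patterns. The intent names (`TurnOnLights`, `ReadBlogPost`, `PlayRadio`), the patterns, and the `words_to_intent` function are all made up for this example.

```python
import re

# Hypothetical intents and trigger patterns, invented for illustration only.
INTENT_PATTERNS = {
    "TurnOnLights": re.compile(r"\bturn on\b.*\blights?\b"),
    "ReadBlogPost": re.compile(r"\bread\b.*\bblog post (?P<number>\d+)\b"),
    "PlayRadio": re.compile(r"\bplay\b.*\bradio\b"),
}


def words_to_intent(utterance: str) -> dict:
    """Return the first intent whose pattern matches, plus any captured details."""
    text = utterance.lower().strip()
    for name, pattern in INTENT_PATTERNS.items():
        match = pattern.search(text)
        if match:
            # Named groups (like the blog post number) become the "slots".
            return {"intent": name, "slots": match.groupdict()}
    # Nothing matched, so the speaker has to admit it did not understand.
    return {"intent": "Unknown", "slots": {}}


result = words_to_intent("Ask Create My Voice to read blog post 18")
print(result["intent"], result["slots"])
```

Notice how brittle this is: say "switch the lights on" instead of "turn on the lights" and the sketch gives up, which is exactly why the real thing is hard.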

3) Finding the Source of the Best Answer

Step three is determining where to find the best answer. This is where we find out how well the intention logic in step two performed. Voice results differ from visual results. If the computer can just put all of the best answers on a monitor, the intention logic in step two just has to get the correct result in the top 10 and you can visually pick out the best result. When you are having a conversation, you don't expect responses to be a list of ten options for you to choose from. You expect the one right answer to your intention to be returned. So step three is determining where in the world the answer to your request resides. In addition to the internal sources of information, both Google and Amazon enable other developers to provide additional capabilities. Google calls these additional capabilities "actions". Amazon calls them "skills". Once your words are understood and your intention is determined, then the single best source to answer your request is engaged.
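The difference between a screen and a voice can be sketched in a few lines of Python. This is just an illustration under my own assumptions: the candidate sources and their confidence scores are invented, and `visual_results` and `voice_result` are hypothetical helper names, not part of any Google or Amazon API.

```python
from typing import List, Tuple

# A candidate is a (source name, confidence score) pair. All values are made up.
Candidate = Tuple[str, float]


def visual_results(candidates: List[Candidate], limit: int = 10) -> List[str]:
    """A screen can show a ranked list and let the person pick."""
    ranked = sorted(candidates, key=lambda c: c[1], reverse=True)
    return [name for name, _ in ranked[:limit]]


def voice_result(candidates: List[Candidate]) -> str:
    """A voice reply has room for only the single best source."""
    best_name, _ = max(candidates, key=lambda c: c[1])
    return best_name


candidates = [
    ("built-in answers", 0.15),
    ("Create My Voice skill", 0.80),
    ("radio skill", 0.05),
]

print(visual_results(candidates))  # every candidate, best first
print(voice_result(candidates))    # only the top one
```

On a screen, a correct answer in second or third place is still a success. In a conversation, anything other than first place is a wrong answer.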

4) Respond to the Request

Step four is to actually carry out the request. Whether it's turning on the lights, or finding and reading a particular blog post, or playing your favorite radio station, the Google Action or Alexa Skill performs your intention and provides a response.

As an aside, Amazon recently enabled Alexa Skills to specify multiple voices in the response. If you are listening to this blog post on an Alexa-enabled device, you are hearing an example of how I've used multiple voices in the response from the skill back to the device. Which brings us to the last step in making a Smart Speaker, Smart.
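For the curious, here is a rough idea of what a multi-voice response might look like. As far as I know, Alexa responses can be written in SSML (Speech Synthesis Markup Language), and its `voice` tag is one way to switch speakers partway through. The Python sketch below only builds the text of such a response. The `build_ssml` helper is hypothetical, the voice names `Aditi` and `Brian` are placeholders I am assuming the platform supports, and none of this is the actual code behind this post.

```python
from typing import List, Optional, Tuple
from xml.sax.saxutils import escape

# A segment is (voice name, text). A voice of None means "use the default voice".
Segment = Tuple[Optional[str], str]


def build_ssml(segments: List[Segment]) -> str:
    """Wrap each segment in a <voice> tag (when a voice is given) inside <speak>."""
    parts = []
    for voice, text in segments:
        safe_text = escape(text)  # escape &, < and > so the markup stays valid
        if voice is None:
            parts.append(safe_text)
        else:
            parts.append(f'<voice name="{voice}">{safe_text}</voice>')
    return "<speak>" + " ".join(parts) + "</speak>"


# Placeholder voice names; check which ones your platform really offers.
ssml = build_ssml([
    (None, "Diya will start us off."),
    ("Aditi", "Hi, my name is Diya."),
    ("Brian", "Thanks, Diya!"),
])
print(ssml)
```

A device that does not support the `voice` tag would need a fallback, which is one reason the same post may be read in a single voice elsewhere.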

5) Turn Words into Audio

The final step is to turn the response provided by the Google Action or Alexa Skill back into audio to be heard by the listener. The response can take a few forms. If you ask to turn the lights on, you may just see the lights come on. If you ask for your favorite blog, you want to know if there is a new blog post, and would like it to be read to you. If you ask for a podcast or song playlist, you want to hear the audio. This final step returns the best response to your request, and preferably fast enough to make it sound like you are having a conversation with a speaker -- but not just any speaker, a smart speaker.
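Putting the five steps end to end, the whole journey can be sketched as a tiny Python pipeline. Every function here is a stand-in I have invented for illustration: the "audio" is just text stored as bytes, there is one pretend intent, and nothing is actually recognised or spoken. The point is only the order in which the steps hand off to one another.

```python
from typing import Callable


def audio_to_words(audio: bytes) -> str:
    # Step 1 stand-in: pretend the audio is simply UTF-8 text.
    return audio.decode("utf-8")


def words_to_intent(words: str) -> str:
    # Step 2 stand-in: one hard-coded, made-up intent.
    return "TurnOnLights" if "lights" in words.lower() else "Unknown"


def find_source(intent: str) -> Callable[[], str]:
    # Step 3 stand-in: pick the single source that should answer.
    sources = {"TurnOnLights": lambda: "Okay, the lights are on."}
    return sources.get(intent, lambda: "Sorry, I don't know that one.")


def respond(source: Callable[[], str]) -> str:
    # Step 4 stand-in: the chosen skill or action does its work and replies.
    return source()


def words_to_audio(words: str) -> bytes:
    # Step 5 stand-in: pretend that encoding the text produces audio.
    return words.encode("utf-8")


def smart_speaker(audio: bytes) -> bytes:
    """Run all five steps in order for one request."""
    words = audio_to_words(audio)
    intent = words_to_intent(words)
    source = find_source(intent)
    reply = respond(source)
    return words_to_audio(reply)


print(smart_speaker(b"Turn on the lights"))
```

Each stand-in hides a hard problem, but the chain itself really is this short: audio in, words, intention, source, response, audio out.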

Back to you, Diya.

Let's wrap this blog post up with a prediction

As Smart Speakers become smarter by understanding our words and intentions better, and continue to expand their capabilities to provide the right action or response, we will find ourselves relying on them, and before long we won't know how we ever lived without them.

Thank you, Diya, George, and Pam for describing what makes Smart Speakers, Smart. If you want to learn more about leveraging smart speakers with your blog, visit the CreateMyVoice.com website and see how you can get your blog on this new platform. You can turn your blog into a podcast.

[NOTE: This blog post has automatically been processed by the CreateMyVoice BlogToAudio reader. Which means that in addition to reading it, you can hear it on any Amazon Alexa or Google Assistant device, just by asking. Just say to your Google Assistant, "Hey Google, Talk to Create My Voice". And if you prefer to talk to Alexa, just say, "Alexa, Start Create My Voice".]

Photo by Thomas Kolnowski on Unsplash
