Smart Speaker Voices and SSML

"Alexa, or Hey Google, ask Create My Voice to read Blog Post 23"

A recent report on smart speaker adoption stated that over 66 million people in the United States now own at least one smart speaker. Google and Amazon are making smart speakers smarter as they race to put them in every device we own and every room we enter.

One thing that makes smart speakers seem smarter is the ability to use different voices. With Google, you can change the default voice to an alternate one. With Amazon smart speakers, you can use multiple voices. Using an alternate voice (or voices) can enhance the user's experience and expand your audience.

Beyond changing voices, several other features help make your content sound just right. These features are defined by SSML (Speech Synthesis Markup Language), a markup language that, like HTML, uses tags to describe content; in SSML's case, the tags control how your content sounds when it is spoken.
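At its simplest, an SSML document is ordinary text wrapped in a `<speak>` root element. The Python sketch below assumes nothing beyond the standard library; the `to_ssml` helper is a hypothetical name, not part of any Amazon or Google SDK. It escapes characters such as `&` that would otherwise break the markup.

```python
from xml.sax.saxutils import escape


def to_ssml(text: str) -> str:
    """Wrap plain text in an SSML <speak> root, escaping &, < and >."""
    # escape() replaces XML special characters so the result stays well-formed.
    return f"<speak>{escape(text)}</speak>"


print(to_ssml("Q&A with Create My Voice"))
# prints: <speak>Q&amp;A with Create My Voice</speak>
```

Every other example in this post places its tags inside a `<speak>` element like this one.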

Let's look at a few examples of how SSML is used.

Listen to Alexa read this post:

Starting with the Content

Our content is composed of Diction, Grammar, and Style. Diction is our choice of words. Grammar is how we structure our words into sentences and paragraphs. And Style is how we choose to communicate an idea or thought. We tailor our Diction, Grammar, and Style to ensure that our brand engages our audience. Because the same set of words can be read aloud in multiple ways, SSML lets us tell the device exactly how to turn our content into audio.

Changing Cadence

Text-to-speech tools use punctuation to decide when, and for how long, to pause. In general, a comma causes a short pause, a period causes a longer pause, and the end of a paragraph causes a slightly longer pause still. But what if we want a pause where there is no punctuation? SSML provides the markup for that: the break tag inserts a pause of a chosen length.
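Here is a minimal sketch of adding an explicit pause with the `<break>` tag. The `with_pause` helper is hypothetical; the `time` attribute with a millisecond value is part of the SSML specification, but the longest pause a given speaker honors is platform-specific.

```python
from xml.sax.saxutils import escape


def with_pause(before: str, after: str, ms: int = 500) -> str:
    """Join two phrases with an explicit SSML pause of `ms` milliseconds."""
    if ms < 0:
        raise ValueError("pause length must be non-negative")
    # <break/> is an empty element; time accepts values such as "500ms" or "1s".
    return f'<speak>{escape(before)}<break time="{ms}ms"/>{escape(after)}</speak>'


print(with_pause("And the winner is", "Create My Voice!", ms=1500))
# prints: <speak>And the winner is<break time="1500ms"/>Create My Voice!</speak>
```

Instead of an exact time, SSML also lets you request a relative pause with the `strength` attribute (for example `strength="strong"`), leaving the exact length to the speech engine.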

In addition to pauses, SSML provides a prosody tag to slow down my speech or lower the pitch of my voice.
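The sketch below wraps text in a `<prosody>` element to change speaking rate and pitch. The `prosody` helper is hypothetical, and it only accepts the keyword values listed in the code; engines also accept some numeric forms (such as percentages), which are deliberately left out here.

```python
from xml.sax.saxutils import escape

# Keyword values from the SSML <prosody> element; numeric forms such as "80%"
# are also allowed by many engines but are not handled in this sketch.
RATES = {"x-slow", "slow", "medium", "fast", "x-fast"}
PITCHES = {"x-low", "low", "medium", "high", "x-high"}


def prosody(text: str, rate: str = "slow", pitch: str = "low") -> str:
    """Wrap text in a <prosody> element that changes speaking rate and pitch."""
    if rate not in RATES:
        raise ValueError(f"unsupported rate: {rate}")
    if pitch not in PITCHES:
        raise ValueError(f"unsupported pitch: {pitch}")
    return (
        f'<speak><prosody rate="{rate}" pitch="{pitch}">'
        f"{escape(text)}</prosody></speak>"
    )


print(prosody("Let me slow down for this part."))
# prints: <speak><prosody rate="slow" pitch="low">Let me slow down for this part.</prosody></speak>
```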

Changing Form

Heteronyms (words spelled the same but pronounced differently) can cause challenges when turning words into speech. An example is the word spelled R E A D: should it be pronounced like "reed" or like "red"? When the text-to-speech tool chooses the wrong pronunciation for a heteronym, SSML can supply the instructions needed to fix it.
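One standard way to force a pronunciation is the `<phoneme>` element with an IPA transcription. This is a sketch under that assumption: the `pronounce` helper and `READ_IPA` table are hypothetical, and some platforms also offer their own tags for choosing a word's sense, which this example does not use.

```python
from xml.sax.saxutils import escape, quoteattr

# IPA transcriptions for the two pronunciations of "read".
READ_IPA = {
    "present": "riːd",  # sounds like "reed"
    "past": "rɛd",      # sounds like "red"
}


def pronounce(word: str, ipa: str) -> str:
    """Force a pronunciation with the SSML <phoneme> element (IPA alphabet)."""
    return f'<phoneme alphabet="ipa" ph={quoteattr(ipa)}>{escape(word)}</phoneme>'


sentence = (
    "<speak>Yesterday I "
    + pronounce("read", READ_IPA["past"])
    + " the post; today you can "
    + pronounce("read", READ_IPA["present"])
    + " it too.</speak>"
)
print(sentence)
```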

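Like heteronyms, a sequence of digits can be ambiguous when turned into speech. Take the digits 2 5 1 9 4 9 4:

Should they be read as a cardinal number: 2519494 ("two million, five hundred nineteen thousand, four hundred ninety-four")?

Or as individual digits: 2519494 ("two, five, one, nine, four, nine, four")?

Or should they be understood and read as a phone number: 2519494 ("two five one, nine four nine four")?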
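The `<say-as>` element tells the engine which of these readings to use. In this sketch the `say_as` helper is hypothetical, and the three `interpret-as` values are ones Amazon documents for Alexa; the SSML specification leaves these values to each platform, so other engines may name them differently or support a different set.

```python
from xml.sax.saxutils import escape

# interpret-as values documented for Alexa; other platforms may differ.
INTERPRETATIONS = ("cardinal", "digits", "telephone")


def say_as(value: str, interpret_as: str) -> str:
    """Tell the speech engine how to read an ambiguous run of digits."""
    if interpret_as not in INTERPRETATIONS:
        raise ValueError(f"unsupported interpret-as value: {interpret_as}")
    return f'<say-as interpret-as="{interpret_as}">{escape(value)}</say-as>'


# One SSML document per reading of the same seven digits.
for mode in INTERPRETATIONS:
    print(f"<speak>{say_as('2519494', mode)}</speak>")
```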

Changing Voice

While the SSML specification provides for changing voices, as of this writing only Amazon supports the voice tag. Google only lets the creator of a Google Voice App select one voice (from a list of four) for all of the app's interactions.

Amazon implemented the SSML voice tag and currently offers many English voices to choose from. In addition to both male and female voices, Amazon includes English voices from Australia, Great Britain, and India, as well as the United States. The ability to change voices opens many possibilities for engaging your audience: not only can you change the default voice for a particular post, you can also switch voices within a post.

The following is an example of switching voices within a post:

Hi, my name is Naomi. I can read your content.

If you want a true English accent, just ask for Olivia!

Or, if you want to connect with the youth, just ask for Jackson to read your stuff.
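A script like the one above can be expressed as a sequence of `<voice>` elements inside a single response. This is a sketch, not Create My Voice's implementation: the `multi_voice` helper is hypothetical, and the names Naomi, Olivia, and Jackson are taken from this post as placeholders; substitute the exact voice IDs your platform lists.

```python
from xml.sax.saxutils import escape, quoteattr


def multi_voice(lines: list[tuple[str, str]]) -> str:
    """Build one SSML response where each (voice, text) pair uses its own voice."""
    parts = [
        f"<voice name={quoteattr(voice)}>{escape(text)}</voice>"
        for voice, text in lines
    ]
    return "<speak>" + " ".join(parts) + "</speak>"


# Placeholder voice names from this post; replace with real platform voice IDs.
script = [
    ("Naomi", "Hi, my name is Naomi. I can read your content."),
    ("Olivia", "If you want a true English accent, just ask for Olivia!"),
    ("Jackson", "Or, if you want to connect with the youth, just ask for Jackson."),
]
print(multi_voice(script))
```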

Changing Language

In addition to the four English dialects, Amazon currently supports five additional languages: French, German, Italian, Japanese, and Spanish. There are a total of 27 different voices available. With increasingly capable translation tools, you can even have your content translated and read in another language by a language-specific voice.

The following are examples in French, Spanish, and German. Let's have Alexandre, Sofía, and Karl introduce themselves.

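Bonjour, je m'appelle Alexandre. Si vous souhaitez que votre contenu soit disponible pour les francophones, je peux le faire pour vous. (Hello, my name is Alexandre. If you would like your content to be available to French speakers, I can do that for you.)

Pregunta por Sofía si quieres que tu contenido esté disponible para el español. (Ask for Sofía if you want your content to be available in Spanish.)

Wenn Sie Ihre Inhalte für den deutschsprachigen Raum zur Verfügung stellen möchten, fragen Sie einfach nach Karl! (If you want to make your content available to the German-speaking world, just ask for Karl!)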
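Reading in another language pairs a language-specific voice with the locale of the text. This sketch assumes an engine that accepts a `<lang xml:lang="...">` element nested in a `<voice>` element, as Alexa documents; the `in_language` helper is hypothetical, the voice names are placeholders from the examples above, and the locales are standard BCP 47 tags.

```python
from xml.sax.saxutils import escape, quoteattr


def in_language(text: str, locale: str, voice: str) -> str:
    """Read text in another language: pick a matching voice and tag the locale."""
    return (
        f"<voice name={quoteattr(voice)}>"
        f"<lang xml:lang={quoteattr(locale)}>{escape(text)}</lang>"
        "</voice>"
    )


# Placeholder voice names from this post; locales are BCP 47 language tags.
greetings = [
    ("Bonjour, je m'appelle Alexandre.", "fr-FR", "Alexandre"),
    ("Pregunta por Sofía.", "es-ES", "Sofía"),
    ("Fragen Sie einfach nach Karl!", "de-DE", "Karl"),
]
ssml = "<speak>" + "".join(in_language(*g) for g in greetings) + "</speak>"
print(ssml)
```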

Conclusion

As you can see (or hear), there are many options for customizing how your content is delivered. As the voices mature and the tools for customizing the voice experience expand, smart speakers will provide a whole new way to see (I mean hear) your brand. Your audience has been shifting its attention from computers to smartphones, and both Google and Amazon are now pushing voice technology as the new, frictionless way to get information. Preparing your brand to engage your audience through voice will ensure your brand has both a visual and a verbal presence. Create My Voice can help you get onto this new platform easily.
