
Hey Google, how do I design for Voice?

Published May 21, 2021


Voice is popping up everywhere. It’s integrated into our phones, tablets, speakers, headphones, and even our cars. We’re all familiar with the options: Google Assistant, Apple’s Siri, and Alexa from Amazon. They’re virtual personal assistants that help you with everything from productivity to cooking.

Even car manufacturers, such as Mercedes, have integrated assistants into their cars. They allow you to control various built-in systems, such as the temperature or seat heating, and more. I can’t say for sure which functionality it has because I (sadly) don’t own a Mercedes.

What are the benefits of using Voice?

We’re all used to our two-thumbed interactions on our phone. We’re tapping and swiping all day long. Why should we even consider adding another type of interaction in our lives?

Voice has a few advantages:

  1. Voice is fast. Very fast, actually. A Stanford University study found that dictating text messages is up to three times faster than typing them.
  2. In some cases, voice interfaces are more practical, and possibly safer, than typing or tapping. For example: while driving, while cooking, or when you’re across the room from a device.
  3. Almost everyone knows how to talk. Even users who aren’t very familiar with technology can use voice interfaces. Take children: they can’t read yet, but they’re naturally able to talk to Alexa. Amazon even made a special Echo Dot “Kids Edition.”
  4. We’ve all received an email or text message and thought “Whoa, that’s rude” or “Is he being sarcastic?” Humans have a hard time reading tone from the written word alone. Voice interfaces let us use tone, intonation, and rate of speech, which convey a great deal of information.

When shouldn’t we use voice?

Seeing those advantages, you might think: “OMG, voice is awesome! Why isn’t voice implemented everywhere yet?!” Well, one reason is that voice isn’t always the appropriate medium for your users. Here are a few reasons why Voice User Interfaces (VUIs) are not always a good idea:

  1. Public spaces
    Many of us work in open-plan offices. Suppose you ask your computer: “Hey Siri, open my latest spreadsheet.” Now imagine the chaos if everyone in the office did this. How would you know which computer is listening? You might be opening spreadsheets on your colleagues’ computers. And let’s not forget the noise in public spaces: it’s not always easy for a voice interface to understand you when a dozen other people are talking.
  2. Discomfort
    Although voice interfaces are popping up everywhere, some people don’t feel comfortable using them, even in private (or should I say especially in private?).
  3. Privacy
    Voice interfaces record what you say, and sometimes they listen when you don’t expect or want them to (usually triggered by accident). If users need to discuss a health issue, they might not want to do so by speaking to their phone on the train ride to work, or might not want to discuss it at all if they don’t trust these voice interfaces.

There are cases where you really have to think through whether voice is the way to go for your product or service. You can always conduct interviews with your target users to explore the idea. Hannes wrote a great article on how to talk with users; it’s definitely worth reading.

Now that we have a rough idea of when to use Voice Interfaces (and when not to), we can start thinking about designing one. Next, we’ll go in depth on what makes a successful conversation, how to give feedback in the right manner, and a few techniques to test your interface at different stages.

Conversational Design

When designing for voice, you have to think about conversation. Conversational Design is, as the name says, the design of a conversation. Let’s go a bit more in depth on how to design a good one.

What makes a successful conversation?

That is the first question you have to ask yourself. A successful conversation is done cooperatively. Paul Grice calls this the cooperative principle, and he proposes it along with four maxims. I’ll describe these maxims as simply as possible:

  1. The maxim of quantity: Be as informative as you possibly can, but stick to the point!
  2. The maxim of quality: This one’s easy: don’t lie. Only give answers that are supported by evidence.
  3. The maxim of relation: Stay relevant. Don’t add information that is not relevant to the conversation.
  4. The maxim of manner: Be as clear as possible. Know who your user is and explain things in a way that makes sense to them.

We’ve all had a conversation with someone where some of these maxims weren’t followed, and it probably resulted in confusion or frustration. If your VUI doesn’t follow these basic maxims, your user will experience the same. And that’s something you want to avoid, obviously.

To make sure you understand what these maxims mean, I want you to repeat after me:

My voice interface will always tell the truth in a way my user can understand and will not start bullshitting about something else.

How to give feedback in a conversation?

A computer can still make mistakes when trying to understand the user, so it is important that an action or conclusion is confirmed in the conversation. Cathy Pearl describes implicit and explicit confirmations in her book “Designing Voice User Interfaces” as follows:

Implicit confirmation lets the user know what you understood but doesn’t ask them to confirm. For example: “Ok, I’ll set a reminder to wash the car before the road trip.”

With explicit confirmation, we force the user to confirm intent. For example: “Do you want me to set a reminder to wash the car before the road trip?”

If your product has built-in confidence detection, we can make good use of these types of confirmation. Confidence detection gives us a score for how confident your product is about what the user has said. We can use that score for what Cathy Pearl describes as Three-Tiered Confidence: for scores above 80%, use an implicit confirmation; between 45% and 79%, ask for an explicit confirmation; and below 45%, assume your product has misunderstood the user and ask them to repeat themselves.
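The three tiers above map naturally onto a single function. Here is a minimal sketch in Python; the function name, thresholds, and prompt wording are illustrative, not part of any particular platform’s API:

```python
def confirmation_prompt(confidence: float, action: str) -> str:
    """Pick a confirmation style based on recognition confidence.

    The 80% / 45% thresholds follow the Three-Tiered Confidence
    scheme described above; tune them for your own product.
    """
    if confidence >= 0.80:
        # Implicit confirmation: state what we understood, don't ask.
        return f"Ok, I'll {action}."
    elif confidence >= 0.45:
        # Explicit confirmation: force the user to confirm the intent.
        return f"Do you want me to {action}?"
    else:
        # Too uncertain: assume a misunderstanding and ask for a repeat.
        return "Sorry, I didn't catch that. Could you say that again?"
```

For example, a 92% score on “set a reminder to wash the car” yields the implicit “Ok, I’ll set a reminder to wash the car.”, while a 60% score yields the explicit “Do you want me to set a reminder to wash the car?”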

Conversational markers
On a website, we often give the user feedback about where they are, such as breadcrumbs or a highlighted menu item. When an action succeeds, something green appears. Voice interfaces usually don’t have the possibility to use these types of visual feedback. That’s why we use conversational markers. Here are some examples:

  1. Timeline Markers: “First”, “Halfway There”, “Finally”, etc.
  2. Acknowledgments: “Thanks”, “Got it”, “Alright”, “Sorry about that”, etc.
  3. Positive feedback: “Good job”, “Good to hear”, etc.

It is important to use conversational markers that are appropriate for your product’s persona. Even the most formal systems benefit from them. Users know they are talking to a machine, but they still appreciate the basic courtesies of conversation.
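Timeline markers in particular are easy to apply mechanically to any multi-step task. The sketch below is a hypothetical helper (the function name and marker choices are mine, not from any library) that prefixes each step of a task with “First”, “Next”, or “Finally”:

```python
def mark_steps(steps):
    """Prefix each step prompt with a timeline marker so a multi-step
    task reads like a conversation rather than a bare list."""
    prompts = []
    for i, step in enumerate(steps):
        if i == 0:
            marker = "First"           # opening marker
        elif i == len(steps) - 1:
            marker = "Finally"         # closing marker
        else:
            marker = "Next"            # everything in between
        prompts.append(f"{marker}, {step}")
    return prompts
```

Feeding it a three-step recipe produces “First, preheat the oven”, “Next, mix the batter”, “Finally, bake for 30 minutes”.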

No wireframes, but flowcharts!

Voice interfaces usually don’t have a graphical user interface. So it’s pretty useless to wireframe a layout that isn’t there, right? Right! But that doesn’t mean you shouldn’t sketch. You can easily create flowcharts using online tools: start from the point where the interface is triggered, branch out the different intents, and tackle the error states.
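Before reaching for a tool at all, such a flowchart can be sketched as a plain data structure. This is a hypothetical sketch (the states and intent names are invented for illustration): each state maps recognized user intents to the next state, and anything unrecognized falls through to an error state, mirroring the error branches you would draw in the flowchart:

```python
# A conversation flow as a dictionary: state -> {intent: next_state}.
FLOW = {
    "start":    {"set_reminder": "ask_time", "cancel": "goodbye"},
    "ask_time": {"give_time": "confirm", "cancel": "goodbye"},
    "confirm":  {"yes": "done", "no": "ask_time"},
}

def next_state(state: str, intent: str) -> str:
    """Advance the flow; unknown states or intents land in "error",
    which is where your repair prompt ("Sorry, I didn't catch that")
    would live."""
    return FLOW.get(state, {}).get(intent, "error")
```

Walking the table by hand is exactly the roleplay exercise described below: “set_reminder” moves you from “start” to “ask_time”, while an unexpected intent drops you into “error” so you notice the missing branch early.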

When you think you’re done, test it out with some roleplay! You’ll be the computer, and someone else will be the user. Keep the flowchart at hand to see whether you can match your user’s intents and replies. You’ll quickly notice if something feels unnatural. This is an iterative process: keep repeating it until the conversation is on point before you start building. Trust me, it’ll save you time and improve the quality of your voice interface!

Building a Voice Interface

Actually building a voice interface requires quite some technical knowledge. But if this article was so good that it has inspired you to start creating something, I won’t keep you waiting: here are a few (freemium) platforms that make it quite easy to create a voice interface that integrates with Google Assistant.

Dialogflow (Low-code)

Dialogflow is a low-code platform from Google that lets you enter your users’ intents and add responses to them. You can use webhooks to communicate with your own services, fiddle with SSML, and use rich responses on mobile devices and smart displays.

Dialogflow can become quite cluttered, and you might quickly lose the overview of your conversations. There are also tools with a visual builder that let you create your conversations in a flowchart-like way; if you’re the type of person who relies on visual structure, that might be for you.

Mycroft AI (Code)

Mycroft is an open-source voice assistant that focuses on privacy. Because it’s open source, it is easy to expand the assistant’s capabilities. It does, however, require quite some technical knowledge. If you’re familiar with the Raspberry Pi and Python, it’s definitely worth a try!

Let’s wrap this up

Congrats! You’ve survived this short crash course in designing for voice! You’ve learned how awesome voice is, but also how not-awesome it can be in some situations.

Don’t forget Paul Grice’s sacred maxims on what makes a successful conversation. Combine them with the right use of the spoken word, and you’re ready to create a killer voice interface!
