Chatbot Best Practices | November 15, 2022 | Written by Alex Debecker

Alexa Chatbot Best Practices: Create Excellent Voice Experiences

Are you about to build your first voicebot? Have you already built a few but seem to be getting low engagement? Do you feel slightly terrified at the idea of building an Alexa chatbot, one that talks to your customers?

Fear not, ubisend is here with a series of best practices for building an Alexa chatbot.

A history of working with voice chatbots

We've been working with voice since 2017, building what was, back then, one of the first high-profile voice projects in the world (read more: Backed by Google).

Recently, we released Alexa as a channel on our chatbot platform. You can now easily build an Alexa skill using our drag-and-drop, no-code platform.

However, no matter how easy it is to build an Alexa chatbot with our platform, building a good Alexa chatbot takes some getting used to.

Hence this article. Let's work through some of the best practices and wisdom we've gathered over the years.

1. Spend extra time planning the three key actions: launch, stop, help

There's a recurring theme in most articles covering chatbot best practices: teach your users how to get help. Don't let them get stuck. Always provide a way out.

With voice, you have two new actions to think hard about on top of getting help, which makes three in total:

  • How to launch your skill
  • How to quit your skill
  • How to get help within your skill

With typical chatbots, aka text bots, you don't have to think about launching or quitting. The chatbot lives on a page or social media platform and launches when the user clicks on it. It quits when the user closes the page, the tab, or their device.

With Alexa, this isn't the case. Let's dive into each of these actions and suss out some best practices.


Launch

Before your users even get to experience your amazing voice bot, they need to launch it. You get to make your own version of the now-famous 'Alexa, open...'!

Here are three best practices we've learned from creating launch actions:

  1. Give your skill a short, punchy name. Remember your users will have to say 'Alexa, open [skill name]' every time they want to use it. Saying 'Alexa, open My Little Fairytale Sunrise On The Beach' might get tiring after a while.
  2. Give your skill an easily pronounceable name. We've seen it happen hundreds of times: a skill name sounds cool, but ask 20 people to say it and you'd hear it pronounced 20 different ways. Think of accents and of non-native speakers using your skill.
  3. Craft an impactful first message. Upon launching your skill, your users will hear their first message. What do they need to know at this very stage? What's important? Ensure they know how to progress using the skill beyond this step, but don't overwhelm them with content.


Stop

Ending an interaction with an Alexa chatbot is a whole different world. Your users can simply stop responding, sure. But what are you most likely to do when you want to quit an Alexa skill?

That's right, you shout: 'Alexa, stop!'

There's one and only one best practice when it comes to creating a stop action on Alexa: quit the skill.

No fluff. No 'Are you sure?'. No long speech explaining how to come back. Quit the skill and let your users get on with their life.


Help

Finally, the help action is very much the same across all chatbot channels (voice or not). If a user asks for help, respond with a few sentences explaining how they can use your skill.

If relevant, also offer a way to contact you/your team. This might come in handy for customer service Alexa chatbots, for example.

The good news is the ubisend platform provides you with all three of these actions. All you need to do is write the content. Take a tour of the ubisend platform now.
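If you are curious what these three actions look like underneath a no-code builder, here is a minimal Python sketch that routes them using Alexa's standard request types and built-in intent names. It is not how the ubisend platform implements them. The skill name, the wording, and the `speak` and `handle` helpers are all made up for illustration.

```python
def speak(text, end_session):
    """Build a minimal Alexa response envelope, with optional speech."""
    response = {"shouldEndSession": end_session}
    if text:
        response["outputSpeech"] = {"type": "PlainText", "text": text}
    return {"version": "1.0", "response": response}


# Hypothetical copy for a hypothetical skill; replace with your own.
WELCOME = "Welcome to Pizza Tracker. Say 'where is my order' to get started."
HELP = "You can say 'where is my order', or say 'stop' to leave."


def handle(event):
    """Route the three key actions: launch, stop, help."""
    request = event["request"]
    if request["type"] == "LaunchRequest":
        # First message: short, and it tells the user what to say next.
        return speak(WELCOME, end_session=False)
    if request["type"] == "IntentRequest":
        intent = request["intent"]["name"]
        if intent in ("AMAZON.StopIntent", "AMAZON.CancelIntent"):
            # Quit immediately: no confirmation, no speech.
            return speak(None, end_session=True)
        if intent == "AMAZON.HelpIntent":
            return speak(HELP, end_session=False)
    # Anything else: point the user back to what the skill can do.
    return speak("Sorry, I didn't catch that. " + HELP, end_session=False)
```

Note how the stop branch does nothing except end the session, which is the whole point of the stop best practice above.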

2. To listen or not to listen, that is the question

If you have lots of experience building text chatbots, this may be the most alien concept for you to grasp as you build your first Alexa bot.

A text chatbot is always ready to continue its interaction with the user. The text field is always open (unless you use one of our custom composers). If the user steps out and comes back an hour later to ask another question, the chatbot will pick up where they left off.

With voice, it's a different story. To keep interacting with a voice chatbot, the device must be actively listening; quite literally. Its microphone must be open, waiting for the user to talk.

On Alexa devices, this is shown by the persistent blue light.

Here are the best practices when it comes to listening (or not):

  1. Take control of the situation. Using the ubisend platform, by default your Alexa chatbot will shut off after reading out a message. We give you the control. If you want the chatbot to listen for an answer, use the Listen action.
  2. Don't take this decision lightly. Voice devices sit in people's homes. Making your device listen when it's not categorically imperative for it to do so can create issues.
  3. Never listen when it's not logical to do so. If the conversation is clearly over, if the chatbot says 'thank you' or 'goodbye', if the goal of the bot was reached, just stop listening. Don't force your users to quit the Alexa skill.
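For readers building directly against Alexa rather than through our platform, the listen decision comes down to one flag in the response, `shouldEndSession`, plus an optional reprompt. The sketch below is a hedged illustration: `build_response` and its `listen` parameter are hypothetical names, and the default of not listening simply mirrors the platform default described above.

```python
def build_response(text, listen=False, reprompt=None):
    """Build an Alexa response that only keeps the microphone open on request."""
    response = {
        "outputSpeech": {"type": "PlainText", "text": text},
        # Ending the session closes the microphone; that is the default here.
        "shouldEndSession": not listen,
    }
    if listen and reprompt:
        # Spoken if the user stays silent while the device is listening.
        response["reprompt"] = {
            "outputSpeech": {"type": "PlainText", "text": reprompt}
        }
    return {"version": "1.0", "response": response}


# The conversation is over, so the device stops listening.
goodbye = build_response("Thank you, your order is confirmed. Goodbye.")

# A question was asked, so the device listens for the answer.
question = build_response(
    "Would you like small or large?",
    listen=True,
    reprompt="You can say small or large.",
)
```

Making `listen` an explicit opt-in means nobody leaves the microphone open by accident.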

3. Keep messages (even) short(er)

Some would argue concise messages are the name of the game in the chatbot creation world. If that's true (and I believe it is), it is even truer with voice chatbots.

Even the most hardened Alexa fan has to admit: there's a limit to how much you want to listen to it blabber on and on. The somewhat robotic voice quickly gets to you and soon you find yourself shouting 'ALEXA STOP'.

To avoid this, our single best practice is simple: keep messages short.

In our experience, no message should extend beyond three lines. Go beyond that and you'll quickly start losing your user.
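A rule like this is easy to check automatically before your content goes live. The Python sketch below flags messages that run too long. It makes one loud assumption: a spoken message has no visual lines, so it counts sentences instead and treats one sentence as one "line". The function names and the limit constant are hypothetical.

```python
import re

MAX_SENTENCES = 3  # assumption: one "line" in the guideline = one sentence


def count_sentences(message):
    """Count sentences by splitting on ., ! or ? followed by a space or the end."""
    parts = re.split(r"[.!?]+(?:\s+|$)", message.strip())
    return len([part for part in parts if part])


def too_long(message, limit=MAX_SENTENCES):
    """Return True when a message has more sentences than the limit."""
    return count_sentences(message) > limit


short = "Your order is on its way. It should arrive by six."
rambling = "Hello. Thanks for calling. We love pizza. We have many toppings. Ask away."
print(too_long(short))     # False
print(too_long(rambling))  # True
```

Sentence counting is crude (abbreviations such as "e.g." would inflate the count), so treat the result as a prompt to reread the message rather than a hard rule.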

4. Design for voice only (we know, devices have screens)

Screen devices like the Echo Show are becoming more and more common. They are powered by Alexa and voice-activated, but also display a wide touch screen.

Through the touch screen, you can actually present rich media interactions. Your chatbot is not constrained to voice; it can also display images, buttons, carousels, and more.

So, which is it? Do we build for voice, for screen, or both?

Our best practice is to design for voice first. Anything you can display on the screen is great but secondary. In practice, this means:

  1. Always give instructions to the user on what to do next. Don't rely on buttons. Tell your users what to say to move the conversation along. If they tap the button, fine (thanks to ubisend's omnichannel capabilities, both will act in the same way), but they must be able to carry on without ever looking at or touching the screen.
  2. Treat imagery as secondary. Don't display important information in an image. Instead, use images to illustrate or support the voice content.
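One way to keep yourself honest about the first point is to check that every on-screen button is also mentioned in the spoken text. The sketch below is a simple illustration under assumed data shapes: a message is a speech string plus a list of button labels, and `unspoken_buttons` is a hypothetical helper, not part of any platform.

```python
def unspoken_buttons(speech, buttons):
    """Return the button labels that the spoken text never mentions."""
    spoken = speech.lower()
    return [label for label in buttons if label.lower() not in spoken]


speech = "What would you like to do? Say 'track order' or 'refund'."
buttons = ["Track order", "Refund", "Opening hours"]

# A user who never looks at the screen cannot discover this option.
print(unspoken_buttons(speech, buttons))  # ['Opening hours']
```

An empty result means a voice-only user can reach everything the screen offers.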

5. Alexa stores user information, use it

(When relevant, of course.)

Capturing information from the user is a staple of the chatbot interaction.

  • What is your name?
  • What is your address?
  • What is your email address?

And so on. If you've built more than one chatbot, you've most likely built interactions that capture user information.

In our experience, capturing user information via voice can sometimes be tricky. Consider the following example, where Alexa asks the user for their address and the user responds:

"10 Wymondham Street, Costessey".

Now, if you're not familiar with this neck of the woods (and why would you be), a local would pronounce this something like:

"10 Wyndham Street, Cossy".

Unsurprisingly, this is what Alexa would pick up. If you were relying on this information to build a customer profile, you'd be out of luck.

So what's the alternative? Use (when relevant) user information stored on their Alexa device!

Alexa devices, and their associated Amazon account, store all sorts of information on the user including their name, address, email address, language, and more. This information is available to you as long as you:

  1. Justify why you're going to capture it (can't just go around grabbing people's personal information for no reason).
  2. Let the users know when they install your skill.

There are really two best practices for the price of one here.

The first is to use the information made available to you. It's there; all you need to do is inform the user. Don't risk capturing inaccurate information.

The second is to make your users' life as easy as possible. Imagine having to spell out your entire address to Alexa. Is that a good chatbot experience? Probably not.
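For the curious, here is roughly what this looks like when talking to Alexa directly rather than through an integration. The Python sketch below only builds the HTTP request and the consent card; it sends nothing. The paths and permission names are given as we understand Amazon's Customer Profile API documentation, so verify them before relying on them. The helper names are hypothetical, and address is left out because it sits behind a separate device address API.

```python
# Assumption: paths and scopes as documented for Alexa's Customer Profile API.
PROFILE_PATHS = {
    "name": "/v2/accounts/~current/settings/Profile.name",
    "email": "/v2/accounts/~current/settings/Profile.email",
}
PERMISSIONS = {
    "name": "alexa::profile:name:read",
    "email": "alexa::profile:email:read",
}


def profile_request(event, field):
    """Describe the GET request for one profile field, without sending it."""
    system = event["context"]["System"]
    return {
        "method": "GET",
        "url": system["apiEndpoint"] + PROFILE_PATHS[field],
        "headers": {"Authorization": "Bearer " + system["apiAccessToken"]},
    }


def permission_card(fields):
    """Card asking the user to grant access in the Alexa app."""
    return {
        "type": "AskForPermissionsConsent",
        "permissions": [PERMISSIONS[field] for field in fields],
    }


# Hypothetical incoming request, trimmed to the two values the sketch needs.
event = {
    "context": {
        "System": {
            "apiEndpoint": "https://api.amazonalexa.com",
            "apiAccessToken": "token-123",
        }
    }
}
request = profile_request(event, "email")
```

If the user has not granted permission, the profile call is refused, and returning the consent card is the usual way to let them grant it.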

To learn more, read our doc: Grab Amazon Alexa user information with an integration

6. Remember: no live chat

Chatbots and live chat fallback, name a better duo.

When you know your users can always fall back to reach you via live chat, you design your chatbot in a certain way. You can be bolder in your approach. You can add more content. You can add more knowledge into your chatbot, instead of focussing on just one thing (although that's generally a bad idea).

With Alexa chatbots, unfortunately, live chat is not an option.

(And thank God, imagine how weird it would be to interact with a human via Alexa).

What we found over the years boils down to this one best practice: voice chatbots need to offer a simple, precise, and targeted experience.

There is, essentially, no net to catch your users if they get lost. So you want to make sure they never veer off the path, never get lost, never get confused.
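With no human to hand over to, the fallback itself has to do the steering. The Python sketch below shows one possible approach, not a prescribed one: when Alexa cannot match what the user said (for example via the built-in `AMAZON.FallbackIntent`), restate the one thing the skill does, and after repeated misses bow out instead of looping. The miss limit, the wording, and `handle_fallback` are hypothetical.

```python
MAX_MISSES = 2  # assumption: give up on the third unrecognised request

# Hypothetical wording for a skill that does exactly one thing.
GUIDE = "I can only track orders. Say 'where is my order'."


def handle_fallback(event):
    """Steer a lost user back to the one supported path, or bow out."""
    attributes = dict(event.get("session", {}).get("attributes") or {})
    misses = attributes.get("misses", 0) + 1
    give_up = misses > MAX_MISSES
    text = "Sorry, I can't help with that here." if give_up else GUIDE
    return {
        "version": "1.0",
        # Carry the miss counter across turns while the session stays open.
        "sessionAttributes": {} if give_up else {**attributes, "misses": misses},
        "response": {
            "outputSpeech": {"type": "PlainText", "text": text},
            "shouldEndSession": give_up,
        },
    }
```

The narrower the skill, the shorter that guide sentence can be, which is another argument for a simple, targeted experience.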


As I wrap up this article, I hope you've taken away crucial bits of information about building your first voice chatbot.

What's really exciting about voice is it's almost all uncharted territory. Sure, there have been over 100,000 skills developed on Alexa by now (source)... but that's nothing.

Think of the evolution between the 100,001st website and today. There's a universe of opportunities ahead of us in the voice chatbot space, and we hope ubisend can help you capture some of it.