End-to-end automation testing of an Alexa skill (conversational AI testing)

Nitin Verma
6 min read · Jun 9, 2021

Testing has evolved considerably over the years for traditional software, and there is no doubt that AI is now a hot topic. Most major sectors are focusing on improving their customer service by integrating AI into their products and services, reaching a wider audience through channels such as chatbots, virtual assistants and voice assistants.

Like any other software, conversational AI can be tested with automation, and many tools for it are available open source along with a few paid solutions.

At the same time, AI is new to many industries, which makes shift-left testing and early automation implementation important for delivering a successful product.

Despite the varied activity in this space, AI is still in an exploratory phase and largely an untested customer value proposition. As people become more comfortable with the technology, designers need to pay greater attention to how AI fits into the wider user experience, and key design considerations for automating this emerging technology will continue to surface.

What is Conversational AI?

Conversational AI is the technology that enables machines to interact naturally with humans through language. It is a subset of artificial intelligence that leverages concepts such as neural networks and machine learning, and makes them available for building useful applications: hands-free control while you are driving or at home, Alexa waiting for your command, or virtual agents that assist with customer support over the phone.

How does it work?

Conversational AI is a fusion of technologies such as Automatic Speech Recognition (ASR), machine learning, Natural Language Processing (NLP) and Natural Language Understanding (NLU), which process every written or spoken word, work out the best way to respond and learn from every user interaction.

A conversational AI flow can be broken down into three stages:

· Automatic Speech Recognition (ASR)

· Natural Language Processing (NLP) or Natural Language Understanding (NLU)

· Text-to-Speech (TTS)

Conversational flow

What are Alexa skills?

Just as we have apps for smartphones, Alexa devices have skills, which can be enabled and disabled from the Alexa app.

Keywords in the Alexa Skills Kit:

  • Wakeword: Echo devices have a ring of always-on microphones, meaning the device is always in listening mode. It will only 'wake up' and actively pay attention to you when it hears a specific word or phrase, called a wakeword.

- For Alexa: 'Alexa', 'Echo' and 'Computer' are wakewords

- For Google Assistant: 'Ok Google'

  • Invocation Name: An 'invocation name' is the word or phrase used to trigger your skill. For example:

- Alexa, play music

- Alexa, ask doctor connect

  • Intent: An intent represents the action a user is trying to accomplish
  • Utterance: Utterances are the specific phrases people use when making a request to a voice app or Alexa
  • Slots: A slot is a variable tied to an intent that allows Alexa to capture additional information about the request

Utterance example:

Utterance Example
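To make these concepts concrete, here is a rough sketch of what a small piece of the skill's interaction model could look like in the Alexa Skills Kit JSON format. The intent name, slot type and sample utterances shown are illustrative assumptions, not the actual Doctor Appointment model.

```json
{
  "interactionModel": {
    "languageModel": {
      "invocationName": "doctor connect",
      "intents": [
        {
          "name": "BookAppointmentIntent",
          "slots": [{ "name": "doctorType", "type": "DOCTOR_TYPE" }],
          "samples": [
            "book an appointment",
            "book an appointment with a {doctorType}",
            "I want to see a {doctorType}"
          ]
        }
      ],
      "types": [
        {
          "name": "DOCTOR_TYPE",
          "values": [
            { "name": { "value": "dentist" } },
            { "name": { "value": "cardiologist" } }
          ]
        }
      ]
    }
  }
}
```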

So far, I have completed three different successful POCs on conversational AI automation.

  1. Botium Connector for Amazon Alexa Skills Management API
  2. Botium Alexa AVS connector
  3. Custom framework/tool to test Alexa skills end to end

I will walk through each of them one by one.

AUT (Application Under Test) summary:

A custom Alexa skill, Doctor Appointment, was created; it helps users book an appointment with a doctor.

Custom Skill

Botium Connector for Alexa Skills Management API (SMAPI):

This verifies the utterances mapped to each intent. Using it, one can test an Alexa skill programmatically instead of testing it through the Alexa Skills Kit developer console.

  • It uses the Skill Invocation API to invoke your skill for testing purposes
  • The Skill Simulation API helps to test the skill and see which intent a simulated device resolves from your interaction model
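
As a rough illustration of how such a Botium setup is wired together, a botium.json and a convo file might look like the following. This is a sketch under assumptions: the capability names (e.g. ALEXA_SMAPI_SKILLID, ALEXA_SMAPI_REFRESHTOKEN) should be checked against the connector's README, and the skill ID, token and bot reply are placeholders.

```json
{
  "botium": {
    "Capabilities": {
      "PROJECTNAME": "Doctor Appointment Skill",
      "CONTAINERMODE": "alexa-smapi",
      "ALEXA_SMAPI_API": "simulation",
      "ALEXA_SMAPI_SKILLID": "amzn1.ask.skill.xxxxxxxx-xxxx",
      "ALEXA_SMAPI_LOCALE": "en-US",
      "ALEXA_SMAPI_REFRESHTOKEN": "Atzr|..."
    }
  }
}
```

A Botium convo file then describes the expected exchange, and Mocha (via botium-bindings) runs one test per convo:

```
book an appointment

#me
ask doctor connect to book an appointment

#bot
Which doctor would you like to see?
```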

Demo:

Result:

Automation execution report

Tools and technology:

  • Node.js, AVS, SMAPI, Mocha

Botium Connector for Amazon Alexa with AVS:

This tool can also be used to test Alexa skills. For this, one has to register a virtual Alexa device in the Alexa Voice Service (AVS) portal. It works the same way as a physical Alexa device. It is not bound to any particular Alexa skill, so you have to activate your skill with its activation utterance, such as "Alexa, order a pizza".

For a single-level conversation it works flawlessly, for example "Alexa, turn the light on". But when it comes to testing a multilevel conversation, as in this case (the Doctor Connect skill), it fails, because it validates the Alexa response at every step while Alexa waits only 8 seconds (its default) for user input. If the user fails to respond within that window, the skill session expires and Alexa drops out of the skill's context.
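
For illustration, a multilevel convo of the kind that exposed this limitation might look like the sketch below; the bot replies are hypothetical placeholders for the actual Doctor Connect prompts. The AVS connector handles the first exchange, but by the time the spoken response has been transcribed and validated, the 8-second window has usually elapsed and the next #me step no longer reaches the skill.

```
doctor connect multilevel conversation

#me
alexa, ask doctor connect to book an appointment

#bot
Which doctor would you like to see?

#me
a dentist

#bot
What date works for you?
```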

Note: the Botium AVS connector is not meant to handle multilevel conversations, as Alexa moves out of the skill's context while the transcription job is still in progress.

So I raised an issue (https://github.com/codeforequity-at/botium-bindings/issues/124) on GitHub, in the botium-bindings repository maintained by Florian Treml, co-founder and CTO of Botium. Here was his response:

His response was a real encouragement for me. Thank you, Florian Treml.

Demo:

Result:

Execution report and console logs

Tools and technology:

  • Node.js, AVS, the Botium AVS connector, Amazon Polly, Amazon Transcribe, Mocha

Custom Framework:

I have created a custom tool and framework for testing multilevel Alexa skill conversations end to end.
To make it work, you need to register a new product in the Alexa Voice Service console. This registered product behaves the same as a physical Alexa device. It is not restricted to any particular Alexa skill and can invoke a skill by its invocation name, so you can use this framework to test production skills as well.

How does it work?

  • Utterances are converted from text to speech into .mp3 files using Amazon Polly
  • Each converted .mp3 is then fed to the Alexa Voice Service (AVS) via an Alexa client, which returns Alexa's response in the same format, i.e. .mp3
  • The response audio is then converted into a .wav file using ffmpeg and stored in an S3 bucket
  • Finally, Amazon Transcribe turns the stored .wav file into text, which is verified against the expected response text
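
The sketch below shows roughly how these steps could be wired together with the AWS SDK for Node.js. It is a minimal illustration rather than the actual framework code: the region, voice, bucket and helper names are assumptions, and the AVS step is only stubbed because that client is custom.

```javascript
const AWS = require('aws-sdk');
const fs = require('fs');
const { execSync } = require('child_process');

const polly = new AWS.Polly({ region: 'us-east-1' });
const s3 = new AWS.S3({ region: 'us-east-1' });
const transcribe = new AWS.TranscribeService({ region: 'us-east-1' });

// 1. Text-to-speech: turn the test utterance into an .mp3 with Amazon Polly
async function utteranceToMp3(text, outFile) {
  const res = await polly.synthesizeSpeech({
    Text: text,
    OutputFormat: 'mp3',
    VoiceId: 'Joanna'
  }).promise();
  fs.writeFileSync(outFile, res.AudioStream);
}

// 2. sendToAvs(inputMp3, outputMp3) would stream the utterance audio to the
//    Alexa Voice Service via the custom Alexa client and save Alexa's spoken
//    reply as another .mp3 (implementation omitted; it is specific to the framework).

// 3. Convert Alexa's .mp3 reply to .wav with ffmpeg and upload it to S3
async function mp3ToWavOnS3(mp3File, wavFile, bucket, key) {
  execSync(`ffmpeg -y -i ${mp3File} ${wavFile}`);
  await s3.upload({
    Bucket: bucket,
    Key: key,
    Body: fs.createReadStream(wavFile)
  }).promise();
  return `s3://${bucket}/${key}`;
}

// 4. Speech-to-text: start an Amazon Transcribe job, poll until it finishes and
//    return the transcript URI, whose text is compared with the expected response
async function wavToText(jobName, s3Uri) {
  await transcribe.startTranscriptionJob({
    TranscriptionJobName: jobName,
    LanguageCode: 'en-US',
    MediaFormat: 'wav',
    Media: { MediaFileUri: s3Uri }
  }).promise();

  let job;
  do {
    await new Promise(resolve => setTimeout(resolve, 5000));
    job = (await transcribe.getTranscriptionJob({
      TranscriptionJobName: jobName
    }).promise()).TranscriptionJob;
  } while (['QUEUED', 'IN_PROGRESS'].includes(job.TranscriptionJobStatus));

  return job.Transcript.TranscriptFileUri;
}
```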

Workflow:

Framework flow

Demo:

Result:

Execution report of custom framework
