Developing a conversational AI with Rasa

Introduction

Chatbots. As I browse the web, they constantly pop up. They always want to capture my attention and interrupt what I was doing. Some even have a name and try to appeal to my emotional side. I quickly hit the X and away they go. Like zombies, they sometimes come back if they notice I am not moving the mouse. Chatbots can be really annoying.

A conversational agent is more than a chatbot though. Siri, Alexa and other voice assistants have a “language engine” in the background that can maintain a dialog in natural language with the user. Such an engine can also be used by games for interaction with virtual characters, by household robots that need to respond to user input, by call centres that want to optimise their customer routing, and even in interfaces that aim to anticipate the user's next step. Interaction with technology using natural language has long been the pinnacle of interaction design: what if you could just talk to your computer, and it would be able to understand and respond to you?

Rasa is such an engine for developing conversational agents. Rather than giving you as a developer a GUI to create a basic predefined chatbot, Rasa provides you with a coding framework to create your own agent with Python. You can hook it up as you like, deploy it to your own servers, and even go in and adjust the inner workings to make it work exactly the way you need. On top of that, it is open source and maintained by a cool company in Berlin.

All the right reasons for me to explore their tech. So, I created a chatbot. I know.

The experiment

Jokes aside, creating conversational AI with Rasa is powerful. To understand what is possible, I set out to develop a more complicated, custom chatbot. This chatbot should be able to book a room at an imaginary hotel. To do so, it has to collect a large number of input variables: city, date, duration, number of guests, breakfast, name and payment options.

From a user experience point of view, I followed the guideline that conversation flow should be optimised for efficiency and utility (Følstad & Brandtzaeg, 2020). I also followed the guideline to keep a neutral tone of voice and reduce anthropomorphising elements. This helps to keep expectations at an appropriate level, so that users do not enter overly complex language (Luger & Sellen, 2016).

To add a sense of realism, I added four technical requirements. These requirements aim to provide high robustness. Robustness is a key ingredient for user acceptance of a chatbot (Følstad et al., 2018).

1. Handling a long conversation

The agent’s goal is to assist the user in booking a hotel room. Booking a hotel room is a long sequence of steps and as such requires a lot of input from the user. The goal for the chatbot is to handle this robustly. For instance, utterances of the user might contain multiple variables (“Book hotel in Berlin for 27.07.2020 for 2 nights”). The agent should be able to extract the information and know what information is missing and collect it.

Example dialog showing that sequences can be long and a user might enter multiple variables in a single utterance.

Solution: Rasa Forms

Rasa has a built-in mechanism to keep track of required input and automatically continues where input is missing. Spot on.
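To make the idea concrete, here is a minimal sketch of the slot-filling logic in plain Python. It is not Rasa's API or my actual form definition. The slot names and the helper functions `fill_slots` and `next_missing_slot` are assumptions made up for illustration.

```python
from typing import Dict, List, Optional

# Slot names mirror the inputs listed in the article; the order is an assumption.
REQUIRED_SLOTS: List[str] = [
    "city", "date", "duration", "guests", "breakfast", "name", "payment",
]


def fill_slots(slots: Dict[str, str], extracted: Dict[str, str]) -> Dict[str, str]:
    """Return a copy of `slots` updated with every required value found in one utterance."""
    updated = dict(slots)
    for key, value in extracted.items():
        if key in REQUIRED_SLOTS:
            updated[key] = value
    return updated


def next_missing_slot(slots: Dict[str, str]) -> Optional[str]:
    """Return the first required slot without a value, or None when the form is complete."""
    for name in REQUIRED_SLOTS:
        if name not in slots:
            return name
    return None


# "Book hotel in Berlin for 27.07.2020 for 2 nights" fills three slots at once.
slots = fill_slots({}, {"city": "Berlin", "date": "27.07.2020", "duration": "2"})
print(next_missing_slot(slots))  # the form would ask for this slot next
```

With three slots filled by a single utterance, the next question is about the number of guests. Rasa Forms take care of this bookkeeping for you.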

2. Recognising and validating city names

There are many, many, many cities in the world. It is impractical to maintain a database with all city names. As my imaginary hotel has only 50 locations in the world, I need a way to detect whenever a city is entered and compare it to my simple lookup database.

A couple of different examples of what a user might input to enter a city.

Solution: Adding spaCy in the NLU-pipeline

spaCy is a framework that predicts entities such as person, city, organisation and many more. The approach here is not to maintain a giant database, but to predict the conceptual meaning of a word with a machine learning model. A city falls under the entity type “Geopolitical entity” (try it out yourself) and can be extracted and handed over to Rasa. A Rasa Action Server can then process this information. Nice.
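The validation step could look roughly like the sketch below. It is an illustration, not the code from my action server. `SUPPORTED_CITIES`, `gpe_entities` and `validate_city` are hypothetical names, and the entities are stubbed so the example runs without a downloaded spaCy model. The only spaCy facts it relies on are that entity spans expose `.text` and `.label_`, and that cities carry the label `GPE`.

```python
from types import SimpleNamespace
from typing import Iterable, List, Optional

# Hypothetical stand-in for the hotel's lookup database of 50 locations.
SUPPORTED_CITIES = {"berlin", "amsterdam", "london"}


def gpe_entities(entities: Iterable) -> List[str]:
    """Keep the text of entities labelled GPE (geopolitical entity).

    Expects objects with `.text` and `.label_`, as the spans in spaCy's `doc.ents` have.
    """
    return [ent.text for ent in entities if ent.label_ == "GPE"]


def validate_city(candidates: List[str]) -> Optional[str]:
    """Return the first candidate the hotel operates in, or None if there is none."""
    for city in candidates:
        if city.strip().lower() in SUPPORTED_CITIES:
            return city.strip().title()
    return None


# Stub entities so the sketch runs without spaCy installed.
ents = [
    SimpleNamespace(text="Anna", label_="PERSON"),
    SimpleNamespace(text="berlin", label_="GPE"),
]
print(validate_city(gpe_entities(ents)))
```

With spaCy installed and a model loaded via `spacy.load`, the same functions could be called as `validate_city(gpe_entities(nlp(text).ents))`. When the result is `None`, the agent can ask for the city again.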

3. Recognising dates

Asking the user about the arrival date elicits many response options. Some would say they want to arrive “tomorrow”, others perhaps “next monday” while others enter a US date format. Maintaining such logic manually will be impossible.

Solution: Adding Duckling in the NLU-pipeline

Rasa includes a way to integrate Duckling, developed by Facebook. This “extension” extracts date entities and transforms them into date objects that can be processed with business logic.
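As a rough sketch of that business logic, assume the extracted value arrives as an ISO 8601 timestamp string such as `2020-07-27T00:00:00.000+02:00`. That format, the helper names `parse_arrival` and `check_booking`, and the rules themselves are assumptions for illustration, not my production code.

```python
from datetime import date, timedelta


def parse_arrival(duckling_value: str) -> date:
    """Turn an ISO 8601 timestamp string into a date by reading its YYYY-MM-DD prefix."""
    return date.fromisoformat(duckling_value[:10])


def check_booking(arrival: date, nights: int, today: date) -> str:
    """Apply simple, made-up business rules and return a status message."""
    if arrival < today:
        return "arrival date is in the past"
    if nights < 1:
        return "stay must be at least one night"
    departure = arrival + timedelta(days=nights)
    return f"ok: {arrival.isoformat()} to {departure.isoformat()}"


# Example value in the assumed format, e.g. for the utterance "27.07.2020".
arrival = parse_arrival("2020-07-27T00:00:00.000+02:00")
print(check_booking(arrival, nights=2, today=date(2020, 7, 1)))
```

Because “tomorrow”, “next monday” and a US date format all end up as the same kind of value, the rules only have to be written once.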

4. Handling interruptions gracefully

Since conversations with this booking agent can be really long, it needs to be able to handle interruptions gracefully. For instance, if the user asks a question in the middle of the sequence, it is only natural that the agent answers it and then continues the process.

Example where the user interrupts the flow with a question on wifi.

Solution: Rasa rules

In addition to “stories” Rasa allows for “rules”. Rules respond to input at all times, whereas stories use machine learning to predict when they should be activated.

I created a collection of “faq” questions that are triggered by a rule (so they always trigger a response). After an answer has been uttered, the software is instructed to resume the form by repeating the last question.
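The behaviour can be sketched in a few lines of plain Python. This is not how Rasa rules are written (those live in the training data), only an illustration of the control flow. The intent name `faq_wifi`, the answer text and the prompts are hypothetical.

```python
from typing import Dict, List, Optional

# Hypothetical FAQ intents and answers; the article only mentions a wifi question.
FAQ_ANSWERS: Dict[str, str] = {
    "faq_wifi": "Yes, wifi is free in all rooms.",
}

# Hypothetical prompts for two of the form's slots.
PROMPTS: Dict[str, str] = {
    "date": "When would you like to arrive?",
    "guests": "How many guests will stay?",
}


def respond(intent: str, pending_slot: Optional[str]) -> List[str]:
    """Answer an FAQ intent unconditionally, then repeat the pending form question."""
    replies: List[str] = []
    if intent in FAQ_ANSWERS:
        replies.append(FAQ_ANSWERS[intent])
    if pending_slot is not None:
        replies.append(PROMPTS[pending_slot])
    return replies


# The user asks about wifi while the form is waiting for the arrival date.
for line in respond("faq_wifi", pending_slot="date"):
    print(line)
```

The FAQ answer comes first and the open question is repeated right after it, so the user never loses their place in the booking.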

The NLU-pipeline and language model

The NLU-pipeline needed some adjustments to fit the requirements. It also included a pretrained English language model from spaCy.

The NLU-pipeline. Looks complicated but there are many samples to orient against.

Keep in mind that you still need to provide sample sentences for likely input from the user. It is surprising to see how many ways a user can express an intent.

It is hard to anticipate this at design time, and it requires continuous adjustment. For this, Rasa offers Rasa X, which allows “content editors” to maintain the dialog script without any code. I integrated this as well, and it indeed makes it much simpler to update the agent after deployment.

Deployment & Result

As part of the exercise, the chatbot was deployed on a publicly reachable Microsoft Azure virtual machine. I created three Docker containers to conveniently deploy the software. Due to costs, the agent is now offline again.

Here is a video where I talk over the results:

The code is available on my GitHub.

Final thoughts

Rasa has a learning curve. In my opinion, though, if you are a professional who is serious about conversational AI, this is your tool of choice. Being able to customise everything is compelling. Perhaps, in the future, it can even cure my hatred of chatbots.

References

Følstad, A., & Brandtzaeg, P. B. (2020). Users’ experiences with chatbots: findings from a questionnaire study. Quality and User Experience, 5(1), 1–14. https://doi.org/10.1007/s41233-020-00033-2

Følstad, A., Nordheim, C. B., & Bjørkli, C. A. (2018). What makes users trust a chatbot for customer service? An exploratory interview study. Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 11193 LNCS(December), 194–208. https://doi.org/10.1007/978-3-030-01437-7_16

Luger, E., & Sellen, A. (2016). “Like having a really bad pa”: The gulf between user expectation and experience of conversational agents. Conference on Human Factors in Computing Systems — Proceedings, 5286–5297. https://doi.org/10.1145/2858036.2858288