How to Benchmark and Measure a Chatbot's Performance
Benchmarking a custom chatbot's performance is essential. Choosing the right metrics lets you follow your solution's progress over the coming years. Having data points you can measure and track over that period of time is extremely valuable, ensuring you work towards the right KPIs for your project.
It is a shame, but one of the downsides of working in the enterprise space is the iron-clad NDAs that prevent us from going into detailed results and implementations. There are some well-known names in the mix, and I'd love to tell you about them.
That said, in this post I can share some insight into the type of metrics we tend to benchmark and track for the largest of clients (we are talking Fortune 500).
Benchmarks and metrics in a chatbot solution
In terms of stats, every deployment tends to use two main things: benchmarks and metrics.
Benchmarks
Benchmarks are a snapshot of how the problem looked before the chatbot solution was deployed.
In sales, it might be current revenue, sales cycle times, or average basket size. In customer service, it might be the number of enquiries per day, the sentiment of outcomes, or the average time taken to solve a problem. For HR, we would record how busy staff are: the number of inbound conversations, time spent replying to emails, and things like that.
The point of benchmarking is to help a business make a case for further investment in a chat solution. If they cannot see the before and after data, how can they assess the ROI or justify the cost?
Metrics
Metrics are things we track throughout the lifetime of the chatbot, right from the get-go to the point the client turns it off (if it has a defined lifespan).
The metrics we track depend on the role of the chatbot. After all, sales metrics are very different to customer service and HR. It is important to help a client define these metrics from the start, as some of them need specific programming or third-party service integrations.
Examples of chatbot metrics and benchmarking
Below you will see an anonymised metrics and benchmark document for a client.
For context, the chatbot lives in a ubisend converse web-widget and is deployed across the FAQ, contact, and member areas of the client's website. It is completely automated, with fall-back-to-human functionality should the user ask to speak to a customer service rep.
In another blog post, I will pull out the numbers and show the impact the chatbot has had. For now, I wanted to start, well, at the start.
4 pre-deployment benchmarks:
- Number of enquiries per day/week/month.
- Number of unresolved enquiries per day/week/month.
- Customer service employee workload (time spent dealing with inbound communication).
- Average time taken to resolve a customer service ticket.
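The pre-deployment numbers above usually come from an export of existing support data. As a minimal sketch (the ticket fields here are hypothetical, not from any real helpdesk export), the benchmarks could be computed like this:

```python
from statistics import mean

# Hypothetical ticket records from a helpdesk export.
# Field names are illustrative only.
tickets = [
    {"resolved": True, "resolution_minutes": 34},
    {"resolved": True, "resolution_minutes": 12},
    {"resolved": False, "resolution_minutes": None},
]

total = len(tickets)
unresolved = sum(1 for t in tickets if not t["resolved"])
avg_resolution = mean(
    t["resolution_minutes"] for t in tickets if t["resolved"]
)

print(f"Enquiries: {total}")                           # Enquiries: 3
print(f"Unresolved: {unresolved}")                     # Unresolved: 1
print(f"Avg resolution: {avg_resolution:.1f} min")     # Avg resolution: 23.0 min
```

Run once against a representative period (a month or a quarter), written down, this becomes the "before" snapshot everything later is compared against.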
8 customer service chatbot metrics to track:
- Number of successful engagements.
- Number of non-responded engagements.
- Number of fall-backs to human.
- Number of general errors.
- Number of users the chatbot took to its One True Goal.
- Percentage of new vs. existing users, and how far they get through the conversation.
- Average length of interaction.
- Average time taken to resolve a customer service ticket.
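Most of the counts in the list above reduce to tallying named events as conversations happen. A minimal sketch of that idea (the event names are made up for illustration, not a real ubisend API):

```python
from collections import Counter

# Hypothetical event stream emitted by the chatbot.
events = [
    "engagement_success",
    "engagement_success",
    "engagement_no_response",
    "fallback_to_human",
    "goal_reached",
    "error",
]

metrics = Counter(events)

# Derived metric: share of successful engagements that hit the One True Goal.
goal_rate = metrics["goal_reached"] / metrics["engagement_success"]

print(metrics["engagement_success"])  # 2
print(metrics["fallback_to_human"])   # 1
print(goal_rate)                      # 0.5
```

In a real deployment these counters would live in an analytics backend and be broken down per day, week, and month, but the principle is the same: each metric in the list maps to an event you instrument from day one.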
The examples above show the difference between benchmarking and metrics. A benchmark is static: you write it down and compare against it. Ideally, further benchmarks should be taken at regular intervals (monthly, quarterly, etc.). Comparing benchmarks over time helps you spot trends.
Metrics are live numbers, like Google Analytics and KPI dashboards. These tend to help more with iterating on the chatbot and making it perform better over time. As an example, an improvement in the number of people reaching the One True Goal probably means the chatbot is getting better at answering questions. A reduction in the average length of interaction might suggest the chatbot is satisfying users more quickly.
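Comparing benchmark snapshots taken at those regular intervals is just percentage change per figure. A tiny sketch, with made-up quarterly numbers:

```python
# Two hypothetical benchmark snapshots, one quarter apart.
q1 = {"enquiries": 1200, "avg_resolution_min": 34.0}
q2 = {"enquiries": 1150, "avg_resolution_min": 21.5}

# Percentage change for each benchmarked figure.
for key in q1:
    change = (q2[key] - q1[key]) / q1[key] * 100
    print(f"{key}: {change:+.1f}%")
# enquiries: -4.2%
# avg_resolution_min: -36.8%
```

A drop in both figures after deployment is exactly the before-and-after story that justifies the ROI mentioned earlier.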
So yeah, there we go. A little bit of insight into the way we benchmark and track when we develop a chatbot for a client.
There are two main takeaways. The first, if you do not benchmark and track metrics, how can you prove your chatbot is a winner and make it better over time? Second, every chatbot is different. Your metrics and benchmarks need to be bespoke to you/your client.
Hope you found it interesting. Find me on Twitter or LinkedIn if needed.