[Inside calldesk] Our 2020 journey through technical challenges (2/3)

A man climbing a steep mountain

This is the second of three articles dedicated to the challenges our technical team recently faced, which led AWS to award us the 2020 "Architecture of the year" trophy. You can read the first article here.


We're writing a series of articles to explain how we handled one of our biggest technical challenges of 2020 at calldesk: scaling our platform on AWS from 1000 to 5000 parallel phone calls, in less than a month.

In the previous article, we told you a bit more about our architecture and how the team organized itself to handle this challenge. You now understand what makes calldesk's technology so powerful in terms of voice recognition and understanding.

It's time to give more details on the tools we used to make thousands of phone calls on our platform, the scenarios we tested, their results, and the limits we faced. Keep reading until the end of the article: we'll also tell you about the solutions!

Pre-requisites & tools


The first pre-requisite was to build a voice agent that would be able to handle the phone calls made on the platform. We defined a simple discussion that would simulate a realistic phone call:

  • introduction message by our voice agent
  • asking for the call reason
  • asking for the first name
  • asking for the last name
  • asking for an amount of money
  • asking for what the caller thought about the discussion

The objective was to test simple intents (confirming or denying a question), combined intents (confirming plus giving an entity in the same sentence), and different entities, to see whether the load had an impact on the voice agent's understanding.

This discussion would last for about 2 minutes, which was perfect to simulate a production phone call.

The second pre-requisite was to have an audio recording of a caller going through a phone call on this voice agent, with every entity understood correctly. We only needed the caller's voice: we would play it back when making phone calls by the thousands, so that the voice agent could answer.

Finally, we needed to easily track several metrics:

  • number of phone calls made
  • number of phone calls received on the platform
  • number of phone calls with every discussion turn understood
  • understanding rate by entities (last name, amount of money ...)

Our studio already has features to define such metrics for every voice agent, so it was really easy to measure the success rate after each load test.
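For illustration, a success rate like this boils down to simple counting over a per-call export. The CSV layout, file name and values below are hypothetical, not our studio's actual format:

```shell
# Hypothetical export: one line per call, "call_id,picked_up,all_turns_understood"
cat > /tmp/load_test_results.csv <<'EOF'
call-001,1,1
call-002,1,0
call-003,0,0
call-004,1,1
EOF

# Success rate = calls with every discussion turn understood / total calls made
SUCCESS_RATE=$(awk -F',' '{ total++; ok += $3 } END { printf "%.0f", 100 * ok / total }' /tmp/load_test_results.csv)
echo "success rate: ${SUCCESS_RATE}%"
```

The same one-liner works for any per-entity metric by switching the column it sums.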

Using the sipp package

As said in the previous article, calldesk infrastructure uses the SIP protocol to handle the Voice over Internet Protocol (VoIP). This is the preferred solution to scale an infrastructure since the standard telephony network is limited by channel capacity and also costs a lot of money.

That's why we ran the load tests using the sipp package, a CLI tool developed by Hewlett Packard specifically for load testing over SIP.

It lets us choose 3 main parameters for the load test:

  • total number of phone calls to make
  • total number of phone calls to hold in parallel
  • new phone call rate per second

These 3 parameters changed depending on the test scenario we wanted to try.

sipp is also driven by an XML file that details the phone call scenario: the SIP requests to make, and which audio to play to simulate a real phone call. We used the audio recorded when building the voice agent in the previous step.
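As a sketch, the 3 parameters above map onto sipp's `-m`, `-l` and `-r` flags. The scenario file name, proxy address and values below are placeholders (shown as a dry run, the command is built but not executed):

```shell
# Dry-run sketch of a sipp invocation; names and values are illustrative.
TOTAL_CALLS=1000      # -m: stop the test after this many calls
MAX_PARALLEL=200      # -l: maximum calls held in parallel
RATE_CPS=5            # -r: new calls started per second

SIPP_CMD="sipp -sf caller_scenario.xml -m ${TOTAL_CALLS} -l ${MAX_PARALLEL} -r ${RATE_CPS} sip-proxy.example.com:5060"
echo "${SIPP_CMD}"
```

In a real run, `caller_scenario.xml` would contain the SIP request sequence and the reference to the recorded caller audio.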

Team organization

In order to be efficient on D-day, we had already defined the role of each team member:

  • one of us would launch the sipp command from his laptop with the right parameters
  • one of us would monitor our platform dashboard and watch the alert emails we would receive
  • one of us would calculate the success rates (number of phone calls picked up by the platform, success rate of each entity...)
  • one of us would write the full report

Load test scenarios & limits


The first load test scenarios were meant to benchmark the limits and see when the platform would begin to be overwhelmed. We started with a low rate of calls per second (5) and not too many calls in parallel (200), and increased the numbers step by step. We launched 5 different test scenarios during the first session, until we reached 1000 phone calls in parallel.


We started to observe issues around 1000 containers in parallel, as we hit some of our speech-to-text providers' request-rate thresholds. This clearly had an impact on the voice agents' understanding rate.

Other issues we encountered were related to our hardware. We first ran the sipp command on a single laptop, but it quickly hit CPU and RAM limits. We then ran the sipp command on several laptops, but that was still not sufficient, as some of the audio media played by sipp came out corrupted.

Also, the Wi-Fi network we were using (in a co-working space) had limited bandwidth and showed its limits: too many UDP packets were lost during the load tests.


Regarding the speech-to-text limits, the easiest solution was to request a limit increase from our providers. No code required, and we could easily double the limit, so we sent the request the day after the load test.

Regarding the hardware limitations, we quickly understood that we needed to run the load tests in the cloud if we wanted to reach the 5k goal. That's why we wrote a script to launch several EC2 instances and whitelist their IPs so that they could reach our SIP proxies. We would then split the load between the EC2 instances and aggregate the final results.
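A provisioning script like that can be sketched with the AWS CLI: one `run-instances` call to launch the load generators, then one `authorize-security-group-ingress` call per public IP to open SIP traffic (UDP/5060) on the proxies' security group. All IDs, IPs and the instance type below are placeholders, and the commands are only echoed, not executed:

```shell
# Dry-run sketch; AMI, security group ID, instance type and IPs are placeholders.
AMI_ID="ami-0123456789abcdef0"
SG_ID="sg-0123456789abcdef0"
INSTANCE_TYPE="c5.2xlarge"

# 1) Launch the load-generator instances.
LAUNCH_CMD="aws ec2 run-instances --image-id ${AMI_ID} --instance-type ${INSTANCE_TYPE} --count 3"
echo "${LAUNCH_CMD}"

# 2) Whitelist each instance's public IP on the SIP proxies' security group.
#    (In a real run the IPs would come from `aws ec2 describe-instances`.)
for IP in 198.51.100.10 198.51.100.11 198.51.100.12; do
  echo "aws ec2 authorize-security-group-ingress --group-id ${SG_ID} --protocol udp --port 5060 --cidr ${IP}/32"
done
```

Each instance then runs its share of the sipp load, and the per-instance results are aggregated afterwards.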

We started with a lot of small EC2 instances but ended up with 3 big instances and very good bandwidth capacity to avoid losing UDP packets.

What's next

After this first load test session, we iterated over and over again to remove every impediment on our road to 5k containers in parallel. We ran 2 load test sessions per week on average, for a month.

Spoiler alert: we finally did 20% better!

We'll tell you more in a third article, especially about how we added 2 new speech-to-text engines to our platform (including 1 on-premise) to overcome the limitations. Working on this was quite exciting!

Bonus: a pic from the team during one of our load test sessions!


Nice to meet you (I'm on the top left)! Then meet Vincent, Julien, Damien, Lyes and Benjamin (and let's not forget Alex who took the screenshot!).

Want to be part of this amazing tech journey? Check out our engineering job openings, we would be happy to discuss with you!
