Test Your Agent Before It Goes Live – Scenario, Tool Call & Simulation Testing
Introduces the purpose of testing within the ElevenLabs Agents platform and the three main test categories.
ElevenLabs shows how to use its built-in testing tools (scenario tests, tool call tests, and simulation tests) to validate agents before they go live.
Summary
ElevenLabs’ Agents Academy walkthrough demonstrates a practical, built-in testing framework for the ElevenLabs Agents platform. You’ll learn three core test types: scenario tests (also called next reply tests), tool call tests (which validate tool usage and parameters), and simulation tests (end-to-end conversations). The speaker emphasizes organizing tests in folders, describing the expected agent response, and including success and failure examples to cover edge cases. A standout feature is creating tests directly from a real conversation: when you spot a tool call error, you can generate a corresponding test without configuring every conversational turn by hand. The tutorial also covers validating tool parameters either non-deterministically with an LLM or deterministically with exact string or regex matching, and injecting dynamic variables such as a customer or order ID. Simulation tests cover broader, non-deterministic flows: you outline a user scenario (e.g., a VP of engineering who downloaded a white paper) and set agent success criteria (deliver a concise value prop, handle a competitor objection, confirm meeting details, avoid discussing pricing). You can run tests individually or in bulk, inspect results in a dedicated view, and integrate tests into CI/CD via the ElevenLabs CLI. The takeaway is clear: test early and comprehensively so your agent handles edge cases and behaves consistently in production.
Key Takeaways
- Scenario tests define conversational context and expected agent responses, with explicit success and failure examples to validate handling of user rejection or similar edge cases.
- Tool call tests verify that the agent selects the correct tool at the right time and validates tool parameters using non-deterministic (LLM) or deterministic (regex/string) methods.
- Tests can be created directly from conversations when you spot a tool-call issue, turning bad interactions into new edge-case tests.
- Dynamic variables can be defined within tests to simulate production data like customer IDs or order IDs.
- Simulation tests run end-to-end conversations with defined user scenarios and agent success criteria (e.g., concise value prop delivered, competitor objection handled, meeting details confirmed, pricing not discussed).
- You can run tests individually or all at once, review results in a dedicated view, and integrate them into CI/CD via the ElevenLabs CLI for pull-request validation.
- Tests can be configured at the workflow node level, validating behavior or transitions of a specific node, not just the default starting path.
Who Is This For?
Essential viewing for teams building and deploying conversational agents on the ElevenLabs platform, especially if you need reliable tool integrations, end-to-end conversation validation, and CI/CD test pipelines.
Notable Quotes
"We currently have three main types of tests. Scenario tests, which you might see as next reply test, tool call tests, and simulation tests."
—Defines the three core test types and sets the framing for the rest of the video.
"The really cool thing about the ElevenLabs Agents testing suite is creating tests from inside of a conversation."
—Highlights a unique UX feature: generate tests directly from ongoing conversations.
"Every bad interaction becomes a new edge case that you can test for and prevent from happening again."
—Emphasizes the feedback loop between real issues and test coverage.
Questions This Video Answers
- how do I set up scenario tests on the ElevenLabs Agents platform?
- what is the difference between tool call tests and simulation tests in ElevenLabs?
- can I include dynamic variables like customer IDs in my automated tests?
- how do I integrate ElevenLabs tests into a CI/CD pipeline?
- how do I create a test directly from a live conversation in the ElevenLabs UI?
ElevenLabs, ElevenLabs Agents platform, agents testing, scenario test, tool call test, simulation test, CI/CD, tool validation, dynamic variables, workflow node testing
Full Transcript
Welcome back to the ElevenLabs Agents Academy. Today we're looking at agent testing, a testing framework built directly into the ElevenLabs Agents platform. Testing ensures your agent handles every conversation the way your business needs it to before it reaches the user. We currently have three main types of tests: scenario tests (which you might see as next reply tests), tool call tests, and simulation tests. It's also worth noting that you can create folders to organize all of your tests. Let's start with the scenario test. It evaluates your agent's ability to handle certain types of interactions.
You establish the context of these interactions through these conversational nodes here, and I can add another node like so. After you do that, you're going to want to describe the expected response from the agent. In this particular example, I'm working on an outbound sales agent, so I want to make sure that if the potential customer is not interested in my outreach, the agent handles that rejection gracefully and respects it. I'm also going to want to define some examples: some success examples and some failure examples. This is really important because this is the context the test needs to either approve or reject the agent's response.
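The scenario test setup described here has four parts: context turns, a description of the expected response, success examples, and failure examples. As a hypothetical illustration only, the sketch below models those parts as a plain Python data structure and shows how an LLM judge might be handed that context to approve or reject a reply. It is not the ElevenLabs API: `Turn`, `ScenarioTest`, `build_judge_prompt`, and the prompt wording are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Turn:
    role: str      # "user" or "agent"
    message: str


@dataclass
class ScenarioTest:
    """Hypothetical local model of a scenario (next reply) test."""
    name: str
    context: list[Turn]          # turns that set up the situation
    expected_response: str       # plain-language description of the desired reply
    success_examples: list[str] = field(default_factory=list)
    failure_examples: list[str] = field(default_factory=list)


def build_judge_prompt(test: ScenarioTest, agent_reply: str) -> str:
    """Collect everything an LLM judge would need to approve or reject one reply."""
    lines = [f"{turn.role}: {turn.message}" for turn in test.context]
    lines.append(f"Expected behavior: {test.expected_response}")
    lines += [f"Good reply example: {ex}" for ex in test.success_examples]
    lines += [f"Bad reply example: {ex}" for ex in test.failure_examples]
    lines.append(f"Reply to grade: {agent_reply}")
    lines.append("Answer PASS or FAIL.")
    return "\n".join(lines)


rejection_test = ScenarioTest(
    name="respects a clear no",
    context=[
        Turn("agent", "Hi, this is Alex. Do you have two minutes to talk about our product?"),
        Turn("user", "No thanks, we're not interested."),
    ],
    expected_response="Acknowledge the rejection politely and end the call without pushing.",
    success_examples=["Understood, thanks for your time. Have a great day!"],
    failure_examples=["Are you sure? It only takes a minute to hear about our pricing."],
)

print(build_judge_prompt(rejection_test, "No problem, thanks for your time."))
```

The failure example is what lets the judge distinguish a polite sign-off from a disguised second pitch, which is why the video stresses providing both kinds of examples.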
Next, let's go to the tool call test. What makes an agent useful is its ability to call tools, and it's really important to make sure it's calling the right tools at the right time. First, you select the tool that you want to test. For some of these tools, it's going to ask you to fill out the parameters, which are basically the information the tool expects from the agent. To validate these parameters, you can either use an LLM (non-deterministic validation) or exact string matching or regex matching (deterministic validation).
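The two families of parameter validation mentioned here can be sketched as a single function. This is a hypothetical illustration, not the platform's implementation: the mode names `exact`, `regex`, and `llm`, and the `llm_judge` callable, are assumptions. The video also does not say whether a regex must match the whole value; this sketch assumes it does and uses `re.fullmatch`.

```python
import re
from typing import Callable, Optional


def validate_param(
    value: str,
    mode: str,
    expected: str,
    llm_judge: Optional[Callable[[str, str], bool]] = None,
) -> bool:
    """Check one tool-call parameter produced by the agent.

    mode="exact": deterministic, the value must equal `expected`.
    mode="regex": deterministic, the whole value must match the pattern `expected`.
    mode="llm":   non-deterministic, `llm_judge(value, expected)` decides, where
                  `expected` is a natural-language description.
    """
    if mode == "exact":
        return value == expected
    if mode == "regex":
        return re.fullmatch(expected, value) is not None
    if mode == "llm":
        if llm_judge is None:
            raise ValueError("mode='llm' requires an llm_judge callable")
        return llm_judge(value, expected)
    raise ValueError(f"unknown validation mode: {mode!r}")


# Parameters of a hypothetical create-booking tool call.
ISO_MINUTE = r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}"
print(validate_param("30", "exact", "30"))                      # True
print(validate_param("2025-03-14T15:00", "regex", ISO_MINUTE))  # True
print(validate_param("next Friday", "regex", ISO_MINUTE))       # False
```

Deterministic modes suit values with a fixed shape (IDs, timestamps, durations); the LLM mode suits free-text parameters such as a ticket description, at the cost of run-to-run variation.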
You can also set dynamic variables in all of the tests. If you wanted to test with production values like a customer or order ID, you could define those here. In this particular example, I want to test whether my outbound agent can successfully book a meeting on my calendar, so I'm going to test this Cal.com create booking integration. You can see that these nodes, or turns, are defined in a way that will trigger this tool call. And the really cool thing about the ElevenLabs Agents testing suite is creating tests from inside of a conversation.
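Dynamic variables can be thought of as placeholders that are filled in before a test runs. The sketch below assumes a `{{variable_name}}` placeholder syntax and raises an error for any variable that was not supplied, so a test never silently runs with an empty ID. The `render` helper and the variable values are hypothetical; check the platform documentation for the exact syntax it uses.

```python
import re

# Assumed placeholder syntax: {{variable_name}}, optional spaces inside the braces.
_PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render(template: str, variables: dict[str, str]) -> str:
    """Replace every {{name}} in `template` with variables[name].

    Raises KeyError for a placeholder with no value, so a test cannot
    silently run with an empty customer or order ID.
    """
    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in variables:
            raise KeyError(f"missing dynamic variable: {name}")
        return variables[name]

    return _PLACEHOLDER.sub(substitute, template)


test_variables = {"customer_id": "C-1042", "order_id": "ORD-77"}  # made-up values
print(render("Hi, I'm calling about order {{order_id}} for account {{ customer_id }}.",
             test_variables))
# Hi, I'm calling about order ORD-77 for account C-1042.
```

Failing loudly on a missing variable is a deliberate choice: a test that quietly substitutes an empty string would exercise a conversation that never happens in production.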
So I'm going to navigate over to this other agent and scroll through the conversation, and maybe I notice an error with a tool call. For example, it's having a hard time successfully calling this Zendesk open ticket tool. I can create a test directly from this conversation by pressing this button here, and I'll select tool call test, create Zendesk ticket. Instead of having to configure the entire test and all of the conversational turns manually, I can just look for this tool in the dropdown here and create the test directly from the conversation.
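Creating a test from a conversation amounts to taking a snapshot: the turns before the failing tool call become the test context, and the tool that should have been called becomes the expectation. The following is a hypothetical sketch of that idea. `Event`, `ToolCallTest`, `snapshot_tool_call_test`, and the tool name `create_zendesk_ticket` are illustrative, not ElevenLabs types, and the video shows the tool being picked by hand in the UI rather than detected automatically.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Event:
    role: str                        # "user", "agent", or "tool"
    text: str = ""
    tool_name: Optional[str] = None  # set only for role == "tool"
    succeeded: bool = True           # False when the tool call errored


@dataclass
class ToolCallTest:
    context: list[Event]   # spoken turns leading up to the call
    expected_tool: str     # tool the agent should call at this point


def snapshot_tool_call_test(conversation: list[Event]) -> Optional[ToolCallTest]:
    """Turn the first failed tool call in a conversation into a regression test."""
    for index, event in enumerate(conversation):
        if event.role == "tool" and not event.succeeded and event.tool_name:
            # Keep only user/agent turns before the failure; earlier tool
            # results are dropped to keep the sketch simple.
            context = [e for e in conversation[:index] if e.role != "tool"]
            return ToolCallTest(context=context, expected_tool=event.tool_name)
    return None  # nothing failed, so there is nothing to snapshot


conversation = [
    Event("user", "My order arrived damaged, can you open a ticket?"),
    Event("agent", "Sorry to hear that. Let me open a support ticket for you."),
    Event("tool", tool_name="create_zendesk_ticket", succeeded=False),
    Event("agent", "Apologies, something went wrong on my side."),
]
regression_test = snapshot_tool_call_test(conversation)
print(regression_test.expected_tool, len(regression_test.context))  # create_zendesk_ticket 2
```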
So this is a key feedback loop. Every bad interaction becomes a new edge case that you can test for and prevent from happening again. The third type of test is the simulation test, and these run full end-to-end conversations. First, you describe the simulated user scenario: you're a VP of engineering, you downloaded our white paper, and you're initially hesitant to sign a contract but open-minded. That's the user scenario. Then you define the agent success criteria: Alex, which is the name of our agent, delivers a concise value prop, handles the competitor objection once, confirms the meeting details, and does not discuss pricing.
You also define the maximum number of conversational turns within this conversation, and, likewise, you can add dynamic variables to this test as well. The test then generates a full conversation and evaluates the agent's performance. This is useful for broader, non-deterministic flows where you want to make sure your agent performs well under the pressure of a more dynamic conversation. You can also run simulation tests, and configure everything else related to testing, via the ElevenLabs API. Additionally, tests can be configured at the workflow node level, allowing you to validate the behavior or transitions of a specific node rather than only testing the agent from its default starting path.
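A simulation test alternates between a simulated user, driven by the scenario description, and the agent; it stops at the configured turn limit and then grades the transcript against each success criterion. The sketch below shows that control flow with plain functions standing in for the two LLM participants and the judge. All names are hypothetical, and counting one turn as one user message plus one agent reply is an assumption, not the platform's documented definition.

```python
from dataclasses import dataclass
from typing import Callable, Optional

Transcript = list[tuple[str, str]]  # (role, message) pairs


@dataclass
class SimulationTest:
    user_scenario: str           # instructions for the simulated user
    success_criteria: list[str]  # each one is graded separately
    max_turns: int = 10          # assumption: one turn = one user message + one agent reply


def run_simulation(test: SimulationTest,
                   agent: Callable[[Transcript], str],
                   simulated_user: Callable[[str, Transcript], Optional[str]]) -> Transcript:
    """Alternate simulated user and agent until the user stops or max_turns is hit."""
    transcript: Transcript = []
    for _ in range(test.max_turns):
        user_message = simulated_user(test.user_scenario, transcript)
        if user_message is None:  # the simulated user ended the conversation
            break
        transcript.append(("user", user_message))
        transcript.append(("agent", agent(transcript)))
    return transcript


def evaluate(test: SimulationTest, transcript: Transcript,
             judge: Callable[[str, Transcript], bool]) -> dict[str, bool]:
    """Grade the finished transcript against each success criterion."""
    return {criterion: judge(criterion, transcript) for criterion in test.success_criteria}


SCRIPT = ["Hi, I read the white paper.", "How are you different from competitor X?", "OK, let's meet."]


def scripted_user(scenario: str, t: Transcript) -> Optional[str]:
    # Stand-in for an LLM playing the scenario: three messages, then hang up.
    index = len(t) // 2
    return SCRIPT[index] if index < len(SCRIPT) else None


def echo_agent(t: Transcript) -> str:
    # Stand-in for the agent under test.
    return f"Reply to: {t[-1][1]}"


def toy_judge(criterion: str, t: Transcript) -> bool:
    # A real judge would be an LLM; this one only knows the no-pricing rule.
    if "pricing" in criterion.lower():
        return all("price" not in m.lower() and "pricing" not in m.lower() for _, m in t)
    return True


sim = SimulationTest(
    user_scenario="VP of engineering who downloaded our white paper; hesitant but open-minded.",
    success_criteria=["Delivers a concise value prop", "Does not discuss pricing"],
    max_turns=4,
)
transcript = run_simulation(sim, echo_agent, scripted_user)
print(len(transcript))  # 6
print(evaluate(sim, transcript, toy_judge))
# {'Delivers a concise value prop': True, 'Does not discuss pricing': True}
```

The turn cap matters because a simulated user driven by an LLM may never end the conversation on its own; grading each criterion separately also makes it clear which requirement failed.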
Once you have all your tests configured, you can either run them individually or run them all at once. You'll be brought to this view where you can look into each of the tests once they're completed. You can also add these tests to your CI/CD pipeline using the ElevenLabs CLI, which you run from the command line, so every pull request gets validated before anything reaches production. Your agents represent your company, and testing ensures that they handle all of those edge cases consistently. So start building out your test suite today and ship your agents with confidence.
I'll see you in the next one.
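The CI/CD integration mentioned in the video relies on a general convention: a pipeline step fails when its command exits with a nonzero status. The sketch below does not reproduce the ElevenLabs CLI, whose commands are not shown in the video; `run_all` and the test names are hypothetical, and the runner only illustrates that exit-code convention, counting a test that raises as a failure.

```python
from typing import Callable


def run_all(tests: dict[str, Callable[[], bool]]) -> int:
    """Run every test, print a summary, and return a process exit code.

    0 means every test passed; 1 means at least one failed. A test that
    raises is counted as a failure rather than crashing the whole run.
    """
    failures = []
    for name, check in tests.items():
        try:
            passed = bool(check())
        except Exception as exc:  # report the error, keep running the rest
            print(f"ERROR {name}: {exc}")
            passed = False
        print(f"{'PASS' if passed else 'FAIL'}  {name}")
        if not passed:
            failures.append(name)
    print(f"{len(tests) - len(failures)}/{len(tests)} passed")
    return 1 if failures else 0


suite = {
    "scenario: respects a clear no": lambda: True,
    "tool call: create booking": lambda: True,
    "simulation: VP of engineering": lambda: False,
}
exit_code = run_all(suite)
print(exit_code)  # 1
```

In a CI job, such a script would end with `sys.exit(run_all(suite))` (after `import sys`), so that any failing test fails the pipeline step and blocks the pull request.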