Automated tests

13 March 2026 04:52

Use automated tests to quickly test and improve your LLM tasks. You can generate output for multiple test cases at once, and compare the output generated by different prompts and models. You can also automatically evaluate your task output and generate insights and suggested prompt refinements.

You need an administrator or publisher role in your team to create and run automated tests. See Automated test permissions.

Automated tests are separate from your tasks and usages, so you can safely experiment without affecting your chatbot:

Changes you make to your tasks or usages won't affect your automated tests.
Changes you make in an automated test only apply within that test: they don't impact your other tests, or your usages or tasks.

Before you run a step, you can edit the prompt and the connector that the step will use.

Automated test steps

Automated tests use three steps that build on each other in sequence:

Step 1: Run the task you're testing and generate output for each of the test cases.
For example, run the rewrite utterance task for a list of long utterances to rewrite.
Step 2: Evaluate that output against criteria and metadata you specify and mark each output with either 'PASS' or 'FAIL'.
Step 3: Generate insights and refined prompts from the evaluations to improve the task's prompt.

Each step builds on the output generated by the previous one. For example, you'll need to generate some rewritten utterances before you can evaluate how well they were rewritten, and you'll need to generate those evaluations before you can generate insights to improve your prompt.

You don't have to run all three steps together: you can run any step by itself provided you have the data it needs.
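
The way each step depends on the previous one can be sketched in Python. This is an illustrative model only, not the product's API: the names `Row`, `run_task`, `evaluate` and `generate_insights`, and the stand-in logic inside them (truncating to five words, checking a word limit), are assumptions made up for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Row:
    test_case: str
    output: Optional[str] = None   # filled in by Step 1
    verdict: Optional[str] = None  # 'PASS' or 'FAIL', filled in by Step 2


def run_task(test_cases):
    """Step 1 (stand-in): produce one output per test case."""
    # Hypothetical "rewrite" that simply keeps the first five words.
    return [Row(tc, output=" ".join(tc.split()[:5])) for tc in test_cases]


def evaluate(rows, max_words):
    """Step 2 (stand-in): needs Step 1 output; marks each row PASS or FAIL."""
    if any(r.output is None for r in rows):
        raise ValueError("Step 2 needs task output from Step 1")
    return [
        Row(r.test_case, r.output,
            "PASS" if len(r.output.split()) <= max_words else "FAIL")
        for r in rows
    ]


def generate_insights(rows):
    """Step 3 (stand-in): needs Step 2 evaluations; summarises them."""
    if any(r.verdict is None for r in rows):
        raise ValueError("Step 3 needs evaluations from Step 2")
    failed = sum(r.verdict == "FAIL" for r in rows)
    return f"{failed} of {len(rows)} outputs failed"


cases = ["please can you tell me how I go about resetting my password", "hi"]
outputs = run_task(cases)                     # Step 1
evaluations = evaluate(outputs, max_words=4)  # Step 2, uses Step 1 output
summary = generate_insights(evaluations)      # Step 3, uses Step 2 output
```

In this sketch, calling `generate_insights(outputs)` directly raises an error because the rows have no evaluations yet, which mirrors the rule that a step can only run when the data it needs is available.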

Result sets

Each time you generate a task's output (Step 1) or generate evaluations (Step 2), the results are stored in a new result set. The result set contains the step output as well as everything that was provided as an input.

You can:

Use a result set as the input for other steps in the same automated test.
Download the result set as a CSV to upload into other automated tests.

An automated test can have a maximum of twenty result sets. If you generate more than twenty, the oldest result set is replaced by the new one. You can favourite a result set to prevent it being replaced.
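
The retention rule can be modelled as a short Python sketch. The class and method names are hypothetical, and the sketch assumes that the result set replaced is the oldest one that hasn't been favourited. What happens when all twenty result sets are favourited isn't described here, so the sketch simply raises an error in that case.

```python
from dataclasses import dataclass, field

MAX_RESULT_SETS = 20  # limit stated in this section


@dataclass
class ResultSet:
    name: str
    favourite: bool = False


@dataclass
class AutomatedTest:
    result_sets: list = field(default_factory=list)  # oldest first

    def add_result_set(self, new):
        if len(self.result_sets) >= MAX_RESULT_SETS:
            # Assumption: drop the oldest result set that isn't favourited.
            for i, rs in enumerate(self.result_sets):
                if not rs.favourite:
                    del self.result_sets[i]
                    break
            else:
                # Undocumented case; raising here is only a placeholder.
                raise RuntimeError("all result sets are favourited")
        self.result_sets.append(new)


test = AutomatedTest()
for n in range(1, 21):
    test.add_result_set(ResultSet(f"set-{n}"))

test.result_sets[0].favourite = True      # protect the oldest result set
test.add_result_set(ResultSet("set-21"))  # the twenty-first result set
names = [rs.name for rs in test.result_sets]
```

Here `set-1` survives because it was favourited, so `set-2` is the one replaced when `set-21` arrives.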

Insights

When you generate task output (Step 1) or generate evaluations (Step 2) using a result set as input, that result set isn't changed: the results are stored in a new result set.

When you generate insights (Step 3) from a result set, the generated insights are stored with the result set you used.

A result set can have only one set of generated insights. If you re-generate insights, you'll only see the latest version. Generated insights are not included when you download a result set, so make a note of anything you want to keep.
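
As a rough mental model, the relationship between a result set and its insights can be sketched in Python. Everything here is an assumption for illustration: the `ResultSet` class, its methods, and the CSV column names (`test_case`, `output`, `evaluation`) are hypothetical and don't describe the real download format.

```python
import csv
import io
from dataclasses import dataclass
from typing import Optional


@dataclass
class ResultSet:
    rows: list                      # one dict per test case
    insights: Optional[str] = None  # at most one set of insights

    def store_insights(self, text):
        # Re-generating replaces the previous insights; no history is kept.
        self.insights = text

    def to_csv(self):
        # Only the rows are exported; insights are left out of the download.
        buf = io.StringIO()
        writer = csv.DictWriter(buf, fieldnames=list(self.rows[0].keys()))
        writer.writeheader()
        writer.writerows(self.rows)
        return buf.getvalue()


rs = ResultSet(rows=[{"test_case": "hi", "output": "hello", "evaluation": "PASS"}])
rs.store_insights("first version")
rs.store_insights("second version")  # overwrites the first version
exported = rs.to_csv()
```

After the second call only the latest insights remain on the result set, and the exported text contains the rows but none of the insights.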

Create and use automated tests

To run automated tests, your chatbot must have:

Generative AI configurations enabled.
Automated testing enabled.
An automated testing usage with configured evaluate text and generate insights tasks.
A usage with tasks to test.
Automated tests are currently only available for tasks in the TrueIntent usage.

Contact inGenious AI support to request the generative AI and automated testing features.

Automated test permissions

Action	Reviewer	Editor	Publisher	Admin
Create an automated test	-	-	✓	✓
Edit an automated test	-	-	✓	✓
Run an automated test	-	-	✓	✓
Delete an automated test	-	-	✓	✓
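
The permissions table can also be expressed as a small lookup, shown here as a Python sketch. The function name `can_perform` and the lowercase action names are hypothetical; only the role names and the pattern of ticks come from the table.

```python
# Roles that have a tick in every row of the table.
ALLOWED_ROLES = {"Publisher", "Admin"}

# The four actions listed in the table.
ACTIONS = {"create", "edit", "run", "delete"}


def can_perform(role: str, action: str) -> bool:
    """Return True if the role may perform the action on an automated test."""
    if action not in ACTIONS:
        raise ValueError(f"unknown action: {action}")
    return role in ALLOWED_ROLES


reviewer_can_run = can_perform("Reviewer", "run")
editor_can_edit = can_perform("Editor", "edit")
admin_can_delete = can_perform("Admin", "delete")
```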