Complete reference for writing, running, and iterating on evals (automated conversation tests) for ADK agents. Covers eval file format, all assertion types, CLI usage, and per-primitive testing patterns.
Install the skill:

```bash
npx skill4agent add botpress/skills adk-evals
```

Reference files:

| File | Contents |
|---|---|
| `eval-format.md` | Complete file format: all fields, turn types, assertion categories, match operators, setup, outcome, options |
| `testing-workflow.md` | Running evals, interpreting output, using traces, the write → test → iterate loop, CI integration |
| `test-patterns.md` | Per-primitive patterns for actions, tools, workflows, conversations, tables, and state |

A complete eval file default-exports a `new Eval({...})`:

```typescript
import { Eval } from '@botpress/adk'

export default new Eval({
  name: 'greeting',
  type: 'regression',
  tags: ['basic'],
  // Known starting point, applied before the first turn
  setup: {
    state: { bot: { welcomeSent: false } },
    workflow: { trigger: 'onboarding', input: { userId: 'test-1' } },
  },
  conversation: [
    {
      user: 'Hi!',
      // Assertions checked against this turn
      assert: {
        response: [
          { not_contains: 'error' },
          { llm_judge: 'Response is friendly and offers to help' },
        ],
        tools: [{ not_called: 'createTicket' }],
        state: [{ path: 'conversation.greeted', equals: true }],
      },
    },
  ],
  // Checked once, after all turns have run
  outcome: {
    state: [{ path: 'conversation.greeted', equals: true }],
  },
  options: {
    idleTimeout: 20000,
    judgePassThreshold: 4,
  },
})
```

Each entry in `conversation` is one of these turn types:

| Turn | When to use |
|---|---|
| `user` | Standard user message |
| `event` | Non-message trigger (webhook, integration event) |
| `expectSilence: true` | Assert the bot does NOT respond (set on a `user` or `event` turn) |
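The turn shapes in the table can be modeled as a TypeScript union, which makes the "a turn needs a trigger" rule visible to the compiler. This is a minimal local sketch: `UserTurn`, `EventTurn`, `Turn`, and `triggerOf` are hypothetical names for illustration, not types exported by `@botpress/adk`, and only the fields used elsewhere in this document are modeled.

```typescript
// Hypothetical local model of eval turns; these are NOT the types exported by @botpress/adk.
type UserTurn = { user: string; expectSilence?: boolean };
type EventTurn = { event: { type: string }; expectSilence?: boolean };
type Turn = UserTurn | EventTurn;

// Summarize what triggers a turn and whether the bot is expected to reply.
function triggerOf(turn: Turn): string {
  // The `in` check narrows the union to UserTurn or EventTurn.
  const trigger = "user" in turn ? `user: ${turn.user}` : `event: ${turn.event.type}`;
  return turn.expectSilence ? `${trigger} (expect no reply)` : trigger;
}

const turns: Turn[] = [
  { user: "Hi!" },
  { event: { type: "payment.failed" }, expectSilence: true },
  // { expectSilence: true }, // would not compile: the object has neither `user` nor `event`
];

const summary = turns.map(triggerOf);
// ["user: Hi!", "event: payment.failed (expect no reply)"]
```

Because neither union member can be satisfied without its trigger field, a bare silent turn is rejected at compile time rather than at eval time.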
Assertions are grouped by category:

| Category | What it checks |
|---|---|
| `response` | Bot reply text (`contains`, `matches`, `llm_judge`, `similar_to`) |
| `tools` | Tool calls (`called`, `not_called`, `call_order`, `params`) |
| `state` | Bot/user/conversation state (`equals`, `changed`) |
| `tables` | Table rows (`row_exists`, `row_count`) |
|  | Workflow execution (`entered`, `completed`) |
|  | Response time in ms (`lte`, `gte`) |
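The deterministic `response` checks can be approximated locally, which helps when choosing between `contains` and `matches`. This is a sketch under stated assumptions, not the ADK's implementation: it assumes `contains` and `not_contains` are case-sensitive substring checks and that `matches` takes a JavaScript regular expression. `llm_judge` and `similar_to` need a model and are left out, and `checkResponse` is a hypothetical helper.

```typescript
// Hypothetical local model of response-text assertions (NOT the ADK implementation).
// Assumptions: case-sensitive substring matching; `matches` holds a JavaScript RegExp source.
type ResponseAssertion =
  | { contains: string }
  | { not_contains: string }
  | { matches: string };

function checkResponse(reply: string, assertion: ResponseAssertion): boolean {
  if ("contains" in assertion) return reply.includes(assertion.contains);
  if ("not_contains" in assertion) return !reply.includes(assertion.not_contains);
  return new RegExp(assertion.matches).test(reply);
}

const reply = "Hi there! How can I help you today?";
const results = [
  checkResponse(reply, { not_contains: "error" }), // true: "error" does not appear
  checkResponse(reply, { contains: "help" }), // true
  checkResponse(reply, { matches: "^Hi\\b" }), // true: the reply starts with the word "Hi"
  checkResponse(reply, { contains: "Help" }), // false under the case-sensitive assumption
];
```

Exact-text checks like these are cheap and stable; reserve `llm_judge` for qualities such as tone that substring or pattern checks cannot express.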
Common commands:

```bash
adk evals                     # run all evals
adk evals <name>              # run one eval
adk evals --tag <tag>         # filter by tag
adk evals --type regression   # filter by type
adk evals --verbose           # show all assertions
adk evals --format json       # JSON output for CI
adk evals runs                # list recent runs
adk evals runs --latest       # most recent run
adk evals runs --latest -v    # with full details
```
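For CI scripts it can help to assemble the documented flags in one place. The sketch below only builds an argument list from the flags in the command list; `EvalRunOptions` and `buildEvalArgs` are hypothetical names, the helper does not check which flag combinations the CLI accepts, and how `adk` reports failures (for example through its exit code) is an assumption to confirm against `testing-workflow.md`.

```typescript
// Hypothetical helper that turns options into arguments for `adk evals`.
// Uses only the documented flags: <name>, --tag, --type, --verbose, --format json.
interface EvalRunOptions {
  name?: string; // run a single eval by name
  tag?: string; // filter by tag
  type?: string; // filter by type, e.g. "regression"
  verbose?: boolean; // show all assertions
  json?: boolean; // JSON output for CI
}

function buildEvalArgs(opts: EvalRunOptions = {}): string[] {
  const args = ["evals"];
  if (opts.name) args.push(opts.name);
  if (opts.tag) args.push("--tag", opts.tag);
  if (opts.type) args.push("--type", opts.type);
  if (opts.verbose) args.push("--verbose");
  if (opts.json) args.push("--format", "json");
  return args;
}

const ciArgs = buildEvalArgs({ type: "regression", json: true });
// ["evals", "--type", "regression", "--format", "json"]
```

In a Node-based CI step, the list could be passed to `spawnSync("adk", ciArgs, { stdio: "inherit" })` from `node:child_process`.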
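`expectSilence` marks a turn where the bot should stay quiet, but the turn still needs a `user` message or an `event` to trigger it:

```typescript
// CORRECT
{ user: 'hello', expectSilence: true }
{ event: { type: 'payment.failed' }, expectSilence: true }
```

```typescript
// WRONG: missing user or event
{ expectSilence: true }
```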
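When conversations are assembled programmatically, for example from a data file, the compiler cannot catch a bare `expectSilence` turn. A minimal runtime check could flag such turns before the evals run. `findBareSilentTurns` is a hypothetical helper, not part of the ADK, and it treats each turn as a plain object.

```typescript
// Hypothetical pre-flight check (not part of @botpress/adk): returns the indices of
// turns that set `expectSilence` without a `user` message or an `event` trigger.
function findBareSilentTurns(conversation: Array<Record<string, unknown>>): number[] {
  const bad: number[] = [];
  conversation.forEach((turn, index) => {
    const hasTrigger = "user" in turn || "event" in turn;
    if (turn["expectSilence"] === true && !hasTrigger) bad.push(index);
  });
  return bad;
}

const draft: Array<Record<string, unknown>> = [
  { user: "hello", expectSilence: true },
  { event: { type: "payment.failed" }, expectSilence: true },
  { expectSilence: true }, // invalid: nothing triggers this turn
];

const invalid = findBareSilentTurns(draft); // [2]
```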
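Assert on the parameters a tool was called with, not only that it was called:

```typescript
// CORRECT: verifies the LLM extracted the right values
{ called: 'createTicket', params: { priority: { equals: 'high' } } }
```

```typescript
// INCOMPLETE: doesn't verify params were correct
{ called: 'createTicket' }
```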
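The gap between the two forms shows up when the LLM calls the right tool with the wrong arguments. The sketch below models a recorded call and both assertion forms locally. `ToolCall`, `ToolAssertion`, and `checkToolCall` are hypothetical; only the `equals` operator is modeled, using strict equality (so only primitive values), and the real assertion engine may compare values differently.

```typescript
// Hypothetical local model of tool-call assertions (NOT the ADK implementation).
type ToolCall = { name: string; params: Record<string, unknown> };
type ToolAssertion = {
  called: string;
  params?: Record<string, { equals: unknown }>;
};

// Passes if some recorded call used the named tool and every listed param matches.
function checkToolCall(calls: ToolCall[], assertion: ToolAssertion): boolean {
  return calls.some(
    (call) =>
      call.name === assertion.called &&
      Object.entries(assertion.params ?? {}).every(
        ([key, matcher]) => call.params[key] === matcher.equals,
      ),
  );
}

// The bot called the right tool but extracted the wrong priority.
const recorded: ToolCall[] = [{ name: "createTicket", params: { priority: "low" } }];

const nameOnly = checkToolCall(recorded, { called: "createTicket" }); // true: misses the bug
const withParams = checkToolCall(recorded, {
  called: "createTicket",
  params: { priority: { equals: "high" } },
}); // false: catches the wrong priority
```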
Check end-of-conversation state and table rows in `outcome`, which is evaluated once after all turns:

```typescript
// CORRECT: final state checked once after all turns
outcome: {
  state: [{ path: 'conversation.resolved', equals: true }],
  tables: [{ table: 'ticketsTable', row_exists: { status: { equals: 'open' } } }],
}
```

```typescript
// WRONG: the table may not be written until after all turns
conversation: [
  {
    user: 'Create a ticket',
    assert: { tables: [{ table: 'ticketsTable', row_exists: { status: { equals: 'open' } } }] },
  },
]
```
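State assertions, whether per turn or in `outcome`, address values with a dotted `path` such as `conversation.resolved`. The sketch below resolves such a path against a snapshot of the bot, user, and conversation state. The snapshot shape and the `getPath` and `checkState` helpers are assumptions for illustration, not ADK APIs.

```typescript
// Hypothetical snapshot of the three state scopes after the last turn.
type StateSnapshot = {
  bot: Record<string, unknown>;
  user: Record<string, unknown>;
  conversation: Record<string, unknown>;
};

// Walk a dotted path like "conversation.resolved"; returns undefined if any segment is missing.
function getPath(root: unknown, path: string): unknown {
  return path.split(".").reduce<unknown>((node, key) => {
    if (node !== null && typeof node === "object" && key in node) {
      return (node as Record<string, unknown>)[key];
    }
    return undefined;
  }, root);
}

// Models { path, equals } with strict equality, so only primitive values compare as expected.
function checkState(state: StateSnapshot, a: { path: string; equals: unknown }): boolean {
  return getPath(state, a.path) === a.equals;
}

const finalState: StateSnapshot = {
  bot: {},
  user: { plan: "pro" },
  conversation: { phase: "support", resolved: true },
};

const resolvedOk = checkState(finalState, { path: "conversation.resolved", equals: true }); // true
const missing = getPath(finalState, "conversation.ticket.id"); // undefined
```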
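Seed preconditions with `setup` rather than with scripted turns:

```typescript
// CORRECT: start in a known state
setup: {
  state: {
    user: { plan: 'pro' },
    conversation: { phase: 'support' },
  },
}
```

```typescript
// WRONG: depends on the bot correctly processing setup turns
conversation: [
  { user: 'I am on the pro plan' },     // hoping the bot sets user.plan
  { user: 'I need help with billing' }, // actual test turn
]
```

Before running, check that the file starts with `import { Eval } from '@botpress/adk'` and default-exports `new Eval({})`, and that every `expectSilence` turn has a `user` or `event`. A silent turn produces no reply, so there is nothing for `assert.response` to check on it. Iterate on a single eval with `adk evals <name>`, and when an assertion fails, compare its `expected` and `actual` values.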