sf-ai-agentforce-testing

<!-- TIER: 1 | ENTRY POINT --> <!-- This is the starting document - read this FIRST --> <!-- Pattern: Follows sf-testing for agentic test-fix loops --> <!-- v2.0.0: Dual-track workflow with multi-turn API testing as primary -->

sf-ai-agentforce-testing: Agentforce Test Execution & Coverage Analysis


Expert testing engineer specializing in Agentforce agent testing via dual-track workflow: multi-turn Agent Runtime API testing (primary) and CLI Testing Center (secondary). Execute multi-turn conversations, analyze topic/action/context coverage, and automatically fix issues via sf-ai-agentscript.

Core Responsibilities


  1. Multi-Turn API Testing (PRIMARY): Execute multi-turn conversations via the Agent Runtime API
  2. CLI Test Execution (SECONDARY): Run single-utterance tests via `sf agent test run`
  3. Test Spec / Scenario Generation: Create YAML test specifications and multi-turn scenarios
  4. Coverage Analysis: Track topic, action, context preservation, and re-matching coverage
  5. Preview Testing: Interactive simulated and live agent testing
  6. Agentic Fix Loop: Automatically fix failing agents and re-test
  7. Cross-Skill Orchestration: Delegate fixes to sf-ai-agentscript, data to sf-data
  8. Observability Integration: Guide to sf-ai-agentforce-observability for STDM analysis

📚 Document Map


| Need | Document | Description |
|---|---|---|
| Agent Runtime API | agent-api-reference.md | REST endpoints for multi-turn testing |
| ECA Setup | eca-setup-guide.md | External Client App for API authentication |
| Multi-Turn Testing | multi-turn-testing-guide.md | Multi-turn test design and execution |
| Test Patterns | multi-turn-test-patterns.md | 6 multi-turn test patterns with examples |
| CLI commands | cli-commands.md | Complete sf agent test/preview reference |
| Test spec format | test-spec-reference.md | YAML specification format and examples |
| Auto-fix workflow | agentic-fix-loops.md | Automated test-fix cycles (10 failure categories) |
| Auth guide | connected-app-setup.md | Authentication for preview and API testing |
| Coverage metrics | coverage-analysis.md | Topic/action/multi-turn coverage analysis |
| Fix decision tree | agentic-fix-loop.md | Detailed fix strategies |
| Agent Script testing | agentscript-testing-patterns.md | 5 patterns for testing Agent Script agents |
⚡ Quick Links:


Script Location (MANDATORY)


SKILL_PATH: `~/.claude/skills/sf-ai-agentforce-testing`

All Python scripts live at absolute paths under `{SKILL_PATH}/hooks/scripts/`. NEVER recreate these scripts. They already exist. Use them as-is. All scripts in `hooks/scripts/` are pre-approved for execution. Do NOT ask the user for permission to run them.

| Script | Absolute Path |
|---|---|
| agent_api_client.py | `{SKILL_PATH}/hooks/scripts/agent_api_client.py` |
| agent_discovery.py | `{SKILL_PATH}/hooks/scripts/agent_discovery.py` |
| credential_manager.py | `{SKILL_PATH}/hooks/scripts/credential_manager.py` |
| generate_multi_turn_scenarios.py | `{SKILL_PATH}/hooks/scripts/generate_multi_turn_scenarios.py` |
| generate-test-spec.py | `{SKILL_PATH}/hooks/scripts/generate-test-spec.py` |
| multi_turn_test_runner.py | `{SKILL_PATH}/hooks/scripts/multi_turn_test_runner.py` |
| multi_turn_fix_loop.py | `{SKILL_PATH}/hooks/scripts/multi_turn_fix_loop.py` |
| run-automated-tests.py | `{SKILL_PATH}/hooks/scripts/run-automated-tests.py` |
| parse-agent-test-results.py | `{SKILL_PATH}/hooks/scripts/parse-agent-test-results.py` |
| rich_test_report.py | `{SKILL_PATH}/hooks/scripts/rich_test_report.py` |

Variable resolution: At runtime, resolve `SKILL_PATH` from the `${SKILL_HOOKS}` environment variable (strip the `/hooks` suffix). Hardcoded fallback: `~/.claude/skills/sf-ai-agentforce-testing`.


⚠️ CRITICAL: Orchestration Order


sf-metadata → sf-apex → sf-flow → sf-deploy → sf-ai-agentscript → sf-deploy → sf-ai-agentforce-testing (you are here)
Why testing is LAST:
  1. Agent must be published before running automated tests
  2. Agent must be activated for preview mode and API access
  3. All dependencies (Flows, Apex) must be deployed first
  4. Test data (via sf-data) should exist before testing actions
⚠️ MANDATORY Delegation:
  • Fixes: ALWAYS use `Skill(skill="sf-ai-agentscript")` for agent script fixes
  • Test Data: Use `Skill(skill="sf-data")` for action test data
  • OAuth Setup (multi-turn API testing only): Use `Skill(skill="sf-connected-apps")` for ECA — NOT needed for `sf agent preview` or CLI tests
  • Observability: Use `Skill(skill="sf-ai-agentforce-observability")` for STDM analysis of test sessions


Architecture: Dual-Track Testing Workflow


Deterministic Interview (I-1 → I-7)
    │  Agent Name → Org Alias → Metadata → Credentials → Scenarios → Partition → Confirm
    │  (skip if test-plan-{agent}.yaml provided)
Phase 0: Prerequisites & Agent Discovery
    ├──► Phase A: Multi-Turn API Testing (PRIMARY — requires ECA)
    │    A1: ECA Credential Setup (via credential_manager.py)
    │    A2: Agent Discovery & Metadata Retrieval
    │    A3: Test Scenario Planning (generate_multi_turn_scenarios.py --categorized)
    │    A4: Multi-Turn Execution (Agent Runtime API)
    │        ├─ Sequential: single multi_turn_test_runner.py process
    │        └─ Swarm: TeamCreate → N workers (--worker-id N)
    │    A5: Results & Scoring (rich Unicode output)
    └──► Phase B: CLI Testing Center (SECONDARY)
         B1: Test Spec Creation
         B2: Test Execution (sf agent test run)
         B3: Results Analysis
Phase C: Agentic Fix Loop (shared)
Phase D: Coverage Improvement (shared)
Phase E: Observability Integration (STDM analysis)
When to use which track:
| Condition | Use |
|---|---|
| Agent Testing Center NOT available | Phase A only |
| Need multi-turn conversation testing | Phase A |
| Need topic re-matching validation | Phase A |
| Need context preservation testing | Phase A |
| Agent Testing Center IS available + single-utterance tests | Phase B |
| CI/CD pipeline integration | Phase A (Python scripts) or Phase B (sf CLI) |
| Quick smoke test | Phase B |
| Quick manual validation (no ECA setup) | `sf agent preview` (no Phase A/B needed) |
| No ECA available | `sf agent preview` or Phase B (CLI tests) |


Phase 0: Prerequisites & Agent Discovery


Step 1: Gather User Information


Use AskUserQuestion to gather:
AskUserQuestion:
  questions:
    - question: "Which agent do you want to test?"
      header: "Agent"
      options:
        - label: "Let me discover agents in the org"
          description: "Query BotDefinition to find available agents"
        - label: "I know the agent name"
          description: "Provide agent name/API name directly"

    - question: "What is your target org alias?"
      header: "Org"
      options:
        - label: "vivint-DevInt"
          description: "Development integration org"
        - label: "Other"
          description: "Specify a different org alias"

    - question: "What type of testing do you need?"
      header: "Test Type"
      options:
        - label: "Multi-turn API testing (Recommended)"
          description: "Full conversation testing via Agent Runtime API — tests topic switching, context retention, escalation cascades"
        - label: "CLI single-utterance testing"
          description: "Traditional sf agent test run — requires Agent Testing Center feature"
        - label: "Both"
          description: "Run both multi-turn and CLI tests for comprehensive coverage"

Step 2: Agent Discovery


```bash
# Auto-discover active agents in the org
sf data query --use-tooling-api \
  --query "SELECT Id, DeveloperName, MasterLabel FROM BotDefinition WHERE IsActive=true" \
  --result-format json --target-org [alias]
```
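The JSON from this query can be consumed programmatically. A minimal sketch, assuming the standard `result.records` envelope that `sf data query --result-format json` emits (record IDs below are illustrative):

```python
import json

def active_agents(query_output: str) -> list[tuple[str, str]]:
    """Return (DeveloperName, Id) pairs from `sf data query --result-format json` output."""
    data = json.loads(query_output)
    return [(r["DeveloperName"], r["Id"]) for r in data["result"]["records"]]

sample = (
    '{"status": 0, "result": {"records": ['
    '{"Id": "0Xx000000000001", "DeveloperName": "Support_Agent", "MasterLabel": "Support Agent"}'
    '], "totalSize": 1, "done": true}}'
)
print(active_agents(sample))  # [('Support_Agent', '0Xx000000000001')]
```

If multiple agents come back, present the list to the user via AskUserQuestion rather than guessing.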

Step 3: Agent Metadata Retrieval


```bash
# Retrieve agent configuration (topics, actions, instructions)
sf project retrieve start \
  --metadata "GenAiPlannerBundle:[AgentDeveloperName]" \
  --output-dir retrieve-temp --target-org [alias]
```

Claude reads the GenAiPlannerBundle to understand:
- All topics and their `classificationDescription` values
- All actions and their configurations
- System instructions and guardrails
- Escalation paths

Step 4: Check Agent Testing Center Availability


```bash
# This determines if Phase B is available
sf agent test list --target-org [alias]
```

If error `INVALID_TYPE: Cannot use: AiEvaluationDefinition` → Agent Testing Center NOT enabled → Phase A only.

If success → both Phase A and Phase B available.

Step 5: Prerequisites Checklist


| Check | Command | Why |
|---|---|---|
| Agent exists | `sf data query --use-tooling-api --query "SELECT Id FROM BotDefinition WHERE DeveloperName='X'"` | Can't test a non-existent agent |
| Agent published | `sf agent validate authoring-bundle --api-name X` | Must be published to test |
| Agent activated | Check activation status | Required for API access |
| Dependencies deployed | Flows and Apex in org | Actions will fail without them |
| ECA configured (Phase A only) | Token request test | Multi-turn API testing only; NOT needed for preview or CLI tests |
| Agent Testing Center (Phase B) | `sf agent test list` | Required for CLI testing |

Deterministic Multi-Turn Interview Flow


When the testing skill is invoked, follow these interview steps in order. Each step has deterministic rules with fallbacks. The goal: gather all inputs needed to execute multi-turn tests without ambiguity.

Skip the interview if the user provides a `test-plan-{agent}.yaml` file — load it directly and jump to Swarm Execution Rules.

| Step | Rule | Fallback |
|---|---|---|
| I-0: Skill Path | Resolve `SKILL_PATH` from the `${SKILL_HOOKS}` env var (strip the `/hooks` suffix). If unset → hardcoded `~/.claude/skills/sf-ai-agentforce-testing`. Verify the directory exists. All subsequent script references use `{SKILL_PATH}/hooks/scripts/`. | Hardcoded path |
| I-1: Agent Name | User provided → use it. Else walk up from CWD looking for `sfdx-project.json` → run `python3 {SKILL_PATH}/hooks/scripts/agent_discovery.py local --project-dir .`. Multiple agents → present numbered list via AskUserQuestion. None found → ask user. | AskUserQuestion |
| I-2: Org Alias | User provided → use it. Else parse `sfdx-project.json` → read `sfdx-config.json` for `target-org`. Else ask user. Note: org aliases are case-sensitive (e.g., `Vivint-DevInt` vs `vivint-devint`). | AskUserQuestion |
| I-3: Metadata | ALWAYS run `python3 {SKILL_PATH}/hooks/scripts/agent_discovery.py live --target-org {org} --agent-name {agent}`. Extract topics, actions, type, agent_id. This step is mandatory — never skip. | Required (fail if no agent found) |
| I-4: Credentials | Skip if the test type is CLI-only or Preview-only — standard org auth suffices (no ECA needed). For multi-turn API testing: run `python3 {SKILL_PATH}/hooks/scripts/credential_manager.py discover --org-alias {org}`. Found ECA → `validate`. Valid → use. Invalid → ask user for new credentials → `save` → re-validate. No ECAs found → ask user → offer to save via `credential_manager.py save`. | AskUserQuestion for credentials (multi-turn API only) |
| I-4b: Session Variables | ALWAYS ask. Extract known context variables from agent metadata (`attributeMappings` where `mappingType=ContextVariable` in the GenAiPlannerBundle). WARN if a `User_Authentication` topic exists — the agent likely requires `$Context.RoutableId` and `$Context.CaseId` to authenticate the customer. Present discovered variables and ask the user for values. | AskUserQuestion |
| I-5: Scenarios | Pipe discovery metadata to `python3 {SKILL_PATH}/hooks/scripts/generate_multi_turn_scenarios.py --metadata - --output {dir} --categorized --cross-topic`. Present summary: N scenarios across M categories. | Required |
| I-6: Partition | Ask the user how to split work across workers. | AskUserQuestion (see below) |
| I-7: Confirm | Present the test plan summary. Save as `test-plan-{agent}.yaml` using the template. User confirms to proceed. | AskUserQuestion |
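Rule I-0 is mechanical enough to sketch. A minimal version, assuming `SKILL_HOOKS` always ends in `/hooks` when set:

```python
import os

def resolve_skill_path(environ: dict) -> str:
    """Rule I-0: strip the /hooks suffix from ${SKILL_HOOKS};
    fall back to the hardcoded skill path when unset."""
    hooks = environ.get("SKILL_HOOKS", "")
    if hooks.endswith("/hooks"):
        return hooks[: -len("/hooks")]
    return os.path.expanduser("~/.claude/skills/sf-ai-agentforce-testing")

# Example: a SKILL_HOOKS value pointing into the skill's hooks directory.
print(resolve_skill_path({"SKILL_HOOKS": "/home/u/.claude/skills/sf-ai-agentforce-testing/hooks"}))
```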

I-4b: Session Variables


Context variables are MANDATORY for agents that use authentication flows (e.g., a `User_Authentication` topic). Without them, the agent's authentication flow fails and the session ends on Turn 1.

Extract context variables from agent metadata:
  1. Run `python3 {SKILL_PATH}/hooks/scripts/agent_discovery.py local --project-dir {project}` and look for `context_variables` in the GenAiPlannerBundle output.
  2. Common variables: `$Context.RoutableId` (MessagingSession ID), `$Context.CaseId` (Case record ID).

AskUserQuestion:
  question: "The agent requires context variables for testing. Which values should we use?"
  header: "Variables"
  options:
    - label: "Use test record IDs (Recommended)"
      description: "I'll provide real MessagingSession and Case IDs from the org for testing"
    - label: "Skip variables"
      description: "Run without context variables — WARNING: authentication topics will likely fail"
    - label: "Auto-discover from org"
      description: "Query the org for recent MessagingSession and Case records to use as test values"
  multiSelect: false

⚠️ WARNING: If the agent has a `User_Authentication` topic that runs `Bot_User_Verification`, you MUST provide `$Context.RoutableId` and `$Context.CaseId`. Without them, the verification flow fails → agent escalates → `SessionEnded` on Turn 1.
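To illustrate how `--var '$Context.RoutableId={id}'` style inputs become session variables, here is a sketch of the shaping step. The `name`/`type`/`value` list shape is an assumption about the Agent Runtime API session payload; `multi_turn_test_runner.py` owns the real format:

```python
def session_variables(values: dict) -> list[dict]:
    """Shape {'$Context.RoutableId': '0Mw...'} into a variables list.
    ASSUMPTION: each variable is sent as {name, type, value} with type Text."""
    return [
        {"name": name, "type": "Text", "value": value}
        for name, value in values.items()
    ]

print(session_variables({"$Context.CaseId": "500XXEXAMPLE"}))
```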
I-6: Partition Strategy


DEFAULT RULE: If total generated scenarios > 4, default to "2 workers by category". If ≤ 4, default to "Sequential". ALWAYS default — only change if the user explicitly requests otherwise.
AskUserQuestion:
  question: "How should test scenarios be distributed across workers?"
  header: "Partition"
  options:
    - label: "2 workers by category (Recommended)"
      description: "Group test patterns into 2 balanced buckets — best balance of parallelism and readability. DEFAULT when > 4 scenarios."
    - label: "Sequential"
      description: "Run all scenarios in a single process — no team needed, simpler but slower. DEFAULT when ≤ 4 scenarios."
  multiSelect: false
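The DEFAULT RULE above is deterministic and can be sketched directly:

```python
def default_partition(total_scenarios: int) -> tuple[str, int]:
    """Encode the DEFAULT RULE: > 4 scenarios → 2 workers by category,
    otherwise run sequentially with a single worker."""
    if total_scenarios > 4:
        return ("2 workers by category", 2)
    return ("Sequential", 1)

print(default_partition(7))  # ('2 workers by category', 2)
print(default_partition(3))  # ('Sequential', 1)
```

Only deviate from this default when the user explicitly asks for a different partition.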

I-7: Confirmation Summary Format


Present this to the user before execution:
📋 TEST PLAN SUMMARY
════════════════════════════════════════════════════════════════
Agent:        {agent_name} ({agent_id})
Org:          {org_alias}
Credentials:  ~/.sfagent/{org_alias}/{eca_name}/credentials.env ✅
Scenarios:    {total_count} across {category_count} categories
Partition:    {strategy} with {worker_count} worker(s)
Variables:    {var_count} session variable(s)

📂 Scenario Breakdown:
  topic_routing:        {n} scenarios
  context_preservation: {n} scenarios
  escalation_flows:     {n} scenarios
  guardrail_testing:    {n} scenarios
  action_chain:         {n} scenarios
  error_recovery:       {n} scenarios
  cross_topic_switch:   {n} scenarios

💾 Saved: test-plan-{agent_name}.yaml
════════════════════════════════════════════════════════════════
Proceed? [Confirm / Edit / Cancel]


⚡ MANDATORY: Phase A4 Execution Protocol


This protocol is NON-NEGOTIABLE. After I-7 confirmation, you MUST follow EXACTLY these steps based on the partition strategy. DO NOT improvise, skip steps, or run sequentially when the plan says swarm.

Path A: Sequential Execution (worker_count == 1)


Run a single `multi_turn_test_runner.py` process. No team needed.

```bash
set -a; source ~/.sfagent/{org_alias}/{eca_name}/credentials.env; set +a
python3 {SKILL_PATH}/hooks/scripts/multi_turn_test_runner.py \
  --scenarios {scenario_file} \
  --agent-id {agent_id} \
  --var '$Context.RoutableId={routable_id}' \
  --var '$Context.CaseId={case_id}' \
  --output {working_dir}/results.json \
  --report-file {working_dir}/report.ansi \
  --verbose
```

Path B: Swarm Execution (worker_count == 2) — MANDATORY CHECKLIST


YOU MUST EXECUTE EVERY STEP BELOW IN ORDER. DO NOT SKIP ANY STEP.

Step 1: Split scenarios into 2 partitions. Group the generated category YAML files into 2 balanced buckets by total scenario count. Write `{working_dir}/scenarios-part1.yaml` and `{working_dir}/scenarios-part2.yaml`. Each partition file must be valid YAML with a `scenarios:` key containing its subset.

Step 2: Create the team:
TeamCreate(team_name="sf-test-{agent_name}")

Step 3: Create 2 tasks (one per partition):
TaskCreate(subject="Run partition 1", description="Execute scenarios-part1.yaml")
TaskCreate(subject="Run partition 2", description="Execute scenarios-part2.yaml")

Step 4: Spawn 2 workers IN PARALLEL (single message with 2 Task tool calls). Use the Worker Agent Prompt Template below. CRITICAL: both Task calls MUST be in the SAME message.
Task(subagent_type="general-purpose", team_name="sf-test-{agent_name}", name="worker-1", prompt=WORKER_PROMPT_1)
Task(subagent_type="general-purpose", team_name="sf-test-{agent_name}", name="worker-2", prompt=WORKER_PROMPT_2)

Step 5: Wait for both workers to report (they SendMessage when done). Do NOT proceed until both workers have sent their results via SendMessage.

Step 6: Aggregate results:

```bash
python3 {SKILL_PATH}/hooks/scripts/rich_test_report.py \
  --results {working_dir}/worker-1-results.json {working_dir}/worker-2-results.json
```

Step 7: Present the unified report to the user.

Step 8: Offer the fix loop if any failures were detected.

Step 9: Shut down the workers:
SendMessage(type="shutdown_request", recipient="worker-1")
SendMessage(type="shutdown_request", recipient="worker-2")

Step 10: Clean up with TeamDelete.
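The "2 balanced buckets" split in Step 1 can be done greedily: sort categories by scenario count (largest first) and always drop the next category into the lighter bucket. A sketch with hypothetical category names:

```python
def split_into_two_buckets(category_counts: dict) -> tuple[list, list]:
    """Greedy partition of {category: scenario_count} into 2 balanced buckets."""
    buckets, totals = ([], []), [0, 0]
    # Sort largest-first (name as tiebreaker) so big categories are placed early.
    for name, count in sorted(category_counts.items(), key=lambda kv: (-kv[1], kv[0])):
        i = 0 if totals[0] <= totals[1] else 1  # lighter bucket wins
        buckets[i].append(name)
        totals[i] += count
    return buckets

part1, part2 = split_into_two_buckets(
    {"topic_routing": 5, "escalation_flows": 3, "guardrail_testing": 2}
)
print(part1, part2)  # ['topic_routing'] ['escalation_flows', 'guardrail_testing']
```

Each bucket then becomes one `scenarios-part{N}.yaml` file for its worker.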


Credential Convention (~/.sfagent/)


Persistent ECA credential storage managed by `hooks/scripts/credential_manager.py`.

Directory Structure


~/.sfagent/
├── .gitignore          ("*" — auto-created, prevents accidental commits)
├── {Org-Alias}/        (org alias — case-sensitive, e.g. Vivint-DevInt)
│   └── {ECA-Name}/     (ECA app name — use `discover` to find actual name)
│       └── credentials.env
└── Other-Org/
    └── My_ECA/
        └── credentials.env

File Format


```env
# credentials.env — managed by credential_manager.py
# The 'export' prefix allows a direct `source credentials.env` in the shell
export SF_MY_DOMAIN=yourdomain.my.salesforce.com
export SF_CONSUMER_KEY=3MVG9...
export SF_CONSUMER_SECRET=ABC123...
```

Security Rules


| Rule | Implementation |
|---|---|
| Directory permissions | `0700` (owner only) |
| File permissions | `0600` (owner only) |
| Git protection | `.gitignore` with `*` auto-created in `~/.sfagent/` |
| Secret display | NEVER show full secrets — mask as `ABC...XYZ` (first 3 + last 3) |
| Credential passing | Export as env vars for subprocesses, never write to temp files |
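The secret-display rule (first 3 + last 3 characters) can be sketched as a small helper; the `***` fallback for short secrets is an assumption, since masking them any other way would reveal most of the value:

```python
def mask_secret(secret: str) -> str:
    """Mask a secret for display: first 3 + last 3 characters.
    Short secrets are fully hidden (assumption) to avoid leaking them."""
    if len(secret) <= 6:
        return "***"
    return secret[:3] + "..." + secret[-3:]

print(mask_secret("ABC123456XYZ"))  # ABC...XYZ
```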

CLI Reference


```bash
# Discover orgs and ECAs
python3 {SKILL_PATH}/hooks/scripts/credential_manager.py discover
python3 {SKILL_PATH}/hooks/scripts/credential_manager.py discover --org-alias Vivint-DevInt

# Load credentials (secrets masked in output)
python3 {SKILL_PATH}/hooks/scripts/credential_manager.py load --org-alias {org} --eca-name {eca}

# Save new credentials
python3 {SKILL_PATH}/hooks/scripts/credential_manager.py save \
  --org-alias {org} --eca-name {eca} \
  --domain yourdomain.my.salesforce.com \
  --consumer-key 3MVG9... --consumer-secret ABC123...

# Validate OAuth flow
python3 {SKILL_PATH}/hooks/scripts/credential_manager.py validate --org-alias {org} --eca-name {eca}

# Source credentials for shell use (`set -a` auto-exports all vars)
set -a; source ~/.sfagent/{org}/{eca}/credentials.env; set +a
```

---
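For context on what `validate` exercises: a client-credentials token request against the org's standard Salesforce token endpoint. This sketch only builds the request (it does not send it), and assumes the standard OAuth 2.0 client-credentials parameters; the field names mirror credentials.env:

```python
import urllib.parse

def build_token_request(my_domain: str, consumer_key: str, consumer_secret: str):
    """Build the POST URL and form body for the OAuth 2.0
    client-credentials flow against a Salesforce My Domain."""
    url = "https://%s/services/oauth2/token" % my_domain
    body = urllib.parse.urlencode({
        "grant_type": "client_credentials",
        "client_id": consumer_key,
        "client_secret": consumer_secret,
    })
    return url, body

url, body = build_token_request("yourdomain.my.salesforce.com", "3MVG9...", "ABC123...")
print(url)  # https://yourdomain.my.salesforce.com/services/oauth2/token
```

A 200 response with an `access_token` field means the ECA credentials are valid.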

Swarm Execution Rules (Native Claude Code Teams)


When `worker_count > 1` in the test plan, use Claude Code's native team orchestration for parallel test execution. When `worker_count == 1`, run sequentially without creating a team.

Team Lead Rules (Claude Code)


RULE: Create team via TeamCreate("sf-test-{agent_name}")
RULE: Create one TaskCreate per partition (category or count split)
RULE: Spawn one Task(subagent_type="general-purpose") per worker
RULE: Each worker gets credentials as env vars in its prompt (NEVER in files)
RULE: Wait for all workers to report via SendMessage
RULE: After all workers complete, run rich_test_report.py to render unified results
RULE: Present unified beautiful report aggregating all worker results
RULE: Offer fix loop if any failures detected
RULE: Shutdown all workers via SendMessage(type="shutdown_request")
RULE: Clean up via TeamDelete when done
RULE: NEVER spawn more than 2 workers.
RULE: When categories > 2, group into 2 balanced buckets.
RULE: Queue remaining work to existing workers after they complete first batch.
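The "group into 2 balanced buckets" rule above can be sketched as a greedy partition that always adds the next-largest category to the lighter bucket (a sketch only; the category names and counts below are hypothetical):

```python
def balance_into_buckets(categories, n_buckets=2):
    """Greedily assign each category to the currently lightest bucket.

    categories: dict of category name -> scenario count.
    Returns a list of (bucket_categories, bucket_total) pairs.
    """
    buckets = [[[], 0] for _ in range(n_buckets)]
    # Largest categories first keeps the greedy split balanced
    for name, count in sorted(categories.items(), key=lambda kv: -kv[1]):
        lightest = min(buckets, key=lambda b: b[1])
        lightest[0].append(name)
        lightest[1] += count
    return [(cats, total) for cats, total in buckets]

# Hypothetical category -> scenario counts
plan = {"topic_routing": 4, "context": 4, "escalation": 4, "guardrails": 3}
print(balance_into_buckets(plan))
```

Workers then queue any remaining categories from their bucket after finishing the first batch.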

Worker Agent Prompt Template


Each worker receives this prompt (team lead fills in the variables):
You are a multi-turn test worker for Agentforce agent testing.

YOUR TASK:
1. Claim your task via TaskUpdate(status="in_progress", owner=your_name)

2. Load credentials and run the test:
   set -a; source ~/.sfagent/{org_alias}/{eca_name}/credentials.env; set +a

   python3 {skill_path}/hooks/scripts/multi_turn_test_runner.py \
     --scenarios {scenario_file} \
     --agent-id {agent_id} \
     --var '$Context.RoutableId={routable_id}' \
     --var '$Context.CaseId={case_id}' \
     --output {working_dir}/worker-{N}-results.json \
     --report-file {working_dir}/worker-{N}-report.ansi \
     --worker-id {N} --verbose

3. IMPORTANT — RENDER RICH TUI REPORT IN YOUR PANE:
   After the test runner completes, render the results visually so they appear
   in your conversation pane (the tmux panel the user can see):

   python3 -c "
   import sys, json
   sys.path.insert(0, '{skill_path}/hooks/scripts')
   from multi_turn_test_runner import format_results_rich
   with open('{working_dir}/worker-{N}-results.json') as f:
       results = json.load(f)
   print(format_results_rich(results, worker_id={N}, scenario_file='{scenario_file}'))
   "

   Then copy-paste that output into your conversation as a text message so it
   renders in your Claude Code pane for the user to see.

4. Analyze: which scenarios passed, which failed, and WHY

5. SendMessage to team lead with:
   - Pass/fail summary (counts + percentages)
   - For each failure: scenario name, turn number, what went wrong, suggested fix
   - Total execution time
   - Any patterns noticed (e.g., "all context_preservation tests failed — may be a systemic issue")

6. Mark your task as completed via TaskUpdate

IMPORTANT:
- If a test fails with an auth error (exit code 2), report it immediately — do NOT retry
- If a test fails with scenario failures (exit code 1), analyze and report all failures
- You CAN communicate with other workers if you discover related issues
- The --report-file flag writes a persistent ANSI report file viewable with `cat` or `bat`

Partition Strategies


| Strategy | How It Works | Best For |
|----------|--------------|----------|
| `by_category` | One worker per test pattern (topic_routing, context, etc.) | Most runs — natural isolation |
| `by_count` | Split N scenarios evenly across W workers | Large scenario counts |
| `sequential` | Single process, no team | Quick runs, debugging |
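For `by_count`, a round-robin slice keeps worker loads within one scenario of each other (illustrative sketch; scenario names are placeholders):

```python
def split_by_count(scenarios, workers):
    """Round-robin N scenarios across W workers via strided slicing."""
    return [scenarios[i::workers] for i in range(workers)]

names = [f"scenario_{i}" for i in range(7)]
parts = split_by_count(names, 2)
print(parts)  # worker 0 gets 4 scenarios, worker 1 gets 3
```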

Team Lead Aggregation


After all workers report, the team lead:

1. Aggregates all worker result JSON files via `rich_test_report.py`:

   ```bash
   python3 {SKILL_PATH}/hooks/scripts/rich_test_report.py \
     --results /tmp/sf-test-{session}/worker-*-results.json
   ```

2. Deduplicates any shared failure patterns across workers
3. Presents the unified Rich report (colored Panels, Tables, Tree) to the user
4. Calculates aggregate scoring across the 7 categories
5. Offers fix loop: if failures exist, ask user whether to auto-fix via `sf-ai-agentscript`
6. Shuts down all workers and deletes the team
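The aggregation step can be sketched as follows (the real logic lives in rich_test_report.py and is not shown here; the `scenarios`/`passed`/`name` field names in the result JSON are assumptions):

```python
import glob
import json
from collections import Counter

def aggregate(pattern):
    """Merge worker result files: total/passed counts plus deduplicated failures."""
    totals, failures = Counter(), []
    for path in sorted(glob.glob(pattern)):
        with open(path) as f:
            data = json.load(f)
        for scenario in data.get("scenarios", []):   # assumed field name
            totals["total"] += 1
            if scenario.get("passed"):               # assumed field name
                totals["passed"] += 1
            else:
                failures.append(scenario.get("name"))
    # Deduplicate shared failure patterns across workers
    return totals, sorted(set(failures))
```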


Test Plan File Format


Test plans (`test-plan-{agent}.yaml`) capture the full interview output for reuse. See `templates/test-plan-template.yaml` for the complete schema.

Key Sections


| Section | Purpose |
|---------|---------|
| `metadata` | Agent name, ID, org alias, timestamps |
| `credentials` | Path to `~/.sfagent/` credentials.env or `use_env: true` |
| `agent_metadata` | Topics, actions, type — populated by `agent_discovery.py` |
| `scenarios` | List of YAML scenario files + pattern filters |
| `partition` | Strategy (`by_category` / `by_count` / `sequential`) + worker count |
| `session_variables` | Context variables injected into every session |
| `execution` | Timeout, retry, verbose, rich output settings |

Re-Running from a Saved Plan


When a user provides a test plan file, skip the interview entirely:
1. Load test-plan-{agent}.yaml
2. Validate credentials: credential_manager.py validate --org-alias {org} --eca-name {eca}
3. If invalid → ask user to update credentials only (skip other interview steps)
4. Load scenario files from plan
5. Apply partition strategy from plan
6. Execute (team or sequential based on worker_count)
This enables rapid re-runs after fixing agent issues — the user just says "re-run" and the skill picks up the saved plan.
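A minimal plan consistent with the sections above might look like this (the top-level sections match the Key Sections table; the fields inside each are illustrative rather than the authoritative schema, which lives in templates/test-plan-template.yaml):

```yaml
metadata:
  agent_name: Customer_Support_Agent
  org_alias: my-devint
credentials:
  use_env: true
agent_metadata: {}            # populated by agent_discovery.py
scenarios:
  - file: templates/multi-turn-comprehensive.yaml
partition:
  strategy: by_category
  worker_count: 2
session_variables:
  - name: $Context.EndUserLanguage
    value: en_US
execution:
  verbose: true
```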


Phase A: Multi-Turn API Testing (PRIMARY)


⚠️ NEVER use `curl` for OAuth token validation. Domains containing `--` (e.g., `my-org--devint.sandbox.my.salesforce.com`) cause shell expansion failures with curl's `--` argument parsing. Use `credential_manager.py validate` instead.
A1: ECA Credential Setup


Why ECA? Multi-turn API testing uses the Agent Runtime API (`/einstein/ai-agent/v1`), which requires OAuth Client Credentials. If you only need interactive testing, use `sf agent preview` instead — no ECA needed, just `sf org login web` (v2.121.7+). See connected-app-setup.md.

AskUserQuestion:
  question: "Do you have an External Client App (ECA) with Client Credentials flow configured?"
  header: "ECA Setup"
  options:
    - label: "Yes, I have credentials"
      description: "I have Consumer Key, Secret, and My Domain URL ready"
    - label: "No, I need to create one"
      description: "Delegate to sf-connected-apps skill to create ECA"

If YES: Collect credentials (kept in conversation context only, NEVER written to files):
  • Consumer Key
  • Consumer Secret
  • My Domain URL (e.g., `your-domain.my.salesforce.com`)

If NO: Delegate to sf-connected-apps:

Skill(skill="sf-connected-apps", args="Create External Client App with Client Credentials flow for Agent Runtime API testing. Scopes: api, chatbot_api, sfap_api, refresh_token, offline_access. Name: Agent_API_Testing")

Verify credentials work:

Validate OAuth credentials via credential_manager.py (handles token request internally):

```bash
python3 {SKILL_PATH}/hooks/scripts/credential_manager.py validate --org-alias {org} --eca-name {eca}
```

See [ECA Setup Guide](docs/eca-setup-guide.md) for complete instructions.
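For reference, the exchange that credential_manager.py performs internally is a standard OAuth 2.0 Client Credentials token POST against the org's `/services/oauth2/token` endpoint. A sketch of just the request construction (no network call is made here, and `build_token_request` is a hypothetical helper, not part of the skill's scripts):

```python
from urllib.parse import urlencode

def build_token_request(my_domain, consumer_key, consumer_secret):
    """Build the OAuth 2.0 Client Credentials token request (not sent here)."""
    url = f"https://{my_domain}/services/oauth2/token"
    body = urlencode({
        "grant_type": "client_credentials",
        "client_id": consumer_key,
        "client_secret": consumer_secret,
    })
    return url, body

url, body = build_token_request("your-domain.my.salesforce.com", "3MVG9...", "ABC123...")
print(url)
```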

A2: Agent Discovery & Metadata Retrieval



Get agent ID for API calls


```bash
AGENT_ID=$(sf data query --use-tooling-api \
  --query "SELECT Id, DeveloperName, MasterLabel FROM BotDefinition WHERE DeveloperName='[AgentName]' AND IsActive=true LIMIT 1" \
  --result-format json --target-org [alias] | jq -r '.result.records[0].Id')
```

Retrieve full agent configuration


```bash
sf project retrieve start \
  --metadata "GenAiPlannerBundle:[AgentName]" \
  --output-dir retrieve-temp --target-org [alias]
```

Claude reads the GenAiPlannerBundle to understand:
- **Topics**: Names, classificationDescriptions, instructions
- **Actions**: Types (flow, apex), triggers, inputs/outputs
- **System Instructions**: Global rules and guardrails
- **Escalation Paths**: When and how the agent escalates

This metadata drives automatic test scenario generation in A3.

A3: Test Scenario Planning


AskUserQuestion:
  question: "What testing do you need?"
  header: "Scenarios"
  options:
    - label: "Comprehensive coverage (Recommended)"
      description: "All 6 test patterns: topic routing, context preservation, escalation, guardrails, action chaining, variable injection"
    - label: "Topic routing accuracy"
      description: "Test that utterances route to correct topics, including mid-conversation topic switches"
    - label: "Context preservation"
      description: "Test that the agent retains information across turns"
    - label: "Specific bug reproduction"
      description: "Reproduce a known issue with targeted multi-turn scenario"
  multiSelect: true
Claude uses the agent metadata from A2 to auto-generate multi-turn scenarios tailored to the specific agent:
  • Generates topic switching scenarios based on actual topic names
  • Creates context preservation tests using actual action inputs/outputs
  • Builds escalation tests based on actual escalation configuration
  • Creates guardrail tests based on system instructions
Available templates (see templates/):

| Template | Pattern | Scenarios |
|----------|---------|-----------|
| `multi-turn-topic-routing.yaml` | Topic switching | 4 |
| `multi-turn-context-preservation.yaml` | Context retention | 4 |
| `multi-turn-escalation-flows.yaml` | Escalation cascades | 4 |
| `multi-turn-comprehensive.yaml` | All 6 patterns | 6 |
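A hand-written scenario entry follows the same shape the templates use. A minimal sketch (the per-turn expectation keys come from the runner's evaluation checklist in A4; the exact top-level structure may differ from the shipped templates, so treat this as illustrative):

```yaml
scenarios:
  - name: cancel_then_reschedule
    description: "Context must survive a mid-conversation change of intent"
    turns:
      - user: "I need to cancel my appointment"
        expect:
          response_not_empty: true
          topic_contains: "cancel"
      - user: "Actually, reschedule instead"
        expect:
          response_contains: "reschedule"
      - user: "What was my original request?"
        expect:
          context_retained: true
```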

A4: Multi-Turn Execution


Execute conversations via Agent Runtime API using the reusable Python scripts in `hooks/scripts/`.

⚠️ Agent API is NOT supported for agents of type "Agentforce (Default)". Only custom agents created via Agentforce Builder are supported.

**Option 1: Run Test Scenarios from YAML Templates (Recommended)**

Use the multi-turn test runner to execute entire scenario suites:

Run comprehensive test suite against an agent


```bash
python3 {SKILL_PATH}/hooks/scripts/multi_turn_test_runner.py \
  --my-domain "${SF_MY_DOMAIN}" \
  --consumer-key "${CONSUMER_KEY}" \
  --consumer-secret "${CONSUMER_SECRET}" \
  --agent-id "${AGENT_ID}" \
  --scenarios templates/multi-turn-comprehensive.yaml \
  --verbose
```

Run specific scenario within a suite


```bash
python3 {SKILL_PATH}/hooks/scripts/multi_turn_test_runner.py \
  --my-domain "${SF_MY_DOMAIN}" \
  --consumer-key "${CONSUMER_KEY}" \
  --consumer-secret "${CONSUMER_SECRET}" \
  --agent-id "${AGENT_ID}" \
  --scenarios templates/multi-turn-topic-routing.yaml \
  --scenario-filter topic_switch_natural \
  --verbose
```

With context variables and JSON output for fix loop


```bash
python3 {SKILL_PATH}/hooks/scripts/multi_turn_test_runner.py \
  --my-domain "${SF_MY_DOMAIN}" \
  --consumer-key "${CONSUMER_KEY}" \
  --consumer-secret "${CONSUMER_SECRET}" \
  --agent-id "${AGENT_ID}" \
  --scenarios templates/multi-turn-comprehensive.yaml \
  --var '$Context.AccountId=001XXXXXXXXXXXX' \
  --var '$Context.EndUserLanguage=en_US' \
  --output results.json \
  --verbose
```

**Exit codes:** `0` = all passed, `1` = some failed (fix loop should process), `2` = execution error

**Option 2: Use Environment Variables (cleaner for repeated runs)**

```bash
export SF_MY_DOMAIN="your-domain.my.salesforce.com"
export SF_CONSUMER_KEY="your_key"
export SF_CONSUMER_SECRET="your_secret"
export SF_AGENT_ID="0XxRM0000004ABC"
```

Now run without credential flags:

```bash
python3 {SKILL_PATH}/hooks/scripts/multi_turn_test_runner.py \
  --scenarios templates/multi-turn-comprehensive.yaml \
  --verbose
```

**Option 3: Python API for Ad-Hoc Testing**

For custom scenarios or debugging, use the client directly:

```python
from hooks.scripts.agent_api_client import AgentAPIClient

client = AgentAPIClient(
    my_domain="your-domain.my.salesforce.com",
    consumer_key="...",
    consumer_secret="..."
)
```

Context manager auto-ends session:

```python
with client.session(agent_id="0XxRM000...") as session:
    r1 = session.send("I need to cancel my appointment")
    print(f"Turn 1: {r1.agent_text}")

    r2 = session.send("Actually, reschedule instead")
    print(f"Turn 2: {r2.agent_text}")

    r3 = session.send("What was my original request?")
    print(f"Turn 3: {r3.agent_text}")

    # Check context preservation
    if "cancel" in r3.agent_text.lower():
        print("✅ Context preserved")
```

With initial variables:

```python
variables = [
    {"name": "$Context.AccountId", "type": "Id", "value": "001XXXXXXXXXXXX"},
    {"name": "$Context.EndUserLanguage", "type": "Text", "value": "en_US"},
]
with client.session(agent_id="0Xx...", variables=variables) as session:
    r1 = session.send("What orders do I have?")
```

**Connectivity Test:**

```bash
# Verify ECA credentials and API connectivity
# Reads SF_MY_DOMAIN, SF_CONSUMER_KEY, SF_CONSUMER_SECRET from env
python3 {SKILL_PATH}/hooks/scripts/agent_api_client.py
```


**Per-Turn Analysis Checklist:**

The test runner automatically evaluates each turn against expectations defined in the YAML template:

| # | Check | YAML Key | How Evaluated |
|---|-------|----------|---------------|
| 1 | Response non-empty? | `response_not_empty: true` | `messages[0].message` has content |
| 2 | Correct topic matched? | `topic_contains: "cancel"` | Heuristic: inferred from response text |
| 3 | Expected actions invoked? | `action_invoked: true` | Checks for `result` array entries |
| 4 | Response content? | `response_contains: "reschedule"` | Substring match on response |
| 5 | Context preserved? | `context_retained: true` | Heuristic: checks for prior-turn references |
| 6 | Guardrail respected? | `guardrail_triggered: true` | Regex patterns for refusal language |
| 7 | Escalation triggered? | `escalation_triggered: true` | Checks for `Escalation` message type |
| 8 | Response excludes? | `response_not_contains: "error"` | Substring exclusion check |

See [Agent API Reference](docs/agent-api-reference.md) for complete response format.
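A simplified sketch of how checks like these can be applied to one turn's response text (the runner's real evaluation lives in multi_turn_test_runner.py and is more nuanced, particularly the heuristics behind checks 2 and 5; the refusal patterns below are illustrative):

```python
import re

def evaluate_turn(response_text, expectations):
    """Apply a subset of the YAML checks; return the list of failed check names."""
    failures = []
    if expectations.get("response_not_empty") and not response_text.strip():
        failures.append("response_not_empty")
    want = expectations.get("response_contains")
    if want and want.lower() not in response_text.lower():
        failures.append("response_contains")
    ban = expectations.get("response_not_contains")
    if ban and ban.lower() in response_text.lower():
        failures.append("response_not_contains")
    # Guardrail heuristic: look for refusal language (patterns are illustrative)
    if expectations.get("guardrail_triggered"):
        if not re.search(r"\b(can't|cannot|unable to|not able to)\b", response_text, re.I):
            failures.append("guardrail_triggered")
    return failures

print(evaluate_turn("I can help you reschedule that.", {"response_contains": "reschedule"}))  # → []
```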


A5: Results & Scoring


Claude generates a terminal-friendly results report:

```
📊 MULTI-TURN TEST RESULTS
════════════════════════════════════════════════════════════════

Agent: Customer_Support_Agent
Org: vivint-DevInt
Mode: Agent Runtime API (multi-turn)

SCENARIO RESULTS
───────────────────────────────────────────────────────────────
✅ topic_switch_natural        3/3 turns passed
✅ context_user_identity       3/3 turns passed
❌ escalation_frustration      2/3 turns passed (Turn 3: no escalation)
✅ guardrail_mid_conversation  3/3 turns passed
✅ action_chain_identify       3/3 turns passed
⚠️ variable_injection          2/3 turns passed (Turn 3: re-asked for account)

SUMMARY
───────────────────────────────────────────────────────────────
Scenarios: 6 total | 4 passed | 1 failed | 1 partial
Turns: 18 total | 16 passed | 2 failed
Topic Re-matching: 100% ✅
Context Preservation: 83% ⚠️
Escalation Accuracy: 67% ❌

FAILED TURNS
───────────────────────────────────────────────────────────────
❌ escalation_frustration → Turn 3
   Input: "Nothing is working! I need a human NOW"
   Expected: Escalation triggered
   Actual: Agent continued troubleshooting
   Category: MULTI_TURN_ESCALATION_FAILURE
   Fix: Add frustration keywords to escalation triggers

⚠️ variable_injection → Turn 3
   Input: "Create a new case for a billing issue"
   Expected: Uses pre-set $Context.AccountId
   Actual: "Which account is this for?"
   Category: CONTEXT_PRESERVATION_FAILURE
   Fix: Wire $Context.AccountId to CreateCase action input

SCORING
───────────────────────────────────────────────────────────────
Topic Selection Coverage          13/15
Action Invocation                 14/15
Multi-Turn Topic Re-matching      15/15  ✅
Context Preservation              10/15  ⚠️
Edge Case & Guardrail Coverage    12/15
Test Spec / Scenario Quality       9/10
Agentic Fix Success               --/15  (pending)

TOTAL: 73/85 (86%) + Fix Loop pending
```
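The summary percentages in a report like this reduce to simple pass ratios over per-turn outcomes (a sketch; how each turn is tagged with a category is assumed):

```python
def category_rates(turns):
    """turns: list of (category, passed) tuples -> {category: pct_passed}."""
    totals, passed = {}, {}
    for cat, ok in turns:
        totals[cat] = totals.get(cat, 0) + 1
        passed[cat] = passed.get(cat, 0) + (1 if ok else 0)
    return {cat: round(100 * passed[cat] / totals[cat]) for cat in totals}

turns = [("topic_rematching", True)] * 4 + \
        [("context_preservation", True)] * 5 + [("context_preservation", False)] + \
        [("escalation", True)] * 2 + [("escalation", False)]
print(category_rates(turns))
```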


Phase B: CLI Testing Center (SECONDARY)


Availability: Requires Agent Testing Center feature enabled in org. If unavailable, use Phase A exclusively.

⚡ Agent Script Agents (AiAuthoringBundle)


Agent Script agents (`.agent` files in `aiAuthoringBundles/`) deploy as `BotDefinition` and use the same `sf agent test` CLI commands. However, they have unique testing challenges:

Two-Level Action System:
  • Level 1 (Definition): `topic.actions:` block — defines actions with `target: "apex://ClassName"`
  • Level 2 (Invocation): `reasoning.actions:` block — invokes via `@actions.<name>` with variable bindings

Single-Utterance Limitation: Multi-topic Agent Script agents with `start_agent` routing have a "1 action per reasoning cycle" budget in CLI tests. The first cycle is consumed by the transition action (`go_<topic>`). The actual business action (e.g., `get_order_status`) fires in a second cycle that single-utterance tests don't reach.

Solution — Use `conversationHistory`:

```yaml
testCases:
  # ROUTING TEST — captures transition action only
  - utterance: "I want to check my order status"
    expectedTopic: order_status
    expectedActions:
      - go_order_status          # Transition action from start_agent

  # ACTION TEST — use conversationHistory to skip routing
  - utterance: "The order ID is 801ak00001g59JlAAI"
    conversationHistory:
      - role: "user"
        message: "I want to check my order status"
      - role: "agent"
        topic: "order_status"    # Pre-positions agent in target topic
        message: "I'd be happy to help! Could you provide the Order ID?"
    expectedTopic: order_status
    expectedActions:
      - get_order_status         # Level 1 DEFINITION name (NOT invocation name)
    expectedOutcome: "Agent retrieves and displays order details"
```
Key Rules for Agent Script CLI Tests:
  • `expectedActions` uses the Level 1 definition name (e.g., `get_order_status`), NOT the Level 2 invocation name (e.g., `check_status`)
  • Agent Script topic names may differ in org — use the topic name discovery workflow
  • Agents with `WITH USER_MODE` Apex require the Einstein Agent User to have object permissions — missing permissions cause silent failures (0 rows, no error)
  • `subjectName` in the YAML spec maps to `config.developer_name` in the `.agent` file
⚠️ Agent Script API Testing Caveat:

Agent Script agents embed action results differently via the Agent Runtime API:
  • Agent Builder agents: Return separate `ActionResult` message types with structured data
  • Agent Script agents: Embed action outputs within `Inform` text messages — no separate `ActionResult` type

This means:
  • `action_invoked: true` (boolean) may fail even when the action runs — use `response_contains` to verify action output instead
  • `action_invoked: "action_name"` uses `plannerSurfaces` fallback parsing but is less reliable
  • For robust testing, prefer `response_contains` / `response_contains_any` checks over `action_invoked`
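Because Agent Script agents fold action output into `Inform` text, a robust check scans message text rather than looking for an `ActionResult` entry. A sketch (assuming the response's `messages` list uses `type`/`message` fields, as the Inform/Escalation types above suggest):

```python
def inform_text(messages):
    """Concatenate the text of all Inform messages in an API response."""
    return " ".join(m.get("message", "") for m in messages if m.get("type") == "Inform")

def response_contains(messages, needle):
    """Case-insensitive substring check over Inform message text."""
    return needle.lower() in inform_text(messages).lower()

messages = [
    {"type": "Inform", "message": "Your order 801ak00001g59JlAAI is In Transit."},
]
print(response_contains(messages, "in transit"))  # → True
```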
Agent Script Templates & Docs:
  • Template: agentscript-test-spec.yaml — 5 test patterns (CLI)
  • Template: multi-turn-agentscript-comprehensive.yaml — 6 multi-turn API scenarios
  • Guide: agentscript-testing-patterns.md — detailed patterns with worked examples

Automated Test Spec Generation:

```bash
python3 {SKILL_PATH}/hooks/scripts/generate-test-spec.py \
  --agent-file /path/to/Agent.agent \
  --output tests/agent-spec.yaml --verbose
```

The script generates both routing tests (with transition actions) and action tests (with conversationHistory for apex:// targets).

**Agent Discovery:**

```bash
# Discover Agent Script agents alongside XML-based agents
# Returns type: "AiAuthoringBundle" for .agent files
python3 {SKILL_PATH}/hooks/scripts/agent_discovery.py local \
  --project-dir /path/to/project --agent-name MyAgent
```

B1: Test Spec Creation


⚠️ CRITICAL: YAML Schema

The CLI YAML spec uses a FLAT structure parsed by `@salesforce/agents` — NOT the fabricated `apiVersion` / `kind` / `metadata` format. See test-spec-guide.md for the correct schema.

Required top-level fields:
  • `name:` — Display name (MasterLabel). Deploy FAILS without this.
  • `subjectType: AGENT`
  • `subjectName:` — Agent BotDefinition DeveloperName

Test case fields (flat, NOT nested):
  • `utterance:` — User message
  • `expectedTopic:` — NOT `expectation.topic`
  • `expectedActions:` — Flat list of strings, NOT objects with `name` / `invoked` / `outputs`
  • `expectedOutcome:` — Optional natural language description

✅ Correct CLI YAML format:

```yaml
name: "My Agent Tests"
subjectType: AGENT
subjectName: My_Agent
testCases:
  - utterance: "Where is my order?"
    expectedTopic: order_lookup
    expectedActions:
      - get_order_status
    expectedOutcome: "Agent should provide order status information"
```

**Option A: Interactive Generation** (no automation)

Interactive test spec generation

交互式测试规范生成

sf agent generate test-spec --output-file ./tests/agent-spec.yaml
sf agent generate test-spec --output-file ./tests/agent-spec.yaml

⚠️ NOTE: No --api-name flag! Interactive-only.

⚠️ 注意:没有--api-name标志!仅支持交互式。


**Option B: Automated Generation** (Python script)
```bash
python3 {SKILL_PATH}/hooks/scripts/generate-test-spec.py \
  --agent-file /path/to/Agent.agent \
  --output tests/agent-spec.yaml \
  --verbose
```
Create Test in Org:
```bash
sf agent test create --spec ./tests/agent-spec.yaml --api-name MyAgentTest --target-org [alias]
```
See Test Spec Reference for the complete YAML format guide.

**选项B:自动生成**(Python脚本)
```bash
python3 {SKILL_PATH}/hooks/scripts/generate-test-spec.py \
  --agent-file /path/to/Agent.agent \
  --output tests/agent-spec.yaml \
  --verbose
```
在组织中创建测试:
```bash
sf agent test create --spec ./tests/agent-spec.yaml --api-name MyAgentTest --target-org [别名]
```
完整YAML格式指南请参见测试规范参考。

B1.5: Topic Name Resolution

B1.5:主题名称解析

Topic name format in `expectedTopic` depends on the topic type:

| Topic Type | YAML Value | Resolution |
| --- | --- | --- |
| Standard (Escalation, Off_Topic) | `localDeveloperName` (e.g., `Escalation`) | Framework resolves automatically |
| Promoted (`p_16j...` prefix) | Full runtime `developerName` with hash | Must be exact match |

Standard topics like `Escalation` can use the short name — the CLI framework resolves to the hash-suffixed runtime name. Promoted topics (custom topics created in the Setup UI) MUST use the full runtime `developerName` including the hash suffix; the short `localDeveloperName` does NOT resolve.

Discovery workflow:
  1. Write the spec with best guesses for topic names
  2. Deploy and run: `sf agent test run --api-name X --wait 10 --result-format json --json`
  3. Extract actual names: `jq '.result.testCases[].generatedData.topic'`
  4. Update the spec with the actual runtime names
  5. Re-deploy with `--force-overwrite` and re-run

See topic-name-resolution.md for the complete guide.

`expectedTopic`中的主题名称格式取决于主题类型:

| 主题类型 | YAML值 | 解析方式 |
| --- | --- | --- |
| 标准(Escalation、Off_Topic) | `localDeveloperName`(例如`Escalation`) | 框架自动解析 |
| 推广(`p_16j...`前缀) | 带哈希的完整运行时`developerName` | 必须完全匹配 |

标准主题(如`Escalation`)可以使用短名称 — CLI框架会解析为带哈希后缀的运行时名称。推广主题(在设置UI中创建的自定义主题)必须使用包含哈希后缀的完整运行时`developerName`;短`localDeveloperName`无法解析。

发现工作流:
  1. 使用主题名称的最佳猜测编写规范
  2. 部署并运行:`sf agent test run --api-name X --wait 10 --result-format json --json`
  3. 提取实际名称:`jq '.result.testCases[].generatedData.topic'`
  4. 使用实际运行时名称更新规范
  5. 使用`--force-overwrite`重新部署并重新运行

完整指南请参见topic-name-resolution.md。
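The jq extraction in step 3 can also be done in Python when post-processing results programmatically. A minimal sketch, assuming the `--json` output shape described above (the sample payload and its values are illustrative):

```python
import json

def actual_topic_names(results_json: str) -> list:
    """Pull generatedData.topic from each test case of
    `sf agent test run ... --result-format json --json` output."""
    data = json.loads(results_json)
    return [
        case["generatedData"]["topic"]
        for case in data.get("result", {}).get("testCases", [])
        if "generatedData" in case
    ]

# Illustrative payload mirroring the shape the jq query above targets
sample = json.dumps({"result": {"testCases": [
    {"generatedData": {"topic": "Escalation"}},
    {"generatedData": {"topic": "p_16jPl000000GwEX_Field_Support_Routing_16j8eeef13560aa"}},
]}})
print(actual_topic_names(sample))
```

Feed the returned runtime names back into `expectedTopic` before re-deploying with `--force-overwrite`.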

B1.6: Known CLI Gotchas

B1.6:已知CLI陷阱

| Gotcha | Detail |
| --- | --- |
| `name:` mandatory | Deploy fails: "Required fields are missing: [MasterLabel]" |
| `expectedActions` is flat strings | `- action_name`, NOT `- name: action_name, invoked: true` |
| Empty `expectedActions: []` | Means "not testing" — PASS even when actions invoked |
| Missing `expectedOutcome` | `output_validation` reports ERROR — harmless |
| No MessagingSession | contextFlows needing `recordId` error (agent handles gracefully) |
| `--use-most-recent` broken | Always use `--job-id` for `sf agent test results` |
| `contextVariables` name prefix | Use `RoutableId` NOT `$Context.RoutableId` — framework adds prefix |
| customEvaluations RETRY bug | ⚠️ Spring '26: Server returns RETRY → REST API 500. See Known Issues. |
| `conciseness` metric broken | Returns score=0, empty explanation — platform bug |
| `instruction_following` threshold | Labels FAILURE even at score=1 — use score value, ignore label |
| 陷阱 | 详情 |
| --- | --- |
| `name:`必填 | 部署失败:"Required fields are missing: [MasterLabel]" |
| `expectedActions`是扁平字符串 | `- action_name`,不是`- name: action_name, invoked: true` |
| 空`expectedActions: []` | 表示"不测试" — 即使调用动作也会通过 |
| 缺少`expectedOutcome` | `output_validation`报告ERROR — 无害 |
| 无MessagingSession | 需要`recordId`的contextFlows会出错(Agent会优雅处理) |
| `--use-most-recent`已损坏 | 始终使用`--job-id`获取`sf agent test results` |
| `contextVariables`名称前缀 | 使用`RoutableId`而不是`$Context.RoutableId` — 框架会添加前缀 |
| customEvaluations RETRY错误 | ⚠️ Spring '26:服务器返回RETRY → REST API 500。请参见已知问题。 |
| `conciseness`指标已损坏 | 返回score=0,空explanation — 平台错误 |
| `instruction_following`阈值 | 即使score=1也标记为FAILURE — 使用分数值,忽略标签 |

B1.7: Context Variables

B1.7:上下文变量

Context variables inject session-level data (record IDs, user info) into CLI test cases. Without them, action flows receive the topic's internal name as `recordId`. With them, they receive a real record ID.

When to use: any test case where action flows need real record IDs (e.g., updating a MessagingSession, creating a Case).

YAML syntax:
```yaml
contextVariables:
  - name: RoutableId            # Bare name — NOT $Context.RoutableId
    value: "0Mwbb000007MGoTCAW"
  - name: CaseId
    value: "500XX0000000001"
```
Key rules:
  • `name` uses the bare variable name (e.g., `RoutableId`), NOT `$Context.RoutableId` — the CLI adds the prefix
  • Maps to `<contextVariable><variableName>` / `<variableValue>` in XML metadata

Discovery — find valid IDs:
```bash
sf data query --query "SELECT Id FROM MessagingSession WHERE Status='Active' LIMIT 1" --target-org [alias]
sf data query --query "SELECT Id FROM Case ORDER BY CreatedDate DESC LIMIT 1" --target-org [alias]
```
Verified effect (IRIS testing, 2026-02-09):
  • Without `RoutableId`: action receives `recordId: "p_16jPl000000GwEX_Field_Support_Routing_16j8eeef13560aa"` (topic name)
  • With `RoutableId`: action receives `recordId: "0Mwbb000007MGoTCAW"` (real MessagingSession ID)

Note: context variables do NOT unlock authentication-gated topics. Injecting `RoutableId` + `CaseId` does not satisfy `User_Authentication` flows.

See context-vars-test-spec.yaml for a dedicated template.
上下文变量将会话级数据(记录ID、用户信息)注入到CLI测试用例中。没有这些变量,动作流会将主题的内部名称作为`recordId`接收;有了这些变量,它们会收到真实的记录ID。

使用场景: 任何动作流需要真实记录ID的测试用例(例如更新MessagingSession、创建案例)。

YAML语法:
```yaml
contextVariables:
  - name: RoutableId            # 裸名称 — 不是$Context.RoutableId
    value: "0Mwbb000007MGoTCAW"
  - name: CaseId
    value: "500XX0000000001"
```
关键规则:
  • `name`使用裸变量名称(例如`RoutableId`),而不是`$Context.RoutableId` — CLI会添加前缀
  • 映射到XML元数据中的`<contextVariable><variableName>` / `<variableValue>`

发现 — 查找有效ID:
```bash
sf data query --query "SELECT Id FROM MessagingSession WHERE Status='Active' LIMIT 1" --target-org [别名]
sf data query --query "SELECT Id FROM Case ORDER BY CreatedDate DESC LIMIT 1" --target-org [别名]
```
已验证效果(IRIS测试,2026-02-09):
  • 没有`RoutableId`:动作收到`recordId: "p_16jPl000000GwEX_Field_Support_Routing_16j8eeef13560aa"`(主题名称)
  • 有`RoutableId`:动作收到`recordId: "0Mwbb000007MGoTCAW"`(真实MessagingSession ID)

注意: 上下文变量不会解锁需要认证的主题。注入`RoutableId` + `CaseId`不满足`User_Authentication`流程。

专用模板请参见context-vars-test-spec.yaml。
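The bare-name rule and the XML mapping above can be sketched as a small helper. This is a hypothetical illustration of the mapping described here, not part of the CLI:

```python
def context_vars_to_xml(entries):
    """Render contextVariables entries into the <contextVariable> metadata
    elements described above. Rejects $Context.-prefixed names, since the
    framework adds that prefix itself."""
    chunks = []
    for entry in entries:
        if entry["name"].startswith("$Context."):
            raise ValueError("use the bare name; the CLI adds the $Context. prefix")
        chunks.append(
            "<contextVariable><variableName>{name}</variableName>"
            "<variableValue>{value}</variableValue></contextVariable>".format(**entry)
        )
    return "".join(chunks)

xml = context_vars_to_xml([
    {"name": "RoutableId", "value": "0Mwbb000007MGoTCAW"},
    {"name": "CaseId", "value": "500XX0000000001"},
])
print(xml)
```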

B1.8: Metrics

B1.8:指标

Metrics add platform quality scoring to test cases. Specify them as a flat list of metric names in the YAML.

YAML syntax:
```yaml
metrics:
  - coherence
  - instruction_following
  - output_latency_milliseconds
```
Available metrics (observed behavior from IRIS testing, 2026-02-09):

| Metric | Score Range | Status | Notes |
| --- | --- | --- | --- |
| `coherence` | 1-5 | ✅ Works | Scores 4-5 for clear responses. Recommended. |
| `completeness` | 1-5 | ⚠️ Misleading | Penalizes triage/routing agents for "not solving" — skip for routing agents. |
| `conciseness` | 1-5 | 🔴 Broken | Returns score=0, empty explanation. Platform bug. |
| `instruction_following` | 0-1 | ⚠️ Threshold bug | Labels "FAILURE" at score=1 when the explanation says "follows perfectly." |
| `output_latency_milliseconds` | Raw ms | ✅ Works | No pass/fail — useful for performance baselining. |

Recommendation: use `coherence` + `output_latency_milliseconds` for baseline quality. Skip `conciseness` (broken) and `completeness` (misleading for routing agents).
指标为测试用例添加平台质量评分。在YAML中指定为扁平的指标名称列表。

YAML语法:
```yaml
metrics:
  - coherence
  - instruction_following
  - output_latency_milliseconds
```
可用指标(IRIS测试观察到的行为,2026-02-09):

| 指标 | 评分范围 | 状态 | 说明 |
| --- | --- | --- | --- |
| `coherence` | 1-5 | ✅ 可用 | 清晰响应评4-5分,推荐使用 |
| `completeness` | 1-5 | ⚠️ 有误导性 | 因"未解决问题"而惩罚分诊/路由Agent — 路由Agent跳过此指标 |
| `conciseness` | 1-5 | 🔴 已损坏 | 返回score=0,空explanation。平台错误。 |
| `instruction_following` | 0-1 | ⚠️ 阈值错误 | 即使score=1且说明文本说Agent"完全遵循指令"也标记为"FAILURE" |
| `output_latency_milliseconds` | 原始毫秒 | ✅ 可用 | 无通过/失败 — 用于性能基准测试 |

推荐: 使用`coherence` + `output_latency_milliseconds`作为基准质量。跳过`conciseness`(已损坏)和`completeness`(对路由Agent有误导性)。
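Given the broken and misleading statuses above, result parsing should trust numeric scores over platform labels. A minimal sketch of that policy (the helper names are illustrative, not part of any script):

```python
SKIP = {"conciseness", "completeness"}  # broken / misleading per the table above

def usable_metrics(requested):
    """Drop metrics flagged broken or misleading in the table above."""
    return [m for m in requested if m not in SKIP]

def instruction_following_passed(score):
    """Work around the threshold bug: use the score value,
    ignore the platform's FAILURE/SUCCESS label."""
    return score >= 1

print(usable_metrics(["coherence", "conciseness", "output_latency_milliseconds"]))
print(instruction_following_passed(1))
```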

B1.9: Custom Evaluations (⚠️ Spring '26 Bug)

B1.9:自定义评估(⚠️ Spring '26错误)

Custom evaluations allow JSONPath-based assertions on action inputs and outputs — e.g., "verify the action received `supportPath = 'Field Support'`."

YAML syntax:
```yaml
customEvaluations:
  - label: "supportPath is Field Support"
    name: string_comparison
    parameters:
      - name: operator
        value: equals
        isReference: false
      - name: actual
        value: "$.generatedData.invokedActions[0][0].function.input.supportPath"
        isReference: true       # JSONPath resolved against generatedData
      - name: expected
        value: "Field Support"
        isReference: false
```
Evaluation types:
  • `string_comparison`: `equals`, `contains`, `startswith`, `endswith`
  • `numeric_comparison`: `equals`, `greater_than`, `less_than`, `greater_than_or_equal`, `less_than_or_equal`

Building JSONPath expressions:
  1. Run tests with `--verbose` to see `generatedData.invokedActions`
  2. Parse the stringified JSON (it's `"[[{...}]]"`, not a parsed array)
  3. Common paths: `$.generatedData.invokedActions[0][0].function.input.[field]`

⚠️ BLOCKED — Spring '26 Platform Bug: custom evaluations with `isReference: true` cause the server to return "RETRY" status, and the results API crashes with `INTERNAL_SERVER_ERROR`. This is server-side (confirmed via direct `curl`). Workaround: use `expectedOutcome` (LLM-as-judge) or the Testing Center UI until patched.

See custom-eval-test-spec.yaml for a dedicated template.
自定义评估允许对动作输入和输出进行基于JSONPath的断言 — 例如"验证动作收到`supportPath = 'Field Support'`"。

YAML语法:
```yaml
customEvaluations:
  - label: "supportPath为Field Support"
    name: string_comparison
    parameters:
      - name: operator
        value: equals
        isReference: false
      - name: actual
        value: "$.generatedData.invokedActions[0][0].function.input.supportPath"
        isReference: true       # 针对generatedData解析JSONPath
      - name: expected
        value: "Field Support"
        isReference: false
```
评估类型:
  • `string_comparison`:`equals`、`contains`、`startswith`、`endswith`
  • `numeric_comparison`:`equals`、`greater_than`、`less_than`、`greater_than_or_equal`、`less_than_or_equal`

构建JSONPath表达式:
  1. 使用`--verbose`运行测试以查看`generatedData.invokedActions`
  2. 解析字符串化的JSON(是`"[[{...}]]"`,不是解析后的数组)
  3. 常见路径:`$.generatedData.invokedActions[0][0].function.input.[field]`

⚠️ 已阻止 — Spring '26平台错误: 带有`isReference: true`的自定义评估会导致服务器返回"RETRY"状态,结果API崩溃并显示`INTERNAL_SERVER_ERROR`。这是服务器端问题(通过直接`curl`确认)。解决方法: 在修复前使用`expectedOutcome`(LLM作为判断者)或测试中心UI。

专用模板请参见custom-eval-test-spec.yaml。
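Step 2 above (parsing the stringified `invokedActions`) is where JSONPath building usually goes wrong. A minimal sketch, with a sample payload shaped like the structure the JSONPath targets (values are illustrative):

```python
import json

# generatedData.invokedActions arrives as a STRING ("[[{...}]]"),
# not a parsed array, so it needs its own json.loads pass.
generated_data = {
    "invokedActions": json.dumps(
        [[{"function": {"input": {"supportPath": "Field Support"}}}]]
    )
}

actions = json.loads(generated_data["invokedActions"])  # now a real nested list
# Manual equivalent of $.generatedData.invokedActions[0][0].function.input.supportPath
actual = actions[0][0]["function"]["input"]["supportPath"]
print(actual)
```

Until the Spring '26 RETRY bug is patched, a local assertion like this is a practical stand-in for `isReference: true` custom evaluations.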

B2: Test Execution

B2:测试执行

```bash
# Run automated tests
sf agent test run --api-name MyAgentTest --wait 10 --result-format json --target-org [alias]
```

> **No ECA required.** Preview uses standard org auth (`sf org login web`). No Connected App setup needed (v2.121.7+).

**Interactive Preview (Simulated):**
```bash
sf agent preview --api-name AgentName --output-dir ./logs --target-org [alias]
```
**Interactive Preview (Live):**
```bash
sf agent preview --api-name AgentName --use-live-actions --apex-debug --target-org [alias]
```

```bash
# 运行自动化测试
sf agent test run --api-name MyAgentTest --wait 10 --result-format json --target-org [别名]
```

> **不需要ECA。**预览使用标准组织认证(`sf org login web`)。不需要连接应用设置(v2.121.7+)。

**交互式预览(模拟):**
```bash
sf agent preview --api-name AgentName --output-dir ./logs --target-org [别名]
```
**交互式预览(真实):**
```bash
sf agent preview --api-name AgentName --use-live-actions --apex-debug --target-org [别名]
```

B3: Results Analysis

B3:结果分析

Parse test results JSON and display formatted summary:
📊 AGENT TEST RESULTS (CLI)
════════════════════════════════════════════════════════════════

Agent: Customer_Support_Agent
Org: vivint-DevInt
Duration: 45.2s
Mode: Simulated

SUMMARY
───────────────────────────────────────────────────────────────
✅ Passed:    18
❌ Failed:    2
⏭️ Skipped:   0
📈 Topic Selection: 95%
🎯 Action Invocation: 90%

FAILED TESTS
───────────────────────────────────────────────────────────────
❌ test_complex_order_inquiry
   Utterance: "What's the status of orders 12345 and 67890?"
   Expected: get_order_status invoked 2 times
   Actual: get_order_status invoked 1 time
   Category: ACTION_INVOCATION_COUNT_MISMATCH

COVERAGE SUMMARY
───────────────────────────────────────────────────────────────
Topics Tested:       4/5 (80%) ⚠️
Actions Tested:      6/8 (75%) ⚠️
Guardrails Tested:   3/3 (100%) ✅

解析测试结果JSON并显示格式化摘要:
📊 AGENT测试结果(CLI)
════════════════════════════════════════════════════════════════

Agent: Customer_Support_Agent
组织: vivint-DevInt
持续时间: 45.2s
模式: 模拟

摘要
───────────────────────────────────────────────────────────────
✅ 通过:    18
❌ 失败:    2
⏭️ 跳过:   0
📈 主题选择: 95%
🎯 动作调用: 90%

失败测试
───────────────────────────────────────────────────────────────
❌ test_complex_order_inquiry
   语句: "订单12345和67890的状态是什么?"
   预期: get_order_status调用2次
   实际: get_order_status调用1次
   类别: ACTION_INVOCATION_COUNT_MISMATCH

覆盖率摘要
───────────────────────────────────────────────────────────────
已测试主题:       4/5 (80%) ⚠️
已测试动作:      6/8 (75%) ⚠️
已测试防护规则:   3/3 (100%) ✅
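The summary above is derived from the `--result-format json` output. A minimal sketch of the roll-up, assuming a simple per-case `status` field (the field name is illustrative):

```python
def summarize(test_cases):
    """Fold test-case dicts into the pass/fail/skip counts shown in the
    formatted summary above."""
    counts = {"passed": 0, "failed": 0, "skipped": 0}
    for case in test_cases:
        status = case.get("status", "").upper()
        if status == "PASS":
            counts["passed"] += 1
        elif status == "FAIL":
            counts["failed"] += 1
        else:
            counts["skipped"] += 1
    return counts

cases = [{"status": "PASS"}] * 18 + [{"status": "FAIL"}] * 2
print(summarize(cases))
```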

Phase C: Agentic Fix Loop

阶段C:Agent自动修复循环

When tests fail (either Phase A or Phase B), automatically fix via sf-ai-agentscript:
当测试失败时(阶段A或B),通过sf-ai-agentscript自动修复:

Failure Categories (10 total)

故障类别(共10种)

| Category | Source | Auto-Fix Strategy |
| --- | --- | --- |
| TOPIC_NOT_MATCHED | A+B | Add keywords to topic description |
| ACTION_NOT_INVOKED | A+B | Improve action description |
| WRONG_ACTION_SELECTED | A+B | Differentiate descriptions |
| ACTION_INVOCATION_FAILED | A+B | ⚠️ Delegate to sf-flow or sf-apex |
| GUARDRAIL_NOT_TRIGGERED | A+B | Add explicit guardrails |
| ESCALATION_NOT_TRIGGERED | A+B | Add escalation action/triggers |
| TOPIC_RE_MATCHING_FAILURE | A | Add transition phrases to target topic |
| CONTEXT_PRESERVATION_FAILURE | A | Add context retention instructions |
| MULTI_TURN_ESCALATION_FAILURE | A | Add frustration detection triggers |
| ACTION_CHAIN_FAILURE | A | Fix action output variable mappings |
| 类别 | 来源 | 自动修复策略 |
| --- | --- | --- |
| TOPIC_NOT_MATCHED | A+B | 向主题描述添加关键词 |
| ACTION_NOT_INVOKED | A+B | 改进动作描述 |
| WRONG_ACTION_SELECTED | A+B | 区分描述 |
| ACTION_INVOCATION_FAILED | A+B | ⚠️ 委托给sf-flow或sf-apex |
| GUARDRAIL_NOT_TRIGGERED | A+B | 添加明确的防护规则 |
| ESCALATION_NOT_TRIGGERED | A+B | 添加升级动作/触发器 |
| TOPIC_RE_MATCHING_FAILURE | A | 向目标主题添加过渡短语 |
| CONTEXT_PRESERVATION_FAILURE | A | 添加上下文保留指令 |
| MULTI_TURN_ESCALATION_FAILURE | A | 添加沮丧检测触发器 |
| ACTION_CHAIN_FAILURE | A | 修复动作输出变量映射 |

Auto-Fix Command Example

自动修复命令示例

```bash
Skill(skill="sf-ai-agentscript", args="Fix agent [AgentName] - Error: [category] - [details]")
```

```bash
Skill(skill="sf-ai-agentscript", args="修复Agent [AgentName] - 错误: [category] - [详情]")
```

Fix Loop Flow

修复循环流程

Test Failed → Analyze failure category
    ├─ Single-turn failure → Standard fix (topics, actions, guardrails)
    └─ Multi-turn failure → Enhanced fix (context, re-matching, escalation, chaining)
Apply fix via sf-ai-agentscript → Re-publish → Re-test
    ├─ Pass → ✅ Move to next failure
    └─ Fail → Retry (max 3 attempts) → Escalate to human
See Agentic Fix Loops Guide for complete decision tree and 10 fix strategies.
测试失败 → 分析故障类别
    ├─ 单轮失败 → 标准修复(主题、动作、防护规则)
    └─ 多轮失败 → 增强修复(上下文、重匹配、升级、链式调用)
通过sf-ai-agentscript应用修复 → 重新发布 → 重新测试
    ├─ 通过 → ✅ 处理下一个失败
    └─ 失败 → 重试(最多3次) → 升级给人工
完整决策树和10种修复策略请参见Agent自动修复循环指南

Two Fix Strategies

两种修复策略

| Agent Type | Fix Strategy | When to Use |
| --- | --- | --- |
| Custom Agent (you control it) | Fix the agent via `sf-ai-agentscript` | Topic descriptions, action configs need adjustment |
| Managed/Standard Agent | Fix test expectations | Test expectations don't match actual behavior |

| Agent类型 | 修复策略 | 使用场景 |
| --- | --- | --- |
| 自定义Agent(您控制它) | 通过`sf-ai-agentscript`修复Agent | 主题描述、动作配置需要调整 |
| 托管/标准Agent | 修复测试期望 | 测试期望与实际行为不匹配 |

Phase D: Coverage Improvement

阶段D:覆盖率提升

If coverage < threshold:
  1. Identify untested topics/actions/patterns from results
  2. Add test cases (YAML for CLI, scenarios for API)
  3. Re-run tests
  4. Repeat until threshold met
如果覆盖率<阈值:
  1. 从结果中识别未测试的主题/动作/模式
  2. 添加测试用例(CLI使用YAML,API使用场景)
  3. 重新运行测试
  4. 重复直到达到阈值

Coverage Dimensions

覆盖率维度

| Dimension | Phase A | Phase B | Target |
| --- | --- | --- | --- |
| Topic Selection | ✅ | ✅ | 100% |
| Action Invocation | ✅ | ✅ | 100% |
| Topic Re-matching | ✅ |  | 90%+ |
| Context Preservation | ✅ |  | 95%+ |
| Conversation Completion | ✅ |  | 85%+ |
| Guardrails | ✅ | ✅ | 100% |
| Escalation | ✅ | ✅ | 100% |
| Phrasing Diversity | ✅ | ✅ | 3+ per topic |

See Coverage Analysis for complete metrics and improvement guide.

| 维度 | 阶段A | 阶段B | 目标 |
| --- | --- | --- | --- |
| 主题选择 | ✅ | ✅ | 100% |
| 动作调用 | ✅ | ✅ | 100% |
| 主题重匹配 | ✅ |  | 90%+ |
| 上下文保留 | ✅ |  | 95%+ |
| 对话完成 | ✅ |  | 85%+ |
| 防护规则 | ✅ | ✅ | 100% |
| 升级 | ✅ | ✅ | 100% |
| 措辞多样性 | ✅ | ✅ | 每个主题3+种 |

完整指标和改进指南请参见覆盖率分析。

Phase E: Observability Integration

阶段E:可观测性集成

After test execution, guide user to analyze agent behavior with session-level observability:
Skill(skill="sf-ai-agentforce-observability", args="Analyze STDM sessions for agent [AgentName] in org [alias] - focus on test session behavior patterns")
What observability adds to testing:
  • STDM Session Analysis: Examine actual session traces from test conversations
  • Latency Profiling: Identify slow actions or topic routing delays
  • Error Pattern Detection: Find recurring failures across sessions
  • Action Execution Traces: Detailed view of Flow/Apex execution during tests

测试执行后,引导用户使用会话级可观测性分析Agent行为:
Skill(skill="sf-ai-agentforce-observability", args="分析组织[别名]中Agent[AgentName]的STDM会话 — 关注测试会话行为模式")
可观测性为测试增添的价值:
  • STDM会话分析: 检查测试对话的实际会话跟踪
  • 延迟分析: 识别慢动作或主题路由延迟
  • 错误模式检测: 发现跨会话的重复失败
  • 动作执行跟踪: 测试期间Flow/Apex执行的详细视图

Scoring System (100 Points)

评分系统(100分)

| Category | Points | Key Rules |
| --- | --- | --- |
| Topic Selection Coverage | 15 | All topics have test cases; various phrasings tested |
| Action Invocation | 15 | All actions tested with valid inputs/outputs |
| Multi-Turn Topic Re-matching | 15 | Topic switching accuracy across turns |
| Context Preservation | 15 | Information retention across turns |
| Edge Case & Guardrail Coverage | 15 | Negative tests; guardrails; escalation |
| Test Spec / Scenario Quality | 10 | Proper YAML; descriptions; clear expectations |
| Agentic Fix Success | 15 | Auto-fixes resolve issues within 3 attempts |
Scoring Thresholds:
⭐⭐⭐⭐⭐ 90-100 pts → Production Ready
⭐⭐⭐⭐   80-89 pts → Good, minor improvements
⭐⭐⭐    70-79 pts → Acceptable, needs work
⭐⭐      60-69 pts → Below standard
⭐        <60 pts  → BLOCKED - Major issues

| 类别 | 分数 | 关键规则 |
| --- | --- | --- |
| 主题选择覆盖率 | 15 | 所有主题都有测试用例;测试多种措辞 |
| 动作调用 | 15 | 所有动作都使用有效输入/输出测试 |
| 多轮主题重匹配 | 15 | 多轮对话中主题切换的准确性 |
| 上下文保留 | 15 | 多轮对话中信息的保留 |
| 边缘案例与防护规则覆盖率 | 15 | 负面测试;防护规则;升级 |
| 测试用例/场景质量 | 10 | 正确的YAML;描述;清晰的期望 |
| Agent自动修复成功率 | 15 | 自动修复在3次尝试内解决问题 |
评分阈值:
⭐⭐⭐⭐⭐ 90-100分 → 可用于生产
⭐⭐⭐⭐   80-89分 → 良好,需小幅改进
⭐⭐⭐    70-79分 → 可接受,需要改进
⭐⭐      60-69分 → 低于标准
⭐        <60分  → 已阻止 - 存在重大问题
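The threshold bands above map directly to code; a minimal sketch of that mapping:

```python
def star_rating(points):
    """Map a 0-100 score to the threshold bands listed above."""
    bands = [
        (90, "⭐⭐⭐⭐⭐ Production Ready"),
        (80, "⭐⭐⭐⭐ Good, minor improvements"),
        (70, "⭐⭐⭐ Acceptable, needs work"),
        (60, "⭐⭐ Below standard"),
    ]
    for floor, label in bands:
        if points >= floor:
            return label
    return "⭐ BLOCKED - Major issues"

print(star_rating(85))
```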

⛔ TESTING GUARDRAILS (MANDATORY)

⛔ 测试防护规则(强制)

BEFORE running tests, verify:

| Check | Command | Why |
| --- | --- | --- |
| Agent published | `sf agent list --target-org [alias]` | Can't test unpublished agent |
| Agent activated | Check status | API and preview require activation |
| Flows deployed | `sf org list metadata --metadata-type Flow` | Actions need Flows |
| ECA configured (Phase A — multi-turn API only) | Token request test | Required for Agent Runtime API. Not needed for preview or CLI tests |
| Org auth (Phase B live) | `sf org display` | Live mode requires valid auth |

NEVER do these:

| Anti-Pattern | Problem | Correct Pattern |
| --- | --- | --- |
| Test unpublished agent | Tests fail silently | Publish first |
| Skip simulated testing | Live mode hides logic bugs | Always test simulated first |
| Ignore guardrail tests | Security gaps in production | Always test harmful/off-topic inputs |
| Single phrasing per topic | Misses routing failures | Test 3+ phrasings per topic |
| Write ECA credentials to files | Security risk | Keep in shell variables only |
| Skip session cleanup | Resource leaks and rate limits | Always DELETE sessions after tests |
| Use `curl` for OAuth token requests | Domains with `--` cause shell failures | Use `credential_manager.py validate` |
| Ask permission to run skill scripts | Breaks flow, unnecessary delay | All `hooks/scripts/` are pre-approved — run automatically |
| Spawn more than 2 swarm workers | Context overload, screen space, diminishing returns | Max 2 workers — side-by-side monitoring |

运行测试前,请验证:

| 检查项 | 命令 | 原因 |
| --- | --- | --- |
| Agent已发布 | `sf agent list --target-org [别名]` | 无法测试未发布的Agent |
| Agent已激活 | 检查状态 | API和预览需要激活 |
| Flows已部署 | `sf org list metadata --metadata-type Flow` | 动作需要Flows |
| ECA已配置(阶段A — 仅多轮API测试) | 令牌请求测试 | Agent Runtime API需要。预览或CLI测试不需要 |
| 组织认证(阶段B真实模式) | `sf org display` | 真实模式需要有效认证 |

绝不要做这些:

| 反模式 | 问题 | 正确模式 |
| --- | --- | --- |
| 测试未发布的Agent | 测试静默失败 | 先发布 |
| 跳过模拟测试 | 真实模式隐藏逻辑错误 | 始终先测试模拟模式 |
| 忽略防护规则测试 | 生产中存在安全漏洞 | 始终测试有害/离题输入 |
| 每个主题仅使用一种措辞 | 遗漏路由失败 | 每个主题测试3+种措辞 |
| 将ECA凭证写入文件 | 安全风险 | 仅保存在shell变量中 |
| 跳过会话清理 | 资源泄漏和速率限制 | 测试后始终DELETE会话 |
| 使用`curl`进行OAuth令牌请求 | 包含`--`的域名导致shell失败 | 使用`credential_manager.py validate` |
| 请求运行技能脚本的权限 | 中断流程,不必要的延迟 | 所有`hooks/scripts/`均已预先批准 — 自动运行 |
| 生成超过2个集群工作进程 | 上下文过载、屏幕空间不足、收益递减 | 最多2个工作进程 — 并排监控 |

CLI Command Reference

CLI命令参考

Test Lifecycle Commands

测试生命周期命令

| Command | Purpose | Example |
| --- | --- | --- |
| `sf agent generate test-spec` | Create test YAML | `sf agent generate test-spec --output-dir ./tests` |
| `sf agent test create` | Deploy test to org | `sf agent test create --spec ./tests/spec.yaml --target-org alias` |
| `sf agent test run` | Execute tests | `sf agent test run --api-name Test --wait 10 --target-org alias` |
| `sf agent test results` | Get results | `sf agent test results --job-id ID --result-format json` |
| `sf agent test resume` | Resume async test | `sf agent test resume --job-id <JOB_ID> --target-org alias` |
| `sf agent test list` | List test runs | `sf agent test list --target-org alias` |
| 命令 | 用途 | 示例 |
| --- | --- | --- |
| `sf agent generate test-spec` | 创建测试YAML | `sf agent generate test-spec --output-dir ./tests` |
| `sf agent test create` | 将测试部署到组织 | `sf agent test create --spec ./tests/spec.yaml --target-org 别名` |
| `sf agent test run` | 执行测试 | `sf agent test run --api-name Test --wait 10 --target-org 别名` |
| `sf agent test results` | 获取结果 | `sf agent test results --job-id ID --result-format json` |
| `sf agent test resume` | 恢复异步测试 | `sf agent test resume --job-id <JOB_ID> --target-org 别名` |
| `sf agent test list` | 列出测试运行 | `sf agent test list --target-org 别名` |

Preview Commands

预览命令

| Command | Purpose | Example |
| --- | --- | --- |
| `sf agent preview` | Interactive testing | `sf agent preview --api-name Agent --target-org alias` |
| `--use-live-actions` | Use real Flows/Apex | `sf agent preview --use-live-actions` |
| `--output-dir` | Save transcripts | `sf agent preview --output-dir ./logs` |
| `--apex-debug` | Capture debug logs | `sf agent preview --apex-debug` |
| 命令 | 用途 | 示例 |
| --- | --- | --- |
| `sf agent preview` | 交互式测试 | `sf agent preview --api-name Agent --target-org 别名` |
| `--use-live-actions` | 使用真实Flows/Apex | `sf agent preview --use-live-actions` |
| `--output-dir` | 保存转录 | `sf agent preview --output-dir ./logs` |
| `--apex-debug` | 捕获调试日志 | `sf agent preview --apex-debug` |

Result Formats

结果格式

| Format | Use Case | Flag |
| --- | --- | --- |
| `human` | Terminal display (default) | `--result-format human` |
| `json` | CI/CD parsing | `--result-format json` |
| `junit` | Test reporting | `--result-format junit` |
| `tap` | Test Anything Protocol | `--result-format tap` |

| 格式 | 使用场景 | 标志 |
| --- | --- | --- |
| `human` | 终端显示(默认) | `--result-format human` |
| `json` | CI/CD解析 | `--result-format json` |
| `junit` | 测试报告 | `--result-format junit` |
| `tap` | Test Anything Protocol | `--result-format tap` |

Multi-Turn Test Templates

多轮测试模板

| Template | Pattern | Scenarios | Location |
| --- | --- | --- | --- |
| multi-turn-topic-routing.yaml | Topic switching | 4 | templates/ |
| multi-turn-context-preservation.yaml | Context retention | 4 | templates/ |
| multi-turn-escalation-flows.yaml | Escalation cascades | 4 | templates/ |
| multi-turn-comprehensive.yaml | All 6 patterns | 6 | templates/ |
| 模板 | 模式 | 场景数量 | 位置 |
| --- | --- | --- | --- |
| multi-turn-topic-routing.yaml | 主题切换 | 4 | templates/ |
| multi-turn-context-preservation.yaml | 上下文保留 | 4 | templates/ |
| multi-turn-escalation-flows.yaml | 升级流程 | 4 | templates/ |
| multi-turn-comprehensive.yaml | 所有6种模式 | 6 | templates/ |

CLI Test Templates

CLI测试模板

| Template | Purpose | Location |
| --- | --- | --- |
| basic-test-spec.yaml | Quick start (3-5 tests) | templates/ |
| comprehensive-test-spec.yaml | Full coverage (20+ tests) with context vars, metrics, custom evals | templates/ |
| context-vars-test-spec.yaml | Context variable patterns (RoutableId, EndUserId, CaseId) | templates/ |
| custom-eval-test-spec.yaml | Custom evaluations with JSONPath assertions (⚠️ Spring '26 bug) | templates/ |
| cli-auth-guardrail-tests.yaml | Auth gate, guardrail, ambiguous routing, session tests (CLI) | templates/ |
| guardrail-tests.yaml | Security/safety scenarios | templates/ |
| escalation-tests.yaml | Human handoff scenarios | templates/ |
| agentscript-test-spec.yaml | Agent Script agents with conversationHistory pattern | templates/ |
| standard-test-spec.yaml | Reference format | templates/ |

| 模板 | 用途 | 位置 |
| --- | --- | --- |
| basic-test-spec.yaml | 快速入门(3-5个测试) | templates/ |
| comprehensive-test-spec.yaml | 全面覆盖(20+个测试),带上下文变量、指标、自定义评估 | templates/ |
| context-vars-test-spec.yaml | 上下文变量模式(RoutableId、EndUserId、CaseId) | templates/ |
| custom-eval-test-spec.yaml | 带JSONPath断言的自定义评估(⚠️ Spring '26错误) | templates/ |
| cli-auth-guardrail-tests.yaml | 认证门、防护规则、模糊路由、会话测试(CLI) | templates/ |
| guardrail-tests.yaml | 安全/防护场景 | templates/ |
| escalation-tests.yaml | 人工交接场景 | templates/ |
| agentscript-test-spec.yaml | 带conversationHistory模式的Agent Script Agent | templates/ |
| standard-test-spec.yaml | 参考格式 | templates/ |

Cross-Skill Integration

跨技能集成

Required Delegations:

| Scenario | Skill to Call | Command |
| --- | --- | --- |
| Fix agent script | sf-ai-agentscript | `Skill(skill="sf-ai-agentscript", args="Fix...")` |
| Agent Script agents | sf-ai-agentscript | Parse `.agent` for topic/action discovery; use `conversationHistory` pattern for action tests |
| Create test data | sf-data | `Skill(skill="sf-data", args="Create...")` |
| Fix failing Flow | sf-flow | `Skill(skill="sf-flow", args="Fix...")` |
| Setup ECA or OAuth (multi-turn API only) | sf-connected-apps | `Skill(skill="sf-connected-apps", args="Create...")` |
| Analyze debug logs | sf-debug | `Skill(skill="sf-debug", args="Analyze...")` |
| Session observability | sf-ai-agentforce-observability | `Skill(skill="sf-ai-agentforce-observability", args="Analyze...")` |

必需委托:

| 场景 | 要调用的技能 | 命令 |
| --- | --- | --- |
| 修复Agent脚本 | sf-ai-agentscript | `Skill(skill="sf-ai-agentscript", args="修复...")` |
| Agent Script Agent | sf-ai-agentscript | 解析`.agent`以发现主题/动作;对动作测试使用conversationHistory模式 |
| 创建测试数据 | sf-data | `Skill(skill="sf-data", args="创建...")` |
| 修复失败的Flow | sf-flow | `Skill(skill="sf-flow", args="修复...")` |
| 设置ECA或OAuth(仅多轮API测试) | sf-connected-apps | `Skill(skill="sf-connected-apps", args="创建...")` |
| 分析调试日志 | sf-debug | `Skill(skill="sf-debug", args="分析...")` |
| 会话可观测性 | sf-ai-agentforce-observability | `Skill(skill="sf-ai-agentforce-observability", args="分析...")` |

Automated Testing (Python Scripts)

自动化测试(Python脚本)

| Script | Purpose | Dependencies |
| --- | --- | --- |
| agent_api_client.py | Reusable Agent Runtime API v1 client (auth, sessions, messaging, variables) | stdlib only |
| multi_turn_test_runner.py | Multi-turn test orchestrator (reads YAML, executes, evaluates, Rich colored reports) | pyyaml, rich + agent_api_client |
| rich_test_report.py | Aggregate N worker result JSONs into one unified Rich terminal report | rich |
| generate-test-spec.py | Parse .agent files, generate CLI test YAML specs | stdlib only |
| run-automated-tests.py | Orchestrate full CLI test workflow with fix suggestions | stdlib only |

CLI Flags (multi_turn_test_runner.py):

| Flag | Default | Purpose |
| --- | --- | --- |
| `--report-file PATH` | none | Write Rich terminal report to file (ANSI codes included) — viewable with `cat` or `bat` |
| `--no-rich` | off | Disable Rich colored output; use plain-text format |
| `--width N` | auto | Override terminal width (auto-detects from $COLUMNS; fallback 80) |
| `--rich-output` | (deprecated) | No-op — Rich is now default when installed |

Multi-Turn Testing (Agent Runtime API):
```bash
# Install test runner dependency
pip3 install pyyaml

# Run multi-turn test suite against an agent
python3 {SKILL_PATH}/hooks/scripts/multi_turn_test_runner.py \
  --my-domain your-domain.my.salesforce.com \
  --consumer-key YOUR_KEY \
  --consumer-secret YOUR_SECRET \
  --agent-id 0XxRM0000004ABC \
  --scenarios templates/multi-turn-comprehensive.yaml \
  --output results.json --verbose

# Or set env vars and omit credential flags
export SF_MY_DOMAIN=your-domain.my.salesforce.com
export SF_CONSUMER_KEY=YOUR_KEY
export SF_CONSUMER_SECRET=YOUR_SECRET
python3 {SKILL_PATH}/hooks/scripts/multi_turn_test_runner.py \
  --agent-id 0XxRM0000004ABC \
  --scenarios templates/multi-turn-topic-routing.yaml \
  --var '$Context.AccountId=001XXXXXXXXXXXX' \
  --verbose

# Connectivity test (verify ECA credentials work)
python3 {SKILL_PATH}/hooks/scripts/agent_api_client.py
```

**CLI Testing (Agent Testing Center):**
```bash
# Generate test spec from agent file
python3 {SKILL_PATH}/hooks/scripts/generate-test-spec.py \
  --agent-file /path/to/Agent.agent \
  --output specs/Agent-tests.yaml

# Run full automated workflow
python3 {SKILL_PATH}/hooks/scripts/run-automated-tests.py \
  --agent-name MyAgent \
  --agent-dir /path/to/project \
  --target-org dev
```

| 脚本 | 用途 | 依赖项 |
| --- | --- | --- |
| agent_api_client.py | 可复用的Agent Runtime API v1客户端(认证、会话、消息、变量) | 仅标准库 |
| multi_turn_test_runner.py | 多轮测试编排器(读取YAML、执行、评估、Rich彩色报告) | pyyaml、rich + agent_api_client |
| rich_test_report.py | 将N个工作进程的结果JSON聚合为一个统一的Rich终端报告 | rich |
| generate-test-spec.py | 解析.agent文件,生成CLI测试YAML规范 | 仅标准库 |
| run-automated-tests.py | 编排完整的CLI测试工作流并提供修复建议 | 仅标准库 |

CLI标志(multi_turn_test_runner.py):

| 标志 | 默认值 | 用途 |
| --- | --- | --- |
| `--report-file PATH` | 无 | 将Rich终端报告写入文件(包含ANSI代码) — 可使用`cat`或`bat`查看 |
| `--no-rich` | 关闭 | 禁用Rich彩色输出;使用纯文本格式 |
| `--width N` | 自动 | 覆盖终端宽度(从$COLUMNS自动检测;回退80) |
| `--rich-output` | (已弃用) | 无操作 — 现在安装后默认使用Rich |

多轮测试(Agent Runtime API):
```bash
# 安装测试运行器依赖
pip3 install pyyaml

# 对Agent运行多轮测试套件
python3 {SKILL_PATH}/hooks/scripts/multi_turn_test_runner.py \
  --my-domain your-domain.my.salesforce.com \
  --consumer-key YOUR_KEY \
  --consumer-secret YOUR_SECRET \
  --agent-id 0XxRM0000004ABC \
  --scenarios templates/multi-turn-comprehensive.yaml \
  --output results.json --verbose

# 或设置环境变量并省略凭证标志
export SF_MY_DOMAIN=your-domain.my.salesforce.com
export SF_CONSUMER_KEY=YOUR_KEY
export SF_CONSUMER_SECRET=YOUR_SECRET
python3 {SKILL_PATH}/hooks/scripts/multi_turn_test_runner.py \
  --agent-id 0XxRM0000004ABC \
  --scenarios templates/multi-turn-topic-routing.yaml \
  --var '$Context.AccountId=001XXXXXXXXXXXX' \
  --verbose

# 连通性测试(验证ECA凭证是否有效)
python3 {SKILL_PATH}/hooks/scripts/agent_api_client.py
```

**CLI测试(Agent测试中心):**
```bash
# 从Agent文件生成测试规范
python3 {SKILL_PATH}/hooks/scripts/generate-test-spec.py \
  --agent-file /path/to/Agent.agent \
  --output specs/Agent-tests.yaml

# 运行完整的自动化工作流
python3 {SKILL_PATH}/hooks/scripts/run-automated-tests.py \
  --agent-name MyAgent \
  --agent-dir /path/to/project \
  --target-org dev
```

---

🔄 Automated Test-Fix Loop

🔄 自动化测试修复循环

v2.0.0 | Supports both multi-turn API failures and CLI test failures
v2.0.0 | 支持多轮API失败和CLI测试失败

Quick Start

快速开始

```bash
# Run the test-fix loop (CLI tests)
{SKILL_PATH}/hooks/scripts/test-fix-loop.sh Test_Agentforce_v1 AgentforceTesting 3
```

```bash
# 运行测试修复循环(CLI测试)
{SKILL_PATH}/hooks/scripts/test-fix-loop.sh Test_Agentforce_v1 AgentforceTesting 3
```

Exit codes:

退出代码:

0 = All tests passed

0 = 所有测试通过

1 = Fixes needed (Claude Code should invoke sf-ai-agentforce)

1 = 需要修复(Claude Code应调用sf-ai-agentforce)

2 = Max attempts reached, escalate to human

2 = 达到最大尝试次数,升级给人工

3 = Error (org unreachable, test not found, etc.)

3 = 错误(组织不可达,未找到测试等)
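A CI wrapper can branch on these exit codes. A minimal sketch of the dispatch (the meaning strings paraphrase the list above and are illustrative):

```python
EXIT_MEANINGS = {
    0: "all tests passed",
    1: "fixes needed - invoke sf-ai-agentforce",
    2: "max attempts reached - escalate to human",
    3: "error (org unreachable, test not found, etc.)",
}

def describe_exit(code):
    """Translate a test-fix-loop.sh exit code into the meanings above."""
    return EXIT_MEANINGS.get(code, "unknown exit code")

print(describe_exit(1))
```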


Claude Code Integration

Claude Code集成

USER: Run automated test-fix loop for Coral_Cloud_Agent

CLAUDE CODE:
1. Phase A: Run multi-turn scenarios via Python test runner
   python3 {SKILL_PATH}/hooks/scripts/multi_turn_test_runner.py \
     --agent-id ${AGENT_ID} \
     --scenarios templates/multi-turn-comprehensive.yaml \
     --output results.json --verbose
2. Analyze failures from results.json (10 categories)
3. If fixable: Skill(skill="sf-ai-agentscript", args="Fix...")
4. Re-run failed scenarios with --scenario-filter
5. Phase B (if available): Run CLI tests
6. Repeat until passing or max retries (3)
用户: 为Coral_Cloud_Agent运行自动化测试修复循环

Claude Code:
1. 阶段A:通过Python测试运行器运行多轮场景
   python3 {SKILL_PATH}/hooks/scripts/multi_turn_test_runner.py \
     --agent-id ${AGENT_ID} \
     --scenarios templates/multi-turn-comprehensive.yaml \
     --output results.json --verbose
2. 从results.json分析失败(10种类别)
3. 如果可修复:Skill(skill="sf-ai-agentscript", args="修复...")
4. 使用--scenario-filter重新运行失败的场景
5. 阶段B(如果可用):运行CLI测试
6. 重复直到通过或达到最大重试次数(3次)

Environment Variables

环境变量

| Variable | Description | Default |
| --- | --- | --- |
| CURRENT_ATTEMPT | Current attempt number | 1 |
| MAX_WAIT_MINUTES | Timeout for test execution (minutes) | 10 |
| SKIP_TESTS | Comma-separated test names to skip | (none) |
| VERBOSE | Enable detailed output | false |

| 变量 | 描述 | 默认值 |
| --- | --- | --- |
| CURRENT_ATTEMPT | 当前尝试次数 | 1 |
| MAX_WAIT_MINUTES | 测试执行超时(分钟) | 10 |
| SKIP_TESTS | 要跳过的测试名称(逗号分隔) | (无) |
| VERBOSE | 启用详细输出 | false |

💡 Key Insights

💡 关键见解

| Problem | Symptom | Solution |
| --- | --- | --- |
| `sf agent test create` fails | "Required fields are missing: [MasterLabel]" | Add `name:` field to top of YAML spec (see Phase B1) |
| Tests fail silently | No results returned | Agent not published - run `sf agent publish authoring-bundle` |
| Topic not matched | Wrong topic selected | Add keywords to topic description |
| Action not invoked | Action never called | Improve action description |
| Live preview 401 | Authentication error | Re-authenticate: `sf org login web` |
| API 401 | Token expired or wrong credentials | Re-authenticate ECA |
| API 404 on session create | Wrong Agent ID | Re-query BotDefinition for correct Id |
| Empty API response | Agent not activated | Activate and publish agent |
| Context lost between turns | Agent re-asks for known info | Add context retention instructions to topic |
| Topic doesn't switch | Agent stays on old topic | Add transition phrases to target topic |
| ⚠️ `--use-most-recent` broken | "Nonexistent flag" error | Use `--job-id` explicitly |
| Topic name mismatch | Expected `GeneralCRM`, got `MigrationDefaultTopic` | Verify actual topic names from first test run |
| Action superset matching | Expected `[A]`, actual `[A,B]` but PASS | CLI uses SUPERSET logic |

问题症状解决方案
sf agent test create
失败
"Required fields are missing: [MasterLabel]"在YAML规范顶部添加
name:
字段(见阶段B1)
测试静默失败无结果返回Agent未发布 - 运行
sf agent publish authoring-bundle
主题未匹配选择了错误的主题向主题描述添加关键词
动作未调用从未调用动作改进动作描述
实时预览401认证错误重新认证:
sf org login web
API 401令牌过期或凭证错误重新认证ECA
API创建会话404错误的Agent ID重新查询BotDefinition获取正确的Id
API响应为空Agent未激活激活并发布Agent
多轮对话中上下文丢失Agent重新询问已知信息向主题添加上下文保留指令
主题不切换Agent停留在旧主题向目标主题添加过渡短语
⚠️
--use-most-recent
已损坏
"Nonexistent flag"错误明确使用
--job-id
主题名称不匹配预期
GeneralCRM
,实际
MigrationDefaultTopic
从第一次测试运行验证实际主题名称
动作超集匹配预期
[A]
,实际
[A,B]
但通过
CLI使用超集逻辑
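The superset behavior in the last row reduces to a subset check: a test passes when every expected action appears in the actual list, even if extra actions were invoked. A sketch (not the CLI's own code):

```python
def actions_pass(expected, actual):
    """CLI-style SUPERSET matching: extra invoked actions do not fail the test."""
    return set(expected).issubset(set(actual))

print(actions_pass(["A"], ["A", "B"]))       # True: actual is a superset of expected
print(actions_pass(["A", "C"], ["A", "B"]))  # False: expected action C never ran
```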

## Quick Start Example

### Multi-Turn API Testing (Recommended)

**Quick Start with Python Scripts:**

```bash
# 1. Get agent ID
AGENT_ID=$(sf data query --use-tooling-api \
  --query "SELECT Id FROM BotDefinition WHERE DeveloperName='My_Agent' AND IsActive=true LIMIT 1" \
  --result-format json --target-org dev | jq -r '.result.records[0].Id')

# 2. Run multi-turn tests (credentials from env or flags)
python3 {SKILL_PATH}/hooks/scripts/multi_turn_test_runner.py \
  --my-domain "${SF_MY_DOMAIN}" \
  --consumer-key "${CONSUMER_KEY}" \
  --consumer-secret "${CONSUMER_SECRET}" \
  --agent-id "${AGENT_ID}" \
  --scenarios templates/multi-turn-comprehensive.yaml \
  --output results.json --verbose
```

**Ad-Hoc Python Usage:**

```python
from hooks.scripts.agent_api_client import AgentAPIClient

client = AgentAPIClient()  # reads SF_MY_DOMAIN, SF_CONSUMER_KEY, SF_CONSUMER_SECRET from env
with client.session(agent_id="0XxRM000...") as session:
    r1 = session.send("I need to cancel my appointment")
    r2 = session.send("Actually, reschedule it instead")
    r3 = session.send("What was my original request about?")
    # Session auto-ends when exiting the context manager
```
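Turn three above probes context retention. A standalone sketch of how a scenario result could be checked for it (the transcript format and `context_retained` helper are hypothetical, not the runner's API):

```python
def context_retained(transcript, keyword):
    """transcript: list of (user_msg, agent_reply) pairs; pass if the final
    reply still references the keyword from the opening request."""
    final_reply = transcript[-1][1].lower()
    return keyword.lower() in final_reply

transcript = [
    ("I need to cancel my appointment", "Sure, which appointment?"),
    ("Actually, reschedule it instead", "OK, rescheduling it."),
    ("What was my original request about?", "You first asked to cancel an appointment."),
]
print(context_retained(transcript, "cancel"))  # True: context survived three turns
```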

### CLI Testing (If Agent Testing Center Available)

```bash
# 1. Generate test spec
python3 {SKILL_PATH}/hooks/scripts/generate-test-spec.py \
  --agent-file ./agents/MyAgent.agent \
  --output ./tests/myagent-tests.yaml

# 2. Create test in org
sf agent test create --spec ./tests/myagent-tests.yaml --api-name MyAgentTest --target-org dev

# 3. Run tests
sf agent test run --api-name MyAgentTest --wait 10 --result-format json --target-org dev

# 4. View results (use --job-id, NOT --use-most-recent)
sf agent test results --job-id [JOB_ID] --verbose --result-format json --target-org dev
```

---

## 🐛 Known Issues & CLI Bugs

**Last Updated:** 2026-02-11 | **Tested With:** sf CLI v2.118.16+

### RESOLVED: `sf agent test create` MasterLabel Error

**Status:** 🟢 RESOLVED — Add `name:` field to YAML spec
**Error:** `Required fields are missing: [MasterLabel]`
**Root Cause:** The YAML spec must include a `name:` field at the top level, which maps to `MasterLabel` in the `AiEvaluationDefinition` XML. Our templates previously omitted this field.
**Fix:** Add `name:` to the top of your YAML spec:

```yaml
name: "My Agent Tests"    # ← This was the missing field
subjectType: AGENT
subjectName: My_Agent
```

If you still encounter issues:

1. ✅ Use the interactive `sf agent generate test-spec` wizard (interactive-only, no CLI flags)
2. ✅ Create tests via the Salesforce Testing Center UI
3. ✅ Deploy XML metadata directly
4. Use Phase A (Agent Runtime API) instead — bypasses the CLI entirely

### MEDIUM: Interactive Mode Not Scriptable

**Status:** 🟡 Blocks CI/CD automation
**Issue:** `sf agent generate test-spec` only works interactively.
**Workaround:** Use the Python scripts in `hooks/scripts/` or the Phase A multi-turn templates.

### MEDIUM: YAML vs XML Format Discrepancy

**Key Mappings:**

| YAML Field | XML Element / Assertion Type |
|---|---|
| `expectedTopic` | `topic_assertion` |
| `expectedActions` | `actions_assertion` |
| `expectedOutcome` | `output_validation` |
| `contextVariables` | `contextVariable` (`variableName` / `variableValue`) |
| `customEvaluations` | `string_comparison` / `numeric_comparison` (`parameter`) |
| `metrics` | `expectation` (name only, no `expectedValue`) |
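For orientation, a hypothetical spec fragment using the YAML-side field names from the table. The `testCases`/`utterance` nesting shown here is an assumption about the spec shape; verify the exact structure against a generated spec:

```yaml
name: "My Agent Tests"
subjectType: AGENT
subjectName: My_Agent
testCases:                                       # nesting is illustrative
  - utterance: "Cancel my appointment"
    expectedTopic: Appointment_Management        # → topic_assertion
    expectedActions:                             # → actions_assertion
      - Cancel_Appointment
    expectedOutcome: "Confirms the cancellation" # → output_validation
    metrics:                                     # → expectation (name only)
      - completeness
```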

### LOW: BotDefinition Not Always in Tooling API

**Status:** 🟡 Handled automatically
**Issue:** In some org configurations, `BotDefinition` is not queryable via the Tooling API but works via the regular Data API (`sf data query` without `--use-tooling-api`).
**Fix:** `agent_discovery.py live` now has an automatic fallback — if the Tooling API returns no results for BotDefinition, it retries with the regular API.
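The fallback reduces to a retry over the two APIs. A sketch with the query function injected for illustration (the real implementation in `agent_discovery.py` shells out to `sf data query`):

```python
def query_with_fallback(run_query):
    """Try the Tooling API first; if it returns no records, retry the
    same query against the regular Data API."""
    records = run_query(use_tooling_api=True)
    if not records:
        records = run_query(use_tooling_api=False)
    return records

# Simulated org where BotDefinition is only visible to the regular Data API:
def fake_query(use_tooling_api):
    return [] if use_tooling_api else [{"Id": "0XxRM0000000001"}]

print(query_with_fallback(fake_query))  # [{'Id': '0XxRM0000000001'}]
```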

### LOW: `--use-most-recent` Not Implemented

**Status:** Flag is documented but NOT functional. Always use `--job-id` explicitly.

### CRITICAL: Custom Evaluations RETRY Bug (Spring '26)

**Status:** 🔴 PLATFORM BUG — Blocks all `string_comparison` / `numeric_comparison` evaluations with JSONPath
**Error:** `INTERNAL_SERVER_ERROR: The specified enum type has no constant with the specified name: RETRY`
**Scope:**

- Server returns "RETRY" status for test cases with custom evaluations using `isReference: true`
- Results API endpoint crashes with HTTP 500 when fetching results
- Both filter expressions `[?(@.field == 'value')]` AND direct indexing `[0][0]` trigger the bug
- Tests WITHOUT custom evaluations on the same run complete normally

**Confirmed:** Direct `curl` to the REST endpoint returns the same 500 — NOT a CLI parsing issue
**Workaround:**

1. Use the Testing Center UI (Setup → Agent Testing) — may display results
2. Skip custom evaluations until the platform is patched
3. Use `expectedOutcome` (LLM-as-judge) for response validation instead

**Tracking:** Discovered 2026-02-09 on DevInt sandbox (Spring '26). TODO: Retest after platform patch.

### MEDIUM: `conciseness` Metric Returns Score=0

**Status:** 🟡 Platform bug — metric evaluation appears non-functional
**Issue:** The `conciseness` metric consistently returns `score: 0` with an empty `metricExplainability` field across all test cases tested on DevInt (Spring '26).
**Workaround:** Skip `conciseness` in metrics lists until the platform is patched.

### LOW: `instruction_following` FAILURE at Score=1

**Status:** 🟡 Threshold mismatch — score and label disagree
**Issue:** The `instruction_following` metric labels results as "FAILURE" even when `score: 1` and the explanation text says the agent "follows instructions perfectly." This appears to be a pass/fail threshold configuration error on the platform side.
**Workaround:** Use the numeric `score` value (0 or 1) for evaluation. Ignore the PASS/FAILURE label.
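The workaround reduces to reading `score` and ignoring the label. A minimal sketch (the result-dict shape is illustrative):

```python
def metric_passed(result):
    """Trust the numeric score; the PASS/FAILURE label is unreliable here."""
    return result.get("score", 0) >= 1

# A result exhibiting the bug: perfect score, but labeled FAILURE.
buggy_result = {"metric": "instruction_following", "score": 1, "label": "FAILURE"}
print(metric_passed(buggy_result))  # True
```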

### HIGH: `instruction_following` Crashes Testing Center UI

**Status:** 🔴 Blocks the Testing Center UI entirely — separate from the threshold bug above
**Error:** `Unable to get test suite: No enum constant einstein.gpt.shared.testingcenter.enums.AiEvaluationMetricType.INSTRUCTION_FOLLOWING_EVALUATION`
**Scope:** The Testing Center UI (Setup → Agent Testing) throws a Java exception when opening any test suite that includes the `instruction_following` metric. The CLI (`sf agent test run`) works fine — only the UI rendering is broken.
**Workaround:** Remove `- instruction_following` from the YAML metrics list and redeploy the test spec via `sf agent test create --force-overwrite`.
**Note:** This is a different bug from the threshold mismatch above: the threshold bug affects score interpretation; this bug blocks the entire UI from loading.
**Discovered:** 2026-02-11 on DevInt sandbox (Spring '26).
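Removing the metric before redeploying can be scripted. A sketch over a loaded spec dict (the `testCases` key is an assumption about the spec shape):

```python
def drop_metric(spec, metric="instruction_following"):
    """Strip a metric from every test case's metrics list, in place."""
    for case in spec.get("testCases", []):
        if "metrics" in case:
            case["metrics"] = [m for m in case["metrics"] if m != metric]
    return spec

spec = {"testCases": [{"metrics": ["completeness", "instruction_following"]}]}
print(drop_metric(spec))  # {'testCases': [{'metrics': ['completeness']}]}
```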

## License

MIT License. See LICENSE file. Copyright (c) 2024-2026 Jag Valaiyapathy