momentic-result-classification
Momentic result classification (MCP)
Momentic is an end-to-end testing framework in which each test is composed of browser interaction steps. Each step combines Momentic-specific behavior (AI checks, natural-language locators, AI actions, etc.) with Playwright capabilities wrapped in our YAML step schema. Running these tests produces results data that can be used to analyze the outcome of each test. The results data contains metadata about the run as well as any assets generated by the run (e.g. screenshots, logs, network requests, and video recordings). Your job is to use these test results to classify failures that occurred in Momentic test runs.
Instructions
- Given a failing test run, analyze why it failed. You will often need to look beyond the current run to understand this, for example at past runs of the same test or at other context provided by the Momentic MCP tools.
- After analyzing why the run failed, bucket the failure into exactly one of the categories below, explaining the reasoning for choosing that category.
Helpful MCP tools
- `momentic_get_run`
- `momentic_list_runs`
Background
Test run result structure
When Momentic tests are run via the CLI, the results are stored in a "run group". The data for this run group is stored in a single directory within the Momentic project. By default, the directory is called `test-results`, but this can be changed in the Momentic project settings or for a single run of a run group. The run group results folder has the following structure:
```text
test-results/
├── metadata.json       data about the run group, including git metadata and timing info.
└── runs/               one zip for each test run in the run group.
    ├── <runId_1>.zip   a zipped run directory containing data about this specific test run. Follows the structure described below.
    └── <runId_2>.zip
```
When unzipped, run directories have the following structure:
```text
<runId>/
├── metadata.json                 run-level metadata.
└── attempts/<n>/                 one folder per attempt (1-based n).
    ├── metadata.json             attempt outcome and step results.
    ├── console.json              optional browser console output.
    └── assets/
        ├── <snapshotId>.jpeg     before/after screenshot for each step (see attempt metadata.json for snapshot ID).
        ├── <snapshotId>.html     before/after DOM snapshot for each step (see attempt metadata.json for snapshot ID).
        ├── har-pages.log         HAR pages (ndjson).
        ├── har-entries.log       HAR network entries (ndjson).
        ├── resource-usage.ndjson CPU/memory samples taken during the attempt.
        ├── <videoName>           video recording (when video recording is enabled).
        └── browser-crash.zip     browser crash dump (only present on crash).
```
When getting run results via the Momentic MCP, tools such as `momentic_get_run` return links to the MCP working directory (default `.momentic-mcp`). This directory contains unzipped run result folders named `run-result-<runId>`, following the structure above.
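A minimal Python sketch for getting oriented in an unzipped run directory, such as one of the `run-result-<runId>` folders in the MCP working directory. It relies only on the file names in the layout above: it loads the run-level `metadata.json` but does not interpret it, and the summary keys (`has_console`, `browser_crashed`, `har_entries`, etc.) are hypothetical names chosen for this example.

```python
import json
from pathlib import Path


def count_ndjson(path: Path) -> int:
    """Count non-empty lines in an ndjson file (0 if the file is missing)."""
    if not path.exists():
        return 0
    with path.open() as f:
        return sum(1 for line in f if line.strip())


def summarize_run_dir(run_dir: Path) -> dict:
    """Summarize an unzipped run directory using only the documented layout.

    Expects <runId>/metadata.json plus attempts/<n>/ folders, each with its
    own metadata.json and an assets/ folder.
    """
    summary = {"run_metadata": None, "attempts": {}}
    run_meta = run_dir / "metadata.json"
    if run_meta.exists():
        summary["run_metadata"] = json.loads(run_meta.read_text())

    attempts_dir = run_dir / "attempts"
    if not attempts_dir.is_dir():
        return summary

    # Attempt folders are named 1, 2, ... (1-based); sort numerically, not lexically.
    attempt_dirs = sorted(
        (p for p in attempts_dir.iterdir() if p.is_dir() and p.name.isdigit()),
        key=lambda p: int(p.name),
    )
    for attempt in attempt_dirs:
        assets = attempt / "assets"
        asset_names = sorted(p.name for p in assets.iterdir()) if assets.is_dir() else []
        summary["attempts"][int(attempt.name)] = {
            "has_metadata": (attempt / "metadata.json").exists(),
            "has_console": (attempt / "console.json").exists(),
            # browser-crash.zip is only written when the browser crashed.
            "browser_crashed": "browser-crash.zip" in asset_names,
            "screenshots": [n for n in asset_names if n.endswith(".jpeg")],
            "har_entries": count_ndjson(assets / "har-entries.log"),
        }
    return summary
```

For example, calling `summarize_run_dir(Path(".momentic-mcp") / "run-result-<runId>")` with the real run ID substituted shows how many attempts ran and whether any attempt left a browser crash dump.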
Steps snapshot
The attempt `metadata.json` file includes a `stepsSnapshot` property that shows the state of the test steps at the time of execution. Use this property if you suspect the test changed between runs, or to validate that the test was set up properly.
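One way to apply this check is to fingerprint each attempt's `stepsSnapshot`, so that runs with different fingerprints are known to have executed different versions of the test. This sketch assumes only that `stepsSnapshot` is a top-level key in the attempt `metadata.json`, as described above; the function name and the 12-character truncated hash are arbitrary choices for illustration.

```python
import hashlib
import json
from pathlib import Path


def steps_fingerprint(attempt_metadata_path: Path) -> str:
    """Hash the stepsSnapshot of one attempt so runs can be grouped by test version.

    Keys are sorted before hashing, so two snapshots with the same content
    but a different key order produce the same fingerprint.
    """
    metadata = json.loads(attempt_metadata_path.read_text())
    snapshot = metadata.get("stepsSnapshot")
    canonical = json.dumps(snapshot, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]
```

Matching fingerprints between a failing run and the last passing run rule out a test change; the remaining question is whether the application changed.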
Element locators
Certain step types that interact with elements have a "target" property, or locator, that specifies which element the step should interact with.
Locator caches
Locators identify elements by sending the page state (HTML/XML) along with a screenshot to an LLM, which identifies the element on the page that the user is referring to. Momentic attempts to "cache" the LLM's answer so that future runs don't require AI calls. On future runs, the page state is checked against the cached element to determine whether the element is still usable or the page has changed enough that another AI call is required.
A locator cache can bust for a variety of reasons:
- the element description has changed, in which case we'll always bust the cache
- the cached element could not be located in the current page state
- the cached element was located in the page state, but fails certain checks specified on the cache entry, such as requiring a certain position, shape, or content.
You can find the `cacheBustReason` on the `trace` property in the results for a given step. The `cache` property is also listed on the results, showing the full cache saved for that element.
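Because a step result exposes `cacheBustReason` under its `trace` property, a short recursive search can list every cache bust in an attempt's results without hard-coding where step results sit in `metadata.json`. This is a sketch: the exact nesting of step results is not documented here, so the hypothetical `find_cache_busts` helper walks the whole JSON document and reports the JSON path of each match.

```python
from typing import Any, Iterator


def find_cache_busts(node: Any, path: str = "$") -> Iterator[tuple[str, str]]:
    """Yield (json_path, cacheBustReason) for every object whose `trace` has one.

    Walks dicts and lists recursively instead of assuming where step results live.
    """
    if isinstance(node, dict):
        trace = node.get("trace")
        if isinstance(trace, dict) and trace.get("cacheBustReason"):
            yield path, trace["cacheBustReason"]
        for key, value in node.items():
            yield from find_cache_busts(value, f"{path}.{key}")
    elif isinstance(node, list):
        for i, item in enumerate(node):
            yield from find_cache_busts(item, f"{path}[{i}]")
```

Call it on the parsed attempt metadata, e.g. `list(find_cache_busts(json.loads(path.read_text())))`. A bust on the failing step where earlier runs had none suggests that the page or the element description changed.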
Identifying bad caches
Sometimes the element that was cached is not the element that the user intended to target. This can cause failures or unexpected behavior in tests. In these cases, verify exactly why the wrong cache was saved in the first place: take the `runId` property of the `targetUpdateLoggerTags` on the incorrect cache and call `momentic_get_run` with that runId. This returns the run in which the cache target was updated.
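The lookup described above can be sketched as follows. It assumes, hypothetically, that the step result's `cache` property is a JSON object with `targetUpdateLoggerTags` as a direct child; adjust the path if the actual nesting differs. The returned ID is what you would pass to `momentic_get_run`.

```python
from typing import Any, Optional


def cache_origin_run_id(step_result: dict[str, Any]) -> Optional[str]:
    """Return the runId of the run that last updated this step's cached target.

    Assumes a step result shaped like
    {"cache": {"targetUpdateLoggerTags": {"runId": "..."}}};
    returns None when any part of that path is missing.
    """
    cache = step_result.get("cache") or {}
    tags = cache.get("targetUpdateLoggerTags") or {}
    run_id = tags.get("runId")
    return run_id if isinstance(run_id, str) else None
```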
Using past runs
You MUST look at past runs of the same test when understanding why a test failed. Looking at past runs helps you identify:
- When did this test start failing?
- What differed vs the last passing run?
- Did the same action behave differently on an earlier run?
Use step results and screenshots on past runs to answer these questions. Do NOT rely only on summaries from `momentic_get_run` or `momentic_list_runs` to understand what happened in a test run. You MUST look at the specific run details, including step results and screenshots, to determine the behavior of past runs.
When looking at past runs, use the following workflow:
- Call the `momentic_list_runs` tool to identify the runs you want more detail on.
- Call `momentic_get_run` for each of those runs to get the run details.
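To answer "when did this test start failing?" and pick the run to compare against, you can walk the run history from newest to oldest. The sketch below assumes run summaries shaped as dicts with hypothetical `id`, `status` (`"PASSED"`/`"FAILED"`) and `createdAt` keys; map them to whatever `momentic_list_runs` actually returns. It only chooses which two runs to open; the screenshots and step results from `momentic_get_run` are still what tell you what differed.

```python
from typing import Optional


def failure_window(runs: list[dict]) -> tuple[Optional[dict], Optional[dict]]:
    """Return (last_passing_run, first_failing_run) for the current failure streak.

    Walks runs from newest to oldest. Runs with any status other than
    "PASSED" or "FAILED" (e.g. cancelled) are skipped without ending the streak.
    If the newest run passed, first_failing_run is None; if no run passed,
    last_passing_run is None.
    """
    # ISO-8601 timestamps in the same format and timezone sort correctly as strings.
    ordered = sorted(runs, key=lambda r: r["createdAt"], reverse=True)
    first_failing = None
    for run in ordered:
        if run["status"] == "PASSED":
            return run, first_failing
        if run["status"] == "FAILED":
            first_failing = run  # keep walking back to the start of the streak
    return None, first_failing
```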
Multi-attempt runs
When `momentic_list_runs` shows a passing run with `attempts > 1`, treat it as a partial failure worth investigating, not a clean passing run. Pull the first attempt's step results and failure messages to understand what was going wrong before the retry succeeded.
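Spotting these runs in a list of summaries is a one-line filter. The `id`, `status` and `attempts` keys below are hypothetical stand-ins for the fields `momentic_list_runs` returns.

```python
def partial_failures(runs: list[dict]) -> list[str]:
    """Return IDs of runs that passed only after at least one retry.

    A missing "attempts" value is treated as a single attempt.
    """
    return [
        run["id"]
        for run in runs
        if run["status"] == "PASSED" and run.get("attempts", 1) > 1
    ]
```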
Flakiness and intermittent failures
- To consider a test flaky or intermittently failing, it must fail intermittently against the same application and test behavior.
- A single failure does NOT mean a test is flaky; it could have failed because of an application change. Determine whether there was an application or test change between runs by analyzing the screenshots and/or browser state in the results.
- IMPORTANT: Do not make assumptions about flakiness or intermittent failures without first verifying whether an application or test change caused the failure.
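The "same test behavior" half of this rule can be checked mechanically by grouping runs on a fingerprint of their `stepsSnapshot` and looking for groups that contain both passes and failures. The "same application" half cannot: it still requires comparing screenshots and DOM snapshots. The `status` and `stepsFingerprint` keys below are hypothetical.

```python
from collections import defaultdict


def flaky_candidates(runs: list[dict]) -> list[str]:
    """Return test-version fingerprints that have both passing and failing runs.

    Each run is a dict with hypothetical keys "status" and "stepsFingerprint"
    (a hash of the attempt's stepsSnapshot). A mixed group only shows that the
    test itself did not change between the pass and the fail; you must still
    confirm from screenshots or DOM snapshots that the application did not change.
    """
    statuses = defaultdict(set)
    for run in runs:
        statuses[run["stepsFingerprint"]].add(run["status"])
    return sorted(fp for fp, seen in statuses.items() if {"PASSED", "FAILED"} <= seen)
```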
Test temporality
测试时效性
- Past results may not match today's test file. The test may have changed since then, meaning the result was produced by a different version of the test.
- The `stepsSnapshot` property of the attempt `metadata.json` can help you determine whether the test has changed.
Identifying related vs unrelated issues
区分相关问题与无关问题
- Use the test name and description to determine what the test is intended to verify.
- Failures within that intent are related; failures outside it are unrelated.
- Failures in setup or teardown steps are almost always considered unrelated.
Bug vs change
缺陷与变更的区分
- Bug: something very clearly went wrong when it shouldn't have, such as an error message appearing. It's obvious just by looking at a single step or two that this is a bug.
- Change: any other behavior changes in the application
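Once a failure is attributed to the application (rather than the test, infrastructure, performance, or Momentic itself), the related/unrelated and bug/change judgments above combine into one of the four application category ids listed under Category ids. A trivial sketch of that mapping, with a hypothetical function name:

```python
def application_category(related_to_intent: bool, clear_bug: bool) -> str:
    """Combine the two judgments into an application category id.

    related_to_intent: the failure falls within what the test is meant to verify
        (setup/teardown failures are almost always False).
    clear_bug: something obviously went wrong (e.g. an error message appeared);
        otherwise the failure is treated as an application change.
    """
    scope = "RELATED" if related_to_intent else "UNRELATED"
    kind = "BUG" if clear_bug else "CHANGE"
    return f"{scope}_APPLICATION_{kind}"
```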
Formal classification output
- Assign exactly one category id: no new labels, no multiple labels.
- Ground your decision in data. Make sure you have fully investigated the run before assigning the category.
```text
Reasoning: <a few sentences tied to summary, past runs, and intent>
Category: <one id from the list>
```
Category ids
Use these strings verbatim:
- `NO_FAILURE`: Nothing failed; all attempts passed.
- `RELATED_APPLICATION_CHANGE`: Related to intent; expectation drift or change, not a clear defect.
- `RELATED_APPLICATION_BUG`: Related to intent; clearly incorrect behavior.
- `UNRELATED_APPLICATION_CHANGE`: Outside intent; not a clear bug.
- `UNRELATED_APPLICATION_BUG`: Outside intent but clearly broken.
- `TEST_CAN_BE_IMPROVED`: Test/automation issue (race condition, vague locator or assertion).
- `INFRA`: Rare or external issue (browser crash, resource pressure, rate limits, flaky environment).
- `PERFORMANCE`: Load/responsiveness issue (stuck spinner, assertion timeouts) that is not purely infrastructure.
- `MOMENTIC_ISSUE`: An issue with Momentic itself, the platform running the test (e.g. an AI hallucination, data issues, or incorrectly targeting the wrong element).
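Because the output must use exactly one of these ids verbatim, a small validator can catch typos, invented labels, and multi-label answers before a classification is recorded. This is a sketch of one possible check against the output format above, not part of the Momentic tooling; `parse_classification` is a hypothetical name.

```python
CATEGORY_IDS = {
    "NO_FAILURE",
    "RELATED_APPLICATION_CHANGE",
    "RELATED_APPLICATION_BUG",
    "UNRELATED_APPLICATION_CHANGE",
    "UNRELATED_APPLICATION_BUG",
    "TEST_CAN_BE_IMPROVED",
    "INFRA",
    "PERFORMANCE",
    "MOMENTIC_ISSUE",
}


def parse_classification(text: str) -> tuple[str, str]:
    """Parse a 'Reasoning: ... / Category: ...' block and validate the category.

    Reasoning may span several lines; the last non-empty line must be the
    Category line. Raises ValueError on a malformed block, empty reasoning,
    or any category string that is not exactly one known id.
    """
    lines = [line.strip() for line in text.strip().splitlines() if line.strip()]
    if len(lines) < 2 or not lines[0].startswith("Reasoning:") or not lines[-1].startswith("Category:"):
        raise ValueError("expected 'Reasoning:' first and 'Category:' last")
    reasoning = " ".join([lines[0][len("Reasoning:"):].strip(), *lines[1:-1]]).strip()
    category = lines[-1][len("Category:"):].strip()
    if not reasoning:
        raise ValueError("reasoning must not be empty")
    if category not in CATEGORY_IDS:
        raise ValueError(f"unknown or multiple category ids: {category!r}")
    return reasoning, category
```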