momentic-result-classification


Momentic result classification (MCP)


Momentic is an end-to-end testing framework where each test is composed of browser interaction steps. Each step combines Momentic-specific behavior (AI checks, natural-language locators, AI actions, etc.) with Playwright capabilities wrapped in our YAML step schema. When these tests are run, they produce results data that can be used to analyze the outcome of the test. The results data contains metadata about the run as well as any assets generated by the run (e.g. screenshots, logs, network requests, video recordings, etc.). Your job is to use these test results to classify failures that occurred in Momentic test runs.

Instructions


  1. Given a failing test run, analyze why the test run failed. Often you'll need to look beyond the current run to understand this, examining past runs of the same test or other context provided by the Momentic MCP tools.
  2. After analyzing why the run failed, bucket the failure into one of the categories below, explaining your reasoning for choosing that category.

Helpful MCP tools


momentic_get_run — Returns metadata about the run and the path to the full run results. Use the metadata to help you parse the run results (e.g. which attempt to look at, which step failed).
momentic_list_runs — Returns recent runs for a test so you can compare the results of past runs over time. Always pass a branch name so you're more likely to be looking at the same version of the test.

Background


Test run result structure


When Momentic tests are run via the CLI, the results are stored in a "run group". The data for this run group is stored in a single directory within the Momentic project. By default, the directory is called test-results, but this can be changed in Momentic project settings or for a single run of a run group. The run group results folder has the following structure:
test-results/
├── metadata.json         data about the run group, including git metadata and timing info.
└── runs/                 One zip for each test run in the run group.
    ├── <runId_1>.zip         a zipped run directory containing data about this specific test run. Follows the structure described below.
    └── <runId_2>.zip
When unzipped, run directories have the following structure:
<runId>/
├── metadata.json           run-level metadata.
└── attempts/<n>/           one folder per attempt (1-based n).
    ├── metadata.json       attempt outcome and step results.
    ├── console.json        optional browser console output.
    └── assets/
        ├── <snapshotId>.jpeg     before/after screenshot for each step (see attempt metadata.json for snapshot ID).
        ├── <snapshotId>.html     before/after DOM snapshot for each step (see attempt metadata.json for snapshot ID).
        ├── har-pages.log         HAR pages (ndjson).
        ├── har-entries.log       HAR network entries (ndjson).
        ├── resource-usage.ndjson CPU/memory samples taken during the attempt.
        ├── <videoName>           video recording (when video recording is enabled).
        └── browser-crash.zip     browser crash dump (only present on crash).
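The layout above can be walked programmatically. Below is a minimal Python sketch, assuming each <runId>.zip contains a top-level <runId>/ folder and that attempt folder names are the 1-based integers described here; verify both against a real test-results directory before relying on it.

```python
# Sketch: unpack a run group and read each attempt's metadata, following the
# directory layout described above. Assumes each <runId>.zip contains a
# top-level <runId>/ folder (an assumption; check your own results).
import json
import zipfile
from pathlib import Path

def extract_runs(results_dir, out_dir):
    """Unzip every <runId>.zip under runs/ and return the run directories."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    run_dirs = []
    for zpath in sorted(Path(results_dir, "runs").glob("*.zip")):
        with zipfile.ZipFile(zpath) as zf:
            zf.extractall(out)
        run_dirs.append(out / zpath.stem)  # zip name is <runId>.zip
    return run_dirs

def attempt_metadata(run_dir):
    """Load attempts/<n>/metadata.json for each attempt, in attempt order."""
    metas = sorted(Path(run_dir).glob("attempts/*/metadata.json"),
                   key=lambda p: int(p.parent.name))  # n is 1-based
    return [json.loads(p.read_text()) for p in metas]
```

Sorting attempts numerically (rather than lexically) keeps attempt 10 after attempt 9 on long retry sequences.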
When getting run results via the Momentic MCP, tools such as momentic_get_run will return links to the MCP working directory (default .momentic-mcp). This directory will contain unzipped run result folders, following the structure above, named run-result-<runId>.
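When network behavior is relevant to a failure, the har-entries.log asset can be scanned line by line. A hedged sketch, assuming each line is a JSON object with HAR-style request/response fields (confirm the exact shape against a real attempt's assets):

```python
# Sketch: scan har-entries.log (ndjson, one HAR entry per line) for failing
# network responses. The request/response field names follow the HAR format,
# which is an assumption here; verify against a real asset file.
import json
from pathlib import Path

def failed_requests(har_entries_path, threshold=400):
    """Return (url, status) pairs for responses with status >= threshold."""
    failures = []
    for line in Path(har_entries_path).read_text().splitlines():
        if not line.strip():
            continue  # skip blank lines between entries
        entry = json.loads(line)
        status = entry.get("response", {}).get("status", 0)
        if status >= threshold:
            failures.append((entry.get("request", {}).get("url", "?"), status))
    return failures
```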

Steps snapshot


The metadata.json file includes a stepsSnapshot property which shows the state of the test steps at the time of execution. Use this property if you suspect that the test has changed between runs, or to validate that the test has been set up properly.
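A simple way to act on this is to diff the snapshot between two runs. A minimal sketch, assuming stepsSnapshot is JSON-serializable; its exact shape may vary, so any structural difference is treated as a change:

```python
# Sketch: detect whether the test definition changed between two runs by
# comparing stepsSnapshot across their metadata.json payloads. Canonical JSON
# makes the comparison insensitive to dict key order.
import json

def steps_changed(meta_a, meta_b):
    """True when stepsSnapshot differs between two metadata payloads."""
    canon = lambda m: json.dumps(m.get("stepsSnapshot"), sort_keys=True)
    return canon(meta_a) != canon(meta_b)
```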

Element locators


Certain step types that interact with elements have a "target" property, or locator, that specifies which element the step should interact with.

Locator caches


Locators identify elements by sending the page state (HTML/XML) along with a screenshot to an LLM. The LLM identifies which element on the page the user is referring to. Momentic will attempt to "cache" the answer from the LLM so that future runs don't require AI calls. On future runs, the page state is checked against the cached element to determine whether the element is still usable, or whether the page has changed enough that another AI call is required.
A locator cache can bust for a variety of reasons:
  • the element description has changed, in which case we'll always bust the cache
  • the cached element could not be located in the current page state
  • the cached element was located in the page state, but fails certain checks specified on the cache entry, such as requiring a certain position, shape, or content.
You can find the cacheBustReason on the trace property in the results for a given step. The cache property is also listed on the results, showing the full cache saved for that element.
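Cache-bust reasons across an attempt can be collected with a short helper. This is a sketch only: the "stepResults", "trace", and "id" key names are assumptions for illustration, so check a real attempt metadata.json for the exact keys.

```python
# Sketch: collect cacheBustReason values across an attempt's step results.
# The "stepResults", "trace", and "id" field names are assumptions; confirm
# them against a real attempt's metadata.json.
def cache_bust_reasons(attempt_meta):
    """Map step id -> cacheBustReason for steps whose locator cache busted."""
    reasons = {}
    for step in attempt_meta.get("stepResults", []):
        reason = (step.get("trace") or {}).get("cacheBustReason")
        if reason:
            reasons[step.get("id", "?")] = reason
    return reasons
```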

Identifying bad caches


Sometimes the element that was cached is not the element that the user intended to target. This can cause failures or unexpected behaviors in tests. In these cases, it helps to verify exactly why the wrong cache was saved in the first place. Use the runId property of the targetUpdateLoggerTags on the incorrect cache to get the details of the original run, calling momentic_get_run with this runId. This will return the run where the cache target was updated.
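Recovering that originating runId from a suspect cache entry is a one-liner in practice. A sketch, with field names taken from the description above (confirm them against a real cache payload):

```python
# Sketch: given a suspect cache entry, recover the run that originally saved
# it, so the id can be passed to momentic_get_run. Field names follow the
# surrounding text; verify against a real cache payload.
def cache_origin_run_id(cache_entry):
    """Return the runId recorded when this cache target was last updated."""
    tags = cache_entry.get("targetUpdateLoggerTags") or {}
    return tags.get("runId")
```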

Using past runs


You MUST look at past runs of the same test to understand why it failed. Looking at past runs helps you identify:
  • When did this test start failing?
  • What differed vs the last passing run?
  • Did the same action behave differently on an earlier run?
Use step results and screenshots on past runs to answer these questions. Do NOT rely only on summaries from momentic_get_run or momentic_list_runs to understand what happened in a test run. You MUST look at the specific run details, including step results and screenshots, to determine the behavior of past runs.
When looking at past runs, use the following workflow:
  1. Call the momentic_list_runs tool to identify the runs you want more detail on.
  2. Call momentic_get_run for that specific run to get the run details.

Multi-attempt runs


When momentic_list_runs shows a passing run with attempts > 1, treat it as a partial failure worth investigating, not a clean passing run. Pull the first attempt's step results and failure messages to understand what was going wrong before the retry succeeded.
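Flagging such runs from a list of run summaries is straightforward. A sketch, assuming summary fields like "id", "status", and "attempts" of the kind momentic_list_runs might return; the exact names and status values are assumptions for illustration:

```python
# Sketch: flag passing runs that needed a retry, given a list of run
# summaries. The "id", "status", and "attempts" field names (and the
# "PASSED" status value) are assumptions for illustration.
def retried_passes(runs):
    """Ids of runs that ultimately passed but took more than one attempt."""
    return [r["id"] for r in runs
            if r.get("status") == "PASSED" and r.get("attempts", 1) > 1]
```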

Flakiness and intermittent failures


  • To consider a test flaky or intermittently failing, it must fail intermittently for the same application and test behavior.
    • Just because a test failed once does NOT mean that it's flaky - it could have failed because of an application change. You need to determine whether there was an application or test change between runs by analyzing the screenshots and/or browser state in the results.
    • IMPORTANT: You cannot make assumptions about flakiness or intermittent failures without verifying whether an application or test change caused the failure.

Test temporality


  • Any past results may not necessarily match today’s test file. The test may have changed, meaning the result was on a different version of the test.
  • Looking at the stepsSnapshot property of the attempt metadata.json can help you determine whether the test has changed.

Identifying related vs unrelated issues


  • Use the test name and description to determine what the test is intending to verify.
  • Failures outside that intent are unrelated; otherwise, consider them related.
  • Any failures in setup or teardown steps are almost always considered unrelated.

Bug vs change


  • Bug: something very clearly went wrong when it shouldn't have, such as an error message appearing. It's obvious just by looking at a single step or two that this is a bug.
  • Change: any other behavior change in the application.

Formal classification output


  • Exactly one category id — no new labels, no multi-label.
  • Ground your decision in data. Be sure that you've fully investigated the run before assigning the category.
Output format:
Reasoning: <a few sentences tied to summary, past runs, and intent>
Category: <one id from the list>

Category ids


Use these strings verbatim:
  • NO_FAILURE
    — Nothing failed; all attempts passed.
  • RELATED_APPLICATION_CHANGE
    — Related to intent; expectation drift / change, not a clear defect.
  • RELATED_APPLICATION_BUG
    — Related to intent; clearly incorrect behavior.
  • UNRELATED_APPLICATION_CHANGE
    — Outside intent; not a clear bug.
  • UNRELATED_APPLICATION_BUG
    — Outside intent but clearly broken.
  • TEST_CAN_BE_IMPROVED
    — Test/automation issue (race condition, vague locator or assertion).
  • INFRA
    — Rare or external (browser crash, resource pressure, rate limits, flaky environment).
  • PERFORMANCE
    — Load/responsiveness (stuck spinner, assertion timeouts) when not pure infra.
  • MOMENTIC_ISSUE
    — There was an issue with Momentic itself, the platform running the test (e.g. an AI hallucination, data issues, incorrectly resolving to the wrong element).
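Because the category ids must be used verbatim and exactly one must be emitted, it can help to validate the classification before outputting it. A minimal sketch of that check:

```python
# Sketch: validate a classification before emitting it. Exactly one category,
# taken verbatim from the allowed set above.
CATEGORY_IDS = {
    "NO_FAILURE",
    "RELATED_APPLICATION_CHANGE",
    "RELATED_APPLICATION_BUG",
    "UNRELATED_APPLICATION_CHANGE",
    "UNRELATED_APPLICATION_BUG",
    "TEST_CAN_BE_IMPROVED",
    "INFRA",
    "PERFORMANCE",
    "MOMENTIC_ISSUE",
}

def format_classification(reasoning, category):
    """Render the required output block, rejecting unknown category ids."""
    if category not in CATEGORY_IDS:
        raise ValueError(f"unknown category id: {category}")
    return f"Reasoning: {reasoning}\nCategory: {category}"
```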