flows-design-review

Flows Design Review

This is step 3 of the Flows app certification flow:

flows-app-brief  →  build  →  flows-code-review  →  flows-design-review (this skill)  →  flows-external-app-submit

This is the manual design quality assessment described in docs.cognite.com/cdf/flows/guides/quality-guidelines. Target overall average: 3.8 or higher to be launch-ready.

Operating rules

Automate first, ask second. For every question Q1–Q10, run the probes listed below to gather hard evidence from the repo and propose a draft score (1–5) with rationale before asking the user. The user's job is to confirm or override the proposed score, not to grade from scratch. This dramatically reduces the manual burden.
The task walkthrough (Step 2) is the one part that cannot be skipped — automation cannot tell whether a user "gets lost" navigating a screen. Capture it manually and use it to override the auto-derived scores where lived experience disagrees.
Use `AskQuestion` for every score so answers are structured. For each question present three options: (a) accept the draft score, (b) override with a specific score, (c) override + add a note.
Pre-fill user, tasks, and persona context from `App-Brief.md` frontmatter when present.

Step 0 — Pre-scan before prompting

Always pre-scan before asking the user anything. Read these sources silently and surface what you found as evidence — never as scores, never auto-saved:

| Source | Use it for |
| --- | --- |
| `App-Brief.md` frontmatter | Pre-fill primary user (`userRole`), tasks (`oneSentenceStory`), success criteria |
| `package.json` | Confirm `@cognite/aura` is installed and surface its version (informs Q1) |
| Latest `reviews/code-review/feedback-round-<N>/code-review-report.md` | Pull design-adjacent findings (accessibility, error handling, UX copy) and present them as evidence under Q4/Q10 |
| `src/**/*.{ts,tsx,css}` | Q1 probe: grep for hard-coded hex/rgb colors and raw `px` / `rem` values outside Aura tokens |
| `src/**/*.{ts,tsx}` | Q5 probe: `onClick` on non-button elements without `role` / `tabIndex` |
| `src/**/*.{ts,tsx}` | Q10 probe: icon buttons missing `aria-label`, `<img>` without `alt`, missing focus styles |

Show the user the pre-scan results in your opening message before any scoring. They are starting points, not verdicts. The manual task walkthrough (Step 2) and user-assigned scores remain authoritative.

Step 0b — Choose feedback round

Look at `reviews/design-review/`. If it doesn't exist, this is round 1. Otherwise increment to the next missing `feedback-round-<N>/` directory.
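
The round selection can be sketched as a small shell function. This is a minimal sketch, assuming rounds are numbered from 1 and that "next missing" means the lowest `feedback-round-<N>/` that does not exist yet; the function name `next_round` is hypothetical.

```shell
# Print the next feedback round number for a review directory.
# Assumption: rounds start at 1 and the first missing
# feedback-round-<N>/ directory is the one to create.
next_round() (
  base="${1:-reviews/design-review}"
  n=1
  # A missing base directory falls through immediately and yields round 1.
  while [ -d "$base/feedback-round-$n" ]; do
    n=$((n + 1))
  done
  echo "$n"
)

# Usage (commented out so that sourcing this file has no side effects):
# round="$(next_round reviews/design-review)"
# mkdir -p "reviews/design-review/feedback-round-$round"
```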

Step 1 — Confirm user and tasks

Per the docs, "the quality assessment is only as useful as the clarity of the user and tasks it's based on."

If `App-Brief.md` exists, parse `userRole`, `oneSentenceStory`, and `successCriteria` from its frontmatter and propose them as the primary user and tasks. Ask the user to confirm or extend.

Capture, via `AskQuestion`:

Primary user — specific role and context (e.g. "Maintenance engineers on offshore platforms").
2–3 critical tasks — the workflows this user needs to complete (e.g. "Check pump vibration alerts", "Schedule maintenance work").
Context — experience level, time constraints, device, success criteria.
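
Reading a frontmatter field could look like the sketch below. It assumes YAML-style frontmatter delimited by `---` lines with simple one-line `key: value` scalars, which the source does not specify; the helper name `frontmatter_field` is hypothetical. Nested or multi-line values would need a real YAML parser.

```shell
# Print the value of a top-level scalar key from YAML-style frontmatter.
# Assumptions: the file starts with a '---' line, the frontmatter ends at
# the next '---' line, and the value sits on the same line as the key.
frontmatter_field() (
  key="$1"
  file="$2"
  awk -v key="$key" '
    NR == 1 && $0 == "---" { inside = 1; next }   # opening delimiter
    inside && $0 == "---"  { exit }               # closing delimiter
    inside && index($0, key ":") == 1 {
      value = substr($0, length(key) + 2)
      sub(/^[ \t]+/, "", value)                   # trim leading whitespace
      gsub(/^"|"$/, "", value)                    # drop surrounding double quotes
      print value
      exit
    }
  ' "$file"
)

# Usage (commented out): frontmatter_field userRole App-Brief.md
```

An empty result means the key is absent, in which case the value has to come from the user.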

Step 2 — Walk each task end-to-end (manual)

Instruct the user to:

Open the app as that user in a clean browser session with representative test data.
Complete each task from beginning to end without shortcuts.
Note pain points: where they get stuck, confused, or make errors.

For each task, prompt the user to paste back: what happened, where they got stuck, and any screenshots / notes. Capture these as `taskWalkthroughs[]` for the report.

Do NOT proceed to scoring until the user confirms they walked every task. If they refuse, write a stub report that records "task walkthrough skipped" and exit without scoring.

Step 3 — Score the 10 questions (probe → propose → confirm)

For every question Q1–Q10, follow the same loop:

Run the listed probes. They are concrete shell / grep / lint / build commands that produce hard evidence from the repo.
Propose a draft score (1–5) based on the probe results and the rubric. Show your work: which probe results led to which score.
Cross-check against the user's task-walkthrough notes from Step 2 (especially for navigation, clickability, error prevention).
Ask the user via `AskQuestion` with three options: (a) accept the proposed score `N`, (b) override with a specific score, (c) override + add a note.
Capture the final score, a one-line rationale, and an improvement note.

Heuristics for translating probe results into a draft score

These thresholds are starting points — adjust based on the specific evidence and the rubric language. The user always has the final say.

| Signal | Drift toward |
| --- | --- |
| 0 anti-pattern matches, lint clean for the relevant rule | 5 |
| 1–3 small matches, mostly in one file | 4 |
| 4–15 matches across several files, or 1 systemic issue | 3 |
| More than 15 matches, or pervasive anti-pattern | 2 |
| Anti-pattern is the default style | 1 |
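
As a rough illustration, the count-based part of these heuristics can be written as a function. The cut-offs used here (0 → 5, 1–3 → 4, 4–15 → 3, more than 15 → 2) are one reading of the table, and the name `draft_score` is hypothetical. A score of 1 is never returned, because "anti-pattern is the default style" is a judgment that a raw count cannot establish.

```shell
# Map a total anti-pattern match count to a draft score (2-5).
# Assumption: the band edges below are one interpretation of the heuristics;
# the result is a starting point for the user, not a verdict.
draft_score() (
  matches="$1"
  if   [ "$matches" -eq 0 ];  then echo 5
  elif [ "$matches" -le 3 ];  then echo 4
  elif [ "$matches" -le 15 ]; then echo 3
  else                             echo 2
  fi
)

# Usage (commented out): draft_score 7
```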

Per-question automated probes

Each question's probe list is the first thing the agent should run before asking the user anything about that question. Always state which probes were run and what they returned.
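
Many of the probes use `grep -rc`, which prints one `file:count` line per file (counting matching lines, not individual matches) rather than a single total. A small helper can reduce that output to one number when reporting what a probe returned. This is a sketch; the name `total_matches` is hypothetical and the `*.tsx` filter mirrors the probes in this document.

```shell
# Sum the per-file counts printed by `grep -rc` into one total.
# The count is the last colon-separated field, so file names that
# contain colons are still handled.
total_matches() (
  pattern="$1"
  dir="${2:-src}"
  grep -rcE "$pattern" --include='*.tsx' "$dir" 2>/dev/null |
    awk -F: '{ total += $NF } END { print total + 0 }'
)

# Usage (commented out): total_matches '<div[^>]*onClick' src
```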

The 10 questions and rubric

Q1 — Aura design system consistency. Are you using Aura tokens, layouts, components and patterns correctly?

Probes (automatable):

`grep -c '@cognite/aura' package.json`: confirm Aura is a dependency
`grep -rlE "from '@cognite/aura'" --include='*.ts' --include='*.tsx' src | wc -l`: count files importing Aura
`grep -rlE '#[0-9a-fA-F]{3,8}' --include='*.css' --include='*.tsx' --include='*.ts' src`: files with hard-coded hex colors
`grep -rlE '\b(rgb|rgba|hsl|hsla)\(' --include='*.tsx' --include='*.css' src`: files with raw rgb/hsl values
`npx eslint . --ext .ts,.tsx --rule '{"aura/no-overriding-styles":"error"}' --no-eslintrc --quiet 2>&1 | tail -5`, or read the existing lint output for `aura/no-overriding-styles` warning counts

Translate to draft score: 0 hard-coded colors + 0 `aura/no-overriding-styles` warnings → 5. Few warnings (1–5) → 4. Many warnings (>15) or no Aura imports → 2–3.

5 Excellent: All Aura tokens applied correctly, no hard-coded values. Proper responsive sizing and page layouts. Aura components used without style overrides. Best practices followed.
4 Good: Mostly Aura tokens and components with 1–2 minor exceptions. Layout spacing mostly consistent. Minimal style overrides.
3 Average: Mix of Aura and custom elements. Some proper spacing, some random values. Overriding styles in multiple places.
2 Below average: Frequently custom colors, typography, or spacing instead of Aura tokens. Heavy customization that breaks patterns.
1 Poor: Not using Aura at all. Custom colors, fonts, spacing throughout.

Q2 — Navigation, layout and hierarchy. Can users tell where they are and navigate easily?

Probes (partially automatable — relies on Step 2 walkthrough):

`grep -rcE '<Route\b' --include='*.tsx' src`: count routes (informs navigation surface)
`grep -rlE 'Breadcrumb' --include='*.tsx' src`: files using breadcrumb components (location cues)
`grep -rlE 'NavLink|Link to=|useLocation' --include='*.tsx' src`: navigation primitives in use
`grep -rlE '<Topbar|<Sidebar|<Header' --include='*.tsx' src`: top-level chrome
Look at the route tree (`src/routes/`) and ask: does each non-trivial page show its own title and a way back?

Translate to draft score: Default to the walkthrough finding since navigation feel is hard to measure statically. Use probes to flag risks (e.g. routes without breadcrumbs).

5: Current location always clear. Easy navigation forward/back. Consistent menus. Strong visual hierarchy. Content flows logically (F/Z pattern).
4: Usually clear. Navigation mostly consistent. Minor exceptions.
3: Sometimes unclear. Navigation works but not always intuitive. Hierarchy exists but not always clear.
2: Often lost or confused. Navigation changes between pages. Weak hierarchy.
1: No indication of current location. No clear navigation. Inconsistent structure.

Q3 — Clear labels and language. Are buttons, inputs, and actions labeled clearly?

Probes (automatable):

`grep -rcE ">(Submit|OK|Click here|Go|Yes|No)<" --include='*.tsx' src`: count vague button labels
`grep -rcE '<Button[^>]*>[[:space:]]*</Button>' --include='*.tsx' src`: empty buttons (an icon-only button without a label needs `aria-label`, handled in Q10)
`grep -rlE '<Label\b' --include='*.tsx' src` and `grep -rlE '<input\b' --include='*.tsx' src`: input elements vs labels; a mismatch suggests unlabeled inputs
`grep -rcE 'placeholder=' --include='*.tsx' src`: placeholder-as-label is an anti-pattern; a high count without matching `<Label>` is a smell

Translate to draft score: 0 vague labels + every input has a matching label → 5. Few placeholder-only inputs → 4. Vague labels in several places → 3.

5: Every element has a clear, specific label. Plain, action-oriented language ("Save changes", "Delete item").
4: Most labels clear. Minor ambiguity.
3: Labels present but sometimes vague ("Submit", "OK"). Some unnecessary jargon.
2: Many labels unclear. Heavy technical terms without explanation.
1: Labels missing, confusing, or jargon-laden.

Q4 — System feedback and validation. Do users know what's happening? Are forms easy to use?

Probes (automatable):

`grep -rlE 'isLoading|isPending|<Skeleton|<Loader|<Spinner' --include='*.tsx' src`: files with loading affordances
`grep -rlE 'isError|onError|<Alert|toast\.' --include='*.tsx' src`: files with error/success affordances
`grep -rlE 'useMutation' --include='*.tsx' src`: mutation sites; cross-check that each has `onSuccess` / `onError` handlers
`grep -rlE 'ErrorBoundary' --include='*.tsx' src`: error boundaries (also cross-checked in code review)
For each route/feature folder, the ratio of (loading + error files) ÷ (data-fetching files) should be ≈ 1

Translate to draft score: Loading and error states present on every fetch/mutation → 5. A few mutations without explicit error handling → 4. Mixed coverage → 3.

5: Immediate feedback. Clear loading states. Helpful success/error messages. All fields labeled, required fields marked, real-time validation with specific messages.
4: Most actions provide feedback. Loading states present. Validation mostly helpful.
3: Some feedback but inconsistent. Loading states sometimes missing. Generic error messages.
2: Minimal feedback. Users often don't know if actions worked. Validation only on submit.
1: No feedback. Silent failures. Technical error codes.

Q5 — Clickability and interactions. Is it obvious what's clickable?

Probes (automatable):

`grep -rcE '<div[^>]*onClick' --include='*.tsx' src`: `onClick` on `<div>` (non-semantic, often missing keyboard support)
`grep -rcE '<span[^>]*onClick' --include='*.tsx' src`: same for `<span>`
`grep -rcE 'role="button"' --include='*.tsx' src`: explicit role assignments (good if `<div onClick>` is unavoidable)
`grep -rcE 'hover:|focus:' --include='*.tsx' src`: Tailwind hover/focus utility usage (high = good)
`grep -rcE 'cursor-pointer' --include='*.tsx' src`: explicit pointer cursor

Translate to draft score: 0 `<div onClick>` without role + many hover/focus utilities → 5. 1–3 violations → 4. Many `onClick` on non-button elements → 2–3.
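
The "without role" condition is not checked by the count probes on their own. One way to approximate it is to list `onClick` handlers on `<div>` / `<span>` and drop the lines that also carry `role=`. This is only a sketch: it sees one line at a time, so JSX tags whose attributes wrap across lines will be missed or misreported, and the name `unroled_click_elements` is hypothetical. Treat the output as candidates to inspect, not as confirmed violations.

```shell
# List file:line:text for <div>/<span> elements that have onClick but no
# role= attribute on the same line. Line-based, so multi-line JSX tags
# are not handled.
unroled_click_elements() (
  dir="${1:-src}"
  grep -rnE '<(div|span)[^>]*onClick' --include='*.tsx' "$dir" 2>/dev/null |
    grep -v 'role='
)

# Usage (commented out): unroled_click_elements src | wc -l
```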

5: All clickable items look clickable. Hover effects on interactive elements. Cursor changes appropriately.
4: Most interactive elements obvious. Hover effects mostly present.
3: Inconsistent hover states. Occasionally unclear what's interactive.
2: Many interactive elements don't look clickable. Few hover effects.
1: Can't tell what's clickable. No visual feedback.

Q6 — Error prevention and recovery. Can users undo or cancel destructive actions?

Probes (partially automatable):

`grep -rilE 'delete|remove|archive|reset' --include='*.tsx' src | head -20`: files with potentially destructive actions
`grep -rlE 'AlertDialog|ConfirmDialog|window\.confirm' --include='*.tsx' src`: confirm-dialog usage
`grep -rcE 'variant="destructive"|destructive' --include='*.tsx' src`: destructive button styling
For each file with destructive verbs, check there is a corresponding `AlertDialog` / `ConfirmDialog` invocation in the same file or its imports

N/A guidance: Read-only viewer apps (the common case for Flows demos) have no destructive actions and should score 5 by default with a "no destructive actions" rationale. Do not penalize an app for not having confirmations it does not need.

5: Confirmation dialogs before destructive actions. Auto-save prevents data loss. Clear undo or cancel options. OR the app has no destructive actions.
4: Most destructive actions have warnings. Some auto-save or undo.
3: Some warnings for major actions. Limited undo/cancel.
2: Few warnings. No undo. Easy to lose work.
1: No warnings. No undo. Frequent accidental data loss.

Q7 — Responsive design and multi-device support. Does it work on different screen sizes?

Probes (automatable):

`grep -rcE '\b(sm|md|lg|xl|2xl):' --include='*.tsx' src`: Tailwind responsive utility usage (high = good)
`grep -E '<meta name="viewport"' index.html`: viewport meta tag present
`grep -rcE 'overflow-x-auto|overflow-x-scroll' --include='*.tsx' src`: horizontal scroll containers (often a smell)
`grep -rcE '\bw-\[[0-9]+px\]|\bh-\[[0-9]+px\]' --include='*.tsx' src`: fixed-px sizing (usually breaks small screens)
Read `App-Brief.md` `userRole`: if it says "desktop or laptop in control room" the app may be intentionally desktop-only; this is acceptable per the rubric ("Hidden or limited on mobile if not intended for mobile")

Translate to draft score: If app is desktop-only by design (per App-Brief) and renders cleanly on laptop down to 13" → 5. Mixed responsive utility usage → 4. Many fixed-px sizes → 3.

5: Seamless across desktop, tablet, mobile. Touch targets 40px+. Text readable. No horizontal scrolling. Hover states accounted for on touch. OR intentionally desktop-only per the brief and clean on supported sizes.
4: Works well on most devices. Minor issues.
3: Functional on multiple devices but not optimized. Some layout issues on smaller screens.
2: Poor mobile/tablet experience. Layouts break.
1: Desktop only. Broken on mobile/tablet.

Q8 — Empty states and first-time experience. When there's no data, is it clear what to do next?

Probes (automatable):

`grep -rilE 'empty|no\s+(data|results|items|files|matches)' --include='*.tsx' src`: files with empty-state copy
`grep -rlE '<EmptyState|EmptyPlaceholder' --include='*.tsx' src`: explicit empty-state components
For each panel/list module (anything with `.list(` or `.items.map(`), check there is at least one branch handling `items.length === 0` with user-visible copy. List the panels that DO and DO NOT.
`grep -rcE 'items\.length === 0|items\.length > 0' --include='*.tsx' src`: explicit empty checks

Translate to draft score: Every data-fetching panel has an empty-state branch with copy → 5. One or two missing → 4. Many panels missing → 2–3.

5: All empty states show helpful messages and clear next steps. First-time users know exactly what to do.
4: Most empty states helpful. Minor gaps.
3: Some empty states explained. First-time users can figure it out.
2: Many blank pages with no guidance.
1: Blank pages everywhere. No guidance.

Q9 — Performance and efficiency. Does the app load quickly?

Probes (automatable):

First, check whether a recent build already exists, which avoids a slow rebuild when `dist/` is fresh:
`find dist -maxdepth 1 -newer package.json -name '*.js' 2>/dev/null | wc -l`
`du -sh dist/ 2>/dev/null`
If the count is 0 (no recent build), fall back to:
`npm run build 2>&1 | tail -20`
Then gather the remaining metrics:
`grep -rcE 'React\.lazy|lazy\(' --include='*.tsx' src`: code-split routes (good)
`grep -rcE 'useMemo|useCallback' --include='*.tsx' src`: memoization usage (informs render efficiency)
`grep -rlE 'useVirtual|react-window|react-virtual' --include='*.tsx' src`: list virtualization (good for big lists)
`grep -rlE '\.list\([^)]*\)' --include='*.ts' --include='*.tsx' src | xargs -I{} grep -l 'limit:' {} 2>/dev/null | wc -l` vs total list call sites: pagination coverage
Cross-reference the latest `code-review-report.md` criterion 2.3 (Limits & pages) score

Translate to draft score: Build under 1 MB gzipped + every list has a limit + react-query in use → 5. Bundle 1–2 MB or some lists missing limits → 4. Bundle > 2 MB or systemic unbounded fetches → 2–3.
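
The thresholds above are stated in gzipped size, while `du -sh dist/` reports the uncompressed size on disk. A rough gzipped figure can be estimated as sketched below. This assumes the build output lives under `dist/` and counts only `*.js` files, each compressed separately, so it approximates what a server would send rather than measuring it exactly. The name `gzipped_js_bytes` is hypothetical.

```shell
# Print the total gzipped size, in bytes, of all .js files under a directory.
# gzip -c writes one compressed stream per file to stdout; wc -c totals them.
gzipped_js_bytes() (
  dir="${1:-dist}"
  find "$dir" -type f -name '*.js' -exec gzip -c {} + 2>/dev/null |
    wc -c | tr -d ' '
)

# Usage (commented out): 1 MB is 1048576 bytes.
# bytes="$(gzipped_js_bytes dist)"
# echo "gzipped JS: $bytes bytes"
```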

5: Fast loading with progressive content. Bulk actions, keyboard shortcuts. Common tasks take minimal clicks.
4: Reasonable loading. Most tasks streamlined.
3: Acceptable performance. Tasks moderate effort. Few shortcuts.
2: Slow loading. Tasks require many steps.
1: Very slow or unresponsive.

Q10 — Accessibility (WCAG AA 2.1). Can people use it with assistive tech?

Probes (automatable):

Count `<img>` tags and `<img>` tags with `alt` attributes separately to identify missing alt text:
```
grep -rcE '<img\b' --include='*.tsx' src
grep -rcE '<img[^>]*\balt=' --include='*.tsx' src
```
Any difference means images are missing `alt`.
`grep -rcE '<button[^>]*>[[:space:]]*<(svg|Icon)' --include='*.tsx' src`: icon-only buttons (need `aria-label`)
`grep -rcE 'aria-label=' --include='*.tsx' src`: ARIA label usage
`grep -rcE 'focus-visible:|focus:' --include='*.tsx' src`: focus styles
`grep -rcE 'tabIndex=\{-1\}|tabIndex="?-1' --include='*.tsx' src`: elements removed from tab order (sometimes intentional, sometimes a bug)
If `eslint-plugin-jsx-a11y` is installed: `npx eslint . --ext .ts,.tsx --no-eslintrc --rule '{"jsx-a11y/alt-text":"error","jsx-a11y/anchor-is-valid":"error","jsx-a11y/click-events-have-key-events":"error"}' 2>&1 | tail -10`
If `axe-core` is available: suggest the user run an axe scan in the running app and paste results. Automation can flag candidates, not enforce contrast.

Translate to draft score: 0 missing alts + 0 icon-only buttons without aria-label + focus styles everywhere → 5. A few violations → 4. Systemic gaps → 2–3.

5: All interactions via keyboard. Text contrast meets WCAG AA. Clear focus indicators. Proper ARIA labels. Alt text on images. Touch targets 40px+ / mouse targets 20px+. Form errors announced to screen readers.
4: Most requirements met. Minor exceptions.
3: Basic keyboard support but missing for some features. Mostly acceptable contrast. Focus indicators present but not always clear.
2: Limited keyboard support. Multiple contrast failures. Weak focus indicators.
1: No keyboard navigation. Poor contrast. No focus indicators. Not usable with assistive tech.

Step 4 — Compute average and quality level

Average = sum of all 10 scores ÷ 10.

Map to the quality level table from the docs:

| Average | Quality level | Recommendation |
| --- | --- | --- |
| 4.5 – 5.0 | Excellent (ready to launch) | Minor improvements over time |
| 3.8 – 4.4 | Good (launch with minor fixes) | Address lower-scoring areas |
| 3.0 – 3.7 | Average (needs improvement) | Fix major problems before launching |
| Below 3.0 | Needs significant work | Substantial improvements required |

`flows-external-app-submit` gates on average ≥ 3.8.
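
The averaging and band lookup can be sketched as follows. Because the ten scores are integers, the average always has exactly one decimal digit, so the sketch works on the sum (38 corresponds to 3.8) and avoids floating-point comparisons at the band edges. The function name `quality_summary` is hypothetical; the labels are the ones used in the report's Summary section.

```shell
# Print the average and quality level for exactly ten integer scores (1-5).
quality_summary() (
  [ "$#" -eq 10 ] || { echo "expected 10 scores, got $#" >&2; exit 1; }
  sum=0
  for s in "$@"; do
    sum=$((sum + s))
  done
  # sum / 10 with one decimal digit, using integer arithmetic only.
  avg="$((sum / 10)).$((sum % 10))"
  if   [ "$sum" -ge 45 ]; then level="Excellent"
  elif [ "$sum" -ge 38 ]; then level="Good"
  elif [ "$sum" -ge 30 ]; then level="Average"
  else                         level="Needs significant work"
  fi
  echo "Average score: $avg"
  echo "Quality level: $level"
)

# Usage (commented out): quality_summary 5 4 4 3 5 5 4 4 3 4
```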

Step 5 — Write the report

Create `reviews/design-review/feedback-round-<N>/design-review-report.md` with this structure:

Design Review — <appName> — round <N>

User and tasks

Primary user: ...
Tasks evaluated:
1. ...
2. ...
3. ...
Context: ...

Task walkthrough findings

Task 1 — ... ...
Task 2 — ... ...
Task 3 — ... ...

Scores

| Question | Score | Rationale | Improvement note |
| --- | --- | --- | --- |
| Q1 Aura consistency | n | ... | ... |
| Q2 Navigation & hierarchy | n | ... | ... |
| Q3 Labels & language | n | ... | ... |
| Q4 Feedback & validation | n | ... | ... |
| Q5 Clickability | n | ... | ... |
| Q6 Error prevention | n | ... | ... |
| Q7 Responsive | n | ... | ... |
| Q8 Empty states | n | ... | ... |
| Q9 Performance | n | ... | ... |
| Q10 Accessibility | n | ... | ... |

Summary

Average score: <X.X>
Quality level: <Excellent | Good | Average | Needs significant work>

Must Fix (any score < 3)

Should Fix (any score 3 – 3.7)

Nice to Fix (any score 3.8 – 4.4)

The `Average score:` line must be machine-readable in exactly that format — `flows-external-app-submit` parses it.
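
For a quick self-check that the line is in the expected shape, the value can be read back out of the written report. How `flows-external-app-submit` actually parses the line is not described here, so the pattern below is an assumption: it accepts only `Average score: X.X` on a line of its own, with a single decimal digit. The name `report_average` is hypothetical.

```shell
# Print the average from a report, or nothing if the line is malformed.
# Assumption: the accepted form is exactly "Average score: X.X".
report_average() (
  sed -n 's/^Average score: \([0-9][0-9]*\.[0-9]\)$/\1/p' "$1" | head -n 1
)

# Usage (commented out):
# report_average reviews/design-review/feedback-round-1/design-review-report.md
```

An empty result after writing the report suggests the line was reformatted (extra text, bold markers, missing decimal) and should be corrected before moving on.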


Step 6 — Print the gate status

After writing, print to the terminal:

The average score
The quality level
Whether the result meets the `flows-external-app-submit` gate (≥ 3.8)
If below 3.8, instruct the user to fix Must Fix and Should Fix items and re-run this skill in a new feedback round.
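
The gate message can be sketched as below. The comparison is done in tenths (3.8 becomes 38) so the shell never has to compare decimals. The function name `gate_status` and the exact wording of the messages are illustrative, not part of the skill's contract.

```shell
# Print the gate status for an average such as "4.1"; returns non-zero
# when the average is below the 3.8 gate.
gate_status() (
  avg="$1"
  # Convert "X.Y" to tenths; a missing decimal part is treated as .0
  tenths="$(printf '%s\n' "$avg" |
    awk -F. '{ print $1 * 10 + substr($2 "0", 1, 1) }')"
  if [ "$tenths" -ge 38 ]; then
    echo "PASS: average $avg meets the flows-external-app-submit gate (>= 3.8)"
  else
    echo "FAIL: average $avg is below 3.8; fix Must Fix and Should Fix items, then re-run in a new feedback round"
    exit 1
  fi
)

# Usage (commented out): gate_status 4.1
```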
