# stardust:extract
Crawl an existing website, parse each page, extract the brand surface,
and produce a stardust-formatted snapshot of the current state under
`stardust/current/`. The output describes what the site is; later
sub-commands consume it to decide what it should be.

This skill is descriptive: it does not invent direction, it does not
critique, and it does not modify the live site. It writes only under
`stardust/current/` and updates `stardust/state.json`.

## Inputs
- `<url>` — required. The origin to crawl. Examples:
  `https://example.com`, `https://example.com/shop`. A path narrows the
  same-origin crawl to that subtree.
- `--cap <N>` — optional. Override the default 5-page cap. The cap is
  intentionally small — a 5-page sample (home + four IA
  pillars/templates) is enough for cross-page brand aggregation,
  system-component detection, and the brand-review HTML to do useful
  work. Lift the cap with `--cap 25` (the previous default) or higher
  when a deeper crawl is genuinely needed.
- `--all` — optional. Lift the cap entirely; extract every discovered
  page after junk filtering. Equivalent to `--cap 0`. Use when the
  user spontaneously asks for a full crawl.
- `--pages <slug,slug,...>` — optional. Restrict the crawl to specific
  paths (slugs derived per `reference/ia-extraction.md`). Bypasses the
  cap.
- `--refresh <slug>` — optional. Re-extract one page that already
  exists in `state.json`.
- `--single` — optional. Equivalent to `--cap 1`. Useful for testing.
- `--wait <fast|medium|spec|auto>` — optional. Wait strategy per page.
  Default `medium`. See `reference/playwright-recipe.md` § Wait modes.
- `--no-junk-filter` — optional. Disable the default junk-page filter
  in discovery (see `reference/ia-extraction.md` § Filtering).
- `--no-consent-dismiss` — optional. Skip the pre-flight consent /
  cookie banner dismissal (see `reference/playwright-recipe.md`
  § Pre-flight: consent dismissal). Use when the redesign scope
  explicitly includes the consent surface, or when the dismissal's
  side-effects (script activation that would not otherwise run) need
  to be avoided. Default behaviour is to dismiss; the contract keeps
  screenshots, voice aggregation, and per-section style from being
  polluted by the banner.
- `--prep` — optional. Run in migrate-prep mode: lift the cap, type
  each page, detect module candidates, capture typed content slots,
  emit the prep summary. See § Prep mode below. Typically invoked via
  the `prepare-migration` orchestrator skill rather than directly.
## Setup
Run the master skill's setup procedure first
(`skills/stardust/SKILL.md` § Setup): impeccable dep check, context
loader, state read.

Additional checks for this sub-command:

- Playwright availability. The extraction step needs a real browser.
  Detect Playwright in this order: a Playwright MCP server, then
  `npx playwright`. If neither is available, stop and tell the user
  how to install Playwright.
- Origin collision. If `stardust/state.json` already records
  `site.originUrl` and the new `<url>` is a different origin, stop and
  ask before clobbering. Stardust does not silently mix two sites in
  one project.
- Browser context. Open a fresh `BrowserContext` for the run. Run the
  consent dismissal pre-flight per `reference/playwright-recipe.md`
  § Pre-flight: consent dismissal unless `--no-consent-dismiss` is
  set. Cookies persist across the per-page loop within the same
  context, so one dismissal covers the whole crawl. Record the
  resolved method in `_crawl-log.json#consent.method`.
- Bot-management probe. When the first navigation in the run returns
  `ERR_HTTP2_PROTOCOL_ERROR` or `ERR_QUIC_PROTOCOL_ERROR`, or hangs
  through the entire hard-cap on what should be a fast origin, do not
  retry headless. Switch to `headless: false, channel: 'chrome'` per
  `reference/playwright-recipe.md` § Bot-management fallback and
  record the switch in `_crawl-log.json#discovery.fetchTechnique` so
  re-runs start in headed mode without rediscovering the issue.
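The origin-collision check above reduces to comparing normalized
origins. A minimal sketch — the function name is illustrative, not the
skill's actual code:

```typescript
// Hypothetical origin-collision guard. `recordedOriginUrl` would come
// from stardust/state.json#site.originUrl; `newUrl` is the <url> arg.
function isOriginCollision(recordedOriginUrl: string, newUrl: string): boolean {
  // URL.origin normalizes scheme + host + port, so a path difference
  // (e.g. /shop) does not count as a collision.
  return new URL(recordedOriginUrl).origin !== new URL(newUrl).origin;
}
```

Relying on `URL.origin` rather than string comparison is what makes
`https://example.com` and `https://example.com/shop` the same site
while `https://shop.example.com` is a different one.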
## Procedure
### Phase 1 — Discovery
Discover the page inventory before crawling. Procedure in
`reference/ia-extraction.md`. In summary:

- Fetch `<origin>/sitemap.xml`, then `<origin>/sitemap_index.xml`,
  then check `robots.txt` for `Sitemap:` directives.
- If no sitemap is reachable, run a same-origin BFS crawl from
  `<url>`, depth-limited to 3, link-extracting from rendered HTML.
- Filter the discovered URL list: same origin only, exclude
  `mailto:`, `tel:`, anchor-only links, query-only variations, common
  asset paths (`.css`, `.js`, `.pdf`, image extensions).
- De-duplicate trailing-slash variations.
- Apply the junk-page filter (`reference/ia-extraction.md`
  § Junk-page filter) unless `--no-junk-filter` is set. Surface the
  filtered list to the user as overridable.
- Apply the cap (default 5, or `--cap`, or `--all` for no cap) and
  proceed silently. Print an informational summary of what was kept
  and what was cut — but do not gate on user confirmation. The default
  cap is small enough that the common case is "extract 5 pages and
  move on"; pausing for a yes/no reply on every run is friction
  without value. Users who want different scope set it spontaneously
  at command time:

  ```
  $stardust extract https://example.com                  # default 5 pages
  $stardust extract https://example.com --cap 25         # bump to 25
  $stardust extract https://example.com --all            # lift the cap
  $stardust extract https://example.com --pages home,about,pricing
  $stardust extract https://example.com --single         # just the entry URL
  ```

  The agent reads spontaneous scope intent from the user's prompt
  (e.g. "extract all pages", "look at just the home and pricing",
  "do a full crawl") and applies the equivalent flag. No
  re-confirmation needed once intent is clear.

  Informational output (not a prompt — proceed immediately):

  ```
  Discovered 38 pages on https://example.com (sitemap.xml).
  Filtered as likely junk (5): /test/, /sample-page/, /holiday1/, ...
  Selecting 5 highest-priority pages:
  - / (home)
  - /about
  - /pricing
  - /products
  - /contact
  Cut (28 pages, --all to lift): /blog/post-1, /blog/post-2, ...
  Extracting...
  ```

  Selection heuristic: page-type checklist first, then score-based
  ranking (home + IA-pillar keywords + sitemap priority − archive /
  version markers). See `reference/ia-extraction.md` § Page selection
  and § Priority for the cap. The English-only keyword list is a known
  limitation for localized sites.
- Write the discovered list to `stardust/current/_crawl-log.json`
  (created if absent) with `_provenance` and the full discovery
  reasoning, including `filteredAsJunk[]` and `userChoice`. This is an
  audit trail, not a state file.
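The filtering and de-duplication steps above can be sketched as a
single pass. The helper name and the exact asset-extension list are
assumptions; the rules are the ones named in the steps:

```typescript
// Illustrative discovery filter: same-origin only, drop mailto:/tel:/
// anchor-only links, asset paths, and collapse query-only and
// trailing-slash variations to one canonical page.
const ASSET_EXTENSIONS = [".css", ".js", ".pdf", ".png", ".jpg", ".jpeg", ".gif", ".svg", ".webp"];

function filterDiscovered(urls: string[], origin: string): string[] {
  const base = new URL(origin);
  const seen = new Set<string>();
  const kept: string[] = [];
  for (const raw of urls) {
    if (raw.startsWith("mailto:") || raw.startsWith("tel:") || raw.startsWith("#")) continue;
    let u: URL | null = null;
    try { u = new URL(raw, base); } catch { /* unparsable href */ }
    if (u === null) continue;
    if (u.origin !== base.origin) continue; // same origin only
    if (ASSET_EXTENSIONS.some((ext) => u!.pathname.toLowerCase().endsWith(ext))) continue;
    // Query-only and trailing-slash variations collapse to one key.
    const key = u.origin + u.pathname.replace(/\/+$/, "");
    if (seen.has(key)) continue;
    seen.add(key);
    kept.push(u.origin + u.pathname);
  }
  return kept;
}
```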
### Phase 2 — Per-page extraction
For each page in the cap-respecting list, render with Playwright
following `reference/playwright-recipe.md`. The recipe is mandatory —
in particular, do not skip the wait, scroll, or capture-list steps:

- Viewport 1440 × 900 @ 2× DPR
- Wait per the configured wait mode (default `medium`; see § Wait
  modes in `reference/playwright-recipe.md`)
- Disable animations via `prefers-reduced-motion: reduce`
- After the wait resolves, scroll to bottom in 4 viewport-height steps
  with 300 ms pauses, then return to top — this is required to trigger
  lazy-load and IntersectionObserver-driven content
- Record `waitMode` and `waitMs` in the per-page `_provenance`

Capture per page (full schema in `reference/current-state-schema.md`):

- Page metadata (title, meta description, OG tags, theme-color)
- Semantic structure: heading outline, landmark roles, sections
- Content: visible text per section (full innerText, no truncation
  per `reference/playwright-recipe.md` § Capture list 7), structured
  paragraphs (`body[]`), lists, FAQ Q/A pairs, and review/testimonial
  quotes per § Capture list 7-bis. Without these structured fields,
  every body region under a heading falls back to a placeholder
  signature at migrate time.
- CTA labels and href targets, link inventory (internal vs external)
- Per-section computed style summary: dominant colors, font families
  in use, spacing rhythm, border-radius, shadows
- Media inventory: img/srcset with original URLs and intrinsic
  dimensions, inline SVG count, video/iframe presence,
  `cssBackgrounds[]` (including pseudo-element `::before`/`::after`
  `background-image` walks per § Capture list 11) so heroes and
  motifs do not silently disappear from extract.
- Font files captured via network-intercept (per § Capture list 16):
  every `woff2`/`woff`/`ttf`/`otf` response saved under
  `assets/fonts/` and recorded in
  `_brand-extraction.json#type.files[]` with licensing flag.
- Icon-font detection (per § Capture list 17): when the page uses
  `[class^="icon-"]` with non-default font-family + `::before`
  codepoint, capture the family, save the file, and record the
  `iconClass → codepoint` table in `_brand-extraction.json#iconFont`.
- Interactive elements: forms (with field types), buttons, modals
  detected by ARIA roles

Save to `stardust/current/pages/<slug>.json` with `_provenance` as the
first key. Save referenced media to `stardust/current/assets/media/`
preserving basename plus a short content hash.

Live-render evidence (synthesis is forbidden). Refuse to mark a page
`extracted` in `state.json` unless its `_provenance` contains
`renderedBy: "playwright"`, an ISO-8601 `fetchedAt`, a positive
integer `waitMs`, a `waitMode` from the recipe, and a final
`httpStatus` in the 2xx/3xx range. These five fields are the contract
enforced by `reference/current-state-schema.md` § Live-render
evidence and read back by every downstream phase via
`validateProvenance()` per `skills/stardust/reference/state-machine.md`
§ Provenance validation. Synthesizing a page record from
`_brand-extraction.json` plus URL patterns plus captured photos — the
2026-04-30 lovesac shortcut — is the failure mode this guard exists
to prevent. When the agent (or a delegated sub-agent) cannot satisfy
the contract for a page, treat the page as a Phase 2 failure: record
under `_crawl-log.json#crawl.failures[]` with
`errorClass: "ProvenanceMissing"` and continue.

Mark the page `extracted` in `state.json` immediately after each
successful page write. If a page fails, record the error in
`_crawl-log.json` and continue — extraction is best-effort per page.
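The five-field contract is mechanical to check. A minimal sketch of a
`validateProvenance()`-style guard, assuming the field shapes named in
the text (the authoritative spec lives in the referenced state-machine
reference, not here):

```typescript
// Sketch of the live-render evidence contract: all five fields must be
// present and well-formed before a page may be marked `extracted`.
interface Provenance {
  renderedBy?: string;
  fetchedAt?: string;   // ISO-8601 timestamp
  waitMs?: number;      // must be a positive integer
  waitMode?: string;    // one of the recipe's wait modes
  httpStatus?: number;  // final status, 2xx/3xx
}

function validateProvenance(p: Provenance): boolean {
  return (
    p.renderedBy === "playwright" &&
    typeof p.fetchedAt === "string" && !Number.isNaN(Date.parse(p.fetchedAt)) &&
    typeof p.waitMs === "number" && Number.isInteger(p.waitMs) && p.waitMs > 0 &&
    typeof p.waitMode === "string" && p.waitMode.length > 0 &&
    typeof p.httpStatus === "number" && p.httpStatus >= 200 && p.httpStatus < 400
  );
}
```

A synthesized record fails this check because it cannot honestly carry
`renderedBy: "playwright"` with a positive `waitMs`.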
### Phase 3 — Brand-surface extraction
Run once, after Phase 2 has finished, so cross-page aggregation has
data to work with. Produces `stardust/current/_brand-extraction.json`
per `reference/brand-surface.md`. Some fields are home-only (logo,
voice samples, register heuristic); the visual tokens that drive
DESIGN.md (palette, radius, shadow, type) are aggregated across all
extracted pages to avoid the home-page bias documented in
`brand-surface.md` § Aggregation scope. Captures:

- Logo by the v1 priority chain: inline SVG → `<img>` with logo-ish
  class/id → `apple-touch-icon` → favicon → `og:image` → synthesized
  placeholder. Save to `stardust/current/assets/logo.<ext>`.
- Palette — aggregate computed colors across all extracted pages
  (background, text, accents, borders, hovers). Frequency-sort,
  cluster near-duplicates, emit a role-named list (background,
  surface, text, primary, secondary, accent).
- Type — font families in use with their weights, sizes, and computed
  line-heights. Identify the heading family vs body family. Run the
  modular-scale audit (`brand-surface.md` § Modular-scale audit) and
  emit `scaleAudit.kind = "modular" | "ad-hoc"`.
- Motifs — signature border-radius (cross-page mode of non-zero
  values, weighted by element count), shadow stack (top 3 distinct,
  cross-page), gradient inventory, common patterns (chip, badge,
  card, hero-with-image). When the home-only mode disagrees with the
  cross-page mode, surface the divergence in `_provenance.notes`.
- Voice samples — first paragraph of body copy, the hero headline,
  3 representative CTA labels, a representative link list. Used by
  `direct` later but extracted now so the network round-trip is over.
- Hero image — elevate the home page's primary visual asset to
  `voice.heroImage` (per `reference/brand-surface.md` § heroImage
  resolution). Without this elevation, downstream prototype reasons
  over a 16-image list and frequently picks the `og:image` instead of
  the live hero.
- Icon font — when detected per `reference/playwright-recipe.md`
  § Capture list 17, populate `_brand-extraction.json#iconFont` with
  family, file path, and the `iconClass → codepoint` table so
  prototypes can render the brand's actual icons.
- System components — cross-page repeated DOM blocks (site header,
  site footer, cross-promo strips, persistent CTAs, breadcrumbs).
  Detected by heading-sequence + CTA-label fingerprint per
  `reference/brand-surface.md` § System components. Required — these
  are usually the most load-bearing surfaces and must not silently
  disappear from the redesign target.

Do not invent values. Every captured value cites a source selector or
URL in `_brand-extraction.json` for traceability.
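The "cross-page mode of non-zero values, weighted by element count"
rule for the signature border-radius can be sketched as follows; the
input shape is an assumption, not the skill's actual schema:

```typescript
// Weighted mode: each distinct non-zero radius accumulates the element
// counts observed for it across all extracted pages; the heaviest wins.
interface RadiusSample { radiusPx: number; elementCount: number; }

function signatureRadius(samples: RadiusSample[]): number | null {
  const weights = new Map<number, number>();
  for (const s of samples) {
    if (s.radiusPx === 0) continue; // mode of *non-zero* values only
    weights.set(s.radiusPx, (weights.get(s.radiusPx) ?? 0) + s.elementCount);
  }
  let best: number | null = null;
  let bestWeight = 0;
  weights.forEach((weight, radius) => {
    if (weight > bestWeight) { best = radius; bestWeight = weight; }
  });
  return best; // null when every sampled radius is zero
}
```

Weighting by element count keeps one radius-heavy hero card from
outvoting the radius used by hundreds of buttons and inputs.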
_brand-extraction.json在阶段2完成后运行一次,以便跨页面聚合有数据可用。按照生成。部分字段仅来自首页(logo、语音样本、注册类型推断);驱动DESIGN.md的视觉令牌(调色板、圆角、阴影、字体)会聚合所有提取页面的数据,避免中「聚合范围」章节记录的首页偏差。捕获内容包括:
reference/brand-surface.mdstardust/current/_brand-extraction.jsonbrand-surface.md- Logo:按照v1优先级链提取:内联SVG → 带logo类/id的→
<img>→apple-touch-icon→ favicon → 合成占位符。保存至og:image。stardust/current/assets/logo.<ext> - 调色板 — 聚合所有提取页面的计算颜色(背景、文本、强调色、边框、悬停色)。按频率排序,聚类近似颜色,输出带角色名称的列表(背景、表面、文本、主色、次要色、强调色)。
- 字体 — 使用的字体族及其字重、字号、计算行高。区分标题字体族和正文字体族。运行模块化比例审计(中的「模块化比例审计」章节),输出
brand-surface.md。scaleAudit.kind = "modular" | "ad-hoc" - 主题元素 — 标志性圆角(跨页面非零值的众数,按元素数量加权)、阴影堆栈(跨页面最常见的3种)、渐变清单、常见模式(芯片、徽章、卡片、带图英雄区)。当仅首页模式与跨页面模式结果不一致时,在中记录差异。
_provenance.notes - 语音样本 — 正文第一段、英雄区标题、3个代表性CTA标签、一个代表性链接列表。供后续命令使用,但现在提取可避免重复网络请求。
direct - 英雄图 — 将首页的主要视觉资产提升至(详见
voice.heroImage中的「heroImage解析」章节)。若不进行此提升,下游原型会在16张图片列表中筛选,经常会错误选择reference/brand-surface.md而非实际的英雄图。og:image - 图标字体 — 若按照中的「捕获列表17」检测到,在
reference/playwright-recipe.md中填充字体族、文件路径和_brand-extraction.json#iconFont映射表,以便原型能渲染品牌的实际图标。iconClass → codepoint - 系统组件 — 跨页面重复的DOM块(站点页眉、站点页脚、跨推广条、持久CTA、面包屑)。通过标题序列+CTA标签指纹检测(详见中的「系统组件」章节)。这是强制要求——这些通常是承载核心功能的界面,不能在重新设计目标中静默丢失。
reference/brand-surface.md
不得凭空生成值。每个捕获的值都要在中引用源选择器或URL,确保可追溯。
_brand-extraction.jsonPhase 4 — Seed stardust/current/PRODUCT.md
and DESIGN.md
stardust/current/PRODUCT.mdDESIGN.md阶段4 — 生成stardust/current/PRODUCT.md
和DESIGN.md
stardust/current/PRODUCT.mdDESIGN.mdThe current-state PRODUCT.md and DESIGN.md are descriptive, not
authored — there is no interview to run because the user is not
defining intent here, the agent is describing the existing site. Write
them directly using impeccable's format specs:
- For PRODUCT.md, follow the section structure in impeccable's
. Populate
reference/teach.mdfrom the brand surface (sites that read as marketing/landing →Register; tools/dashboards →brand; ambiguous →productwith a note). Populatebrand,Users,Product Purpose,Brand Personality, andAnti-referencesfrom the captured copy and the brand surface. Where the agent must infer, mark the section withDesign Principlesand a one-line basis sentence._provenance: inferred - For DESIGN.md and DESIGN.json, follow the format spec in
impeccable's . Populate frontmatter (
reference/document.md,colors,typography,rounded,spacing) from the captured tokens. Thecomponentsblock of DESIGN.json carries v1'sextensions,componentStyle, andmotifsarrays so nothing is lost.voice
Stardust does not invoke or for the current-state files: those commands write to project
root (the target) and run an interview. Stardust authors the
descriptive snapshot directly. The format spec from impeccable is the
contract; the runtime command is not.
The target-state PRODUCT.md and DESIGN.md at the project root are
written by `$stardust direct` in Phase 2 of the pipeline, not here.
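The Register mapping reduces to a small lookup. A hedged sketch — the
`readsAs` classification input is assumed to come from the brand
surface, and all names here are illustrative rather than impeccable's
actual spec:

```typescript
// Illustrative Register heuristic: marketing/landing surfaces read as
// "brand", tools/dashboards as "product", and the ambiguous case
// defaults to "brand" with an inference note.
type Register = "brand" | "product";

function inferRegister(
  readsAs: "marketing" | "tool" | "ambiguous"
): { register: Register; note?: string } {
  switch (readsAs) {
    case "marketing":
      return { register: "brand" };
    case "tool":
      return { register: "product" };
    case "ambiguous":
      return {
        register: "brand",
        note: "_provenance: inferred — ambiguous surface, defaulted to brand",
      };
  }
}
```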
### Phase 5 — Render stardust/current/brand-review.html
After Phase 4 writes the descriptive PRODUCT.md and DESIGN.md, emit
the current-state brand review per
`reference/brand-review-template.md`.

The brand-review HTML is the first surface a human can eyeball to
verify the extraction before committing to a redesign direction.
Misreads in the JSON (a wrong dominant radius, a missing system
component, a single-page palette bias) are obvious to the eye in five
seconds and invisible in JSON until someone notices. Putting the
review at the end of `extract` catches misreads while they are still
cheap to fix — re-extract is fast; re-direct + re-prototype is not.

The template is mandatory. In particular:

- Run the Tensions detectors listed in
  `reference/brand-review-template.md` § Detectors. Each rule is
  mechanical; emit a tension card whenever the trigger condition
  matches. The review may ship with zero tensions if the data is too
  thin to evaluate, but the detectors must always be run.
- Render in the brand's own captured colors and fonts, not a stardust
  shell.
- Embed all CSS; do not load external JavaScript or fonts unless the
  live site already does.
- Cite the source artifact for every section (e.g.
  `_brand-extraction.json § type` under Typography).

If the data for a section is missing, omit the section — do not
fabricate placeholders. The coverage callout at the top reflects what
is missing.
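The detector contract — mechanical trigger, card on match, always run —
can be illustrated with one hypothetical rule. The real rules live in
`reference/brand-review-template.md` § Detectors; `missingPrimary` and
the data shapes below are invented for illustration:

```typescript
// A detector is a pure predicate over the captured brand surface that
// either emits a tension card or stays silent. Detectors always run;
// a thin surface may simply yield zero cards.
interface BrandSurface { paletteRoles: string[]; }
interface TensionCard { id: string; summary: string; }

type Detector = (s: BrandSurface) => TensionCard | null;

// Hypothetical rule: a palette with no named "primary" role is a tension.
const missingPrimary: Detector = (s) =>
  s.paletteRoles.includes("primary")
    ? null
    : { id: "missing-primary", summary: "No primary color role could be named from the captured palette." };

function runDetectors(surface: BrandSurface, detectors: Detector[]): TensionCard[] {
  return detectors
    .map((d) => d(surface))
    .filter((c): c is TensionCard => c !== null);
}
```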
### Phase 6 — Update state and report
After all Phase 2-5 writes succeed:

- Update `stardust/state.json` (schema in
  `skills/stardust/reference/state-machine.md`):
  - `site.originUrl`, `site.extractedAt`, `site.pageCap`,
    `site.totalDiscovered`, `site.crawled`
  - `pages[]` — one entry per crawled page with `status: "extracted"`,
    filled `currentStatePath`, empty `prototypePath` and
    `migratedPath`
- Print a one-screen summary:

  ```
  Extracted https://example.com (5/38 pages, sitemap.xml)

  stardust/current/
    PRODUCT.md              (register: brand, inferred from landing)
    DESIGN.md               (5 colors, 2 type families, 3 motifs)
    brand-review.html       (4 tensions surfaced)
    pages/                  (5 files)
    assets/logo.svg         (extracted from inline SVG)
    _brand-extraction.json
    _crawl-log.json

  Per-page evidence:
    slug       live  waitMode              waitMs  status
    /          yes   medium                2380    200
    /about     yes   medium                2110    200
    /pricing   yes   medium                1940    200
    /products  yes   medium                2640    200
    /contact   yes   domcontentloaded(fb)  8000    200

  Wait summary: 4 resolved at medium (avg 2.4s), 1 fallback (timed out
  at 8s) → /contact may be under-captured; consider --refresh

  Open stardust/current/brand-review.html to verify the extraction
  before running $stardust direct.

  Coverage note: extracted 5 of 38 discovered pages. The brand surface
  and brand-review use cross-page aggregation, so 5 pages covering
  distinct templates is usually sufficient. To extract more, re-run
  with --cap <N> (e.g. --cap 25) or list specific slugs with --pages.

  Next: $stardust direct (resolve a redesign direction)
  ```

The per-page evidence table is mandatory. The `live` column is `yes`
when `_provenance.renderedBy === "playwright"` AND `waitMs > 0`, else
`no`. A `no` row means the page record was not produced by a live
Playwright render — this should never happen given the write-time
guard, but the visible column is the defense-in-depth signal that
catches the failure mode when it does (the 2026-04-30 lovesac
synthesis bug went four phases deep before being caught because no
report column surfaced the missing provenance). A maintainer scanning
the summary should see `yes` on every row.

Compute the wait summary by grouping each page's
`_provenance.waitMode` and averaging `waitMs`. List slugs whose
`waitMode` ends in `(fallback)` (rendered as `(fb)` in the table for
width) as candidates for `--refresh`.
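The `live` column and the `--refresh` candidate list are both
mechanical; a sketch assuming the per-page `_provenance` fields
described in Phase 2 (the `PageEvidence` shape is illustrative):

```typescript
// `live` is "yes" only for a real Playwright render with a positive
// wait; anything else surfaces as "no" in the evidence table.
interface PageEvidence { slug: string; renderedBy: string; waitMode: string; waitMs: number; }

function liveColumn(p: PageEvidence): "yes" | "no" {
  return p.renderedBy === "playwright" && p.waitMs > 0 ? "yes" : "no";
}

// Slugs whose waitMode ends in "(fallback)" — rendered "(fb)" in the
// table for width — are the --refresh candidates.
function refreshCandidates(pages: PageEvidence[]): string[] {
  return pages.filter((p) => p.waitMode.endsWith("(fallback)")).map((p) => p.slug);
}
```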
## Outputs
| Path | Purpose |
|---|---|
| `stardust/current/PRODUCT.md` | Descriptive strategy of the existing site (impeccable format) |
| `stardust/current/DESIGN.md` | Descriptive visual system (Stitch format) |
| `stardust/current/DESIGN.json` | Sidecar with extensions for motifs, voice, components |
| `stardust/current/brand-review.html` | Self-contained visual review of the extraction (first eyeball-able artifact) |
| `stardust/current/pages/<slug>.json` | Per-page parsed structure + content |
| `stardust/current/assets/logo.<ext>` | Extracted logo |
| `stardust/current/assets/media/` | Extracted media referenced by pages |
| | Per-page viewport screenshots (used by brand-review) |
| `stardust/current/_brand-extraction.json` | Consolidated brand surface (palette, type, motifs, voice, system components) |
| `stardust/current/_crawl-log.json` | Discovery + crawl audit trail |
| `stardust/state.json` | Updated with site + per-page status |
## Concurrency
Per `state-machine.md`: stardust does not lock. Two concurrent
extracts on the same project are last-write-wins. Document this in the
user report; do not engineer around it.
## Failure modes
- Network failure mid-crawl. Continue, record in `_crawl-log.json`,
  end with a partial state. `state.json` reflects only successfully
  extracted pages. User can re-run; already-extracted pages are
  skipped unless `--refresh <slug>`.
- HTTP 4xx/5xx, non-HTML content, soft-404s. Validated explicitly per
  `reference/playwright-recipe.md` § Response validation. Each
  produces a distinct error class (`HTTPError`, `ContentTypeError`,
  `EmptyPageError`) recorded in `_crawl-log.json#crawl.failures[]`.
  Failed pages do not appear in `state.json` as `extracted` — they
  appear only in the failure log. Without this validation a 5xx page
  silently lands as an empty success and propagates wrong data to
  `direct` and `prototype`.
- Login wall. Do not attempt to authenticate. If the home page
  redirects to a login screen, capture that one page, mark the rest as
  unreachable, and ask the user how to proceed (provide cookies via
  Playwright config, change the entry URL, or scope to public pages).
- Bot-management block (Akamai / Cloudflare / F5 / Imperva). When the
  first navigation returns `ERR_HTTP2_PROTOCOL_ERROR` or
  `ERR_QUIC_PROTOCOL_ERROR`, or hangs through the hard-cap on a
  TLS/H2 fingerprint check, the issue is JA3/H2 fingerprinting on
  bundled-chromium-default headless mode — not auth, not network.
  Switch to `headless: false, channel: 'chrome'` per
  `reference/playwright-recipe.md` § Bot-management fallback. Do not
  retry headless: it will fail identically. The headed fallback works
  against most enterprise / commerce origins; `playwright-extra` +
  stealth plugin is a non-standard escape hatch for the residual
  cases. The headed window pops visibly, which is acceptable for
  interactive runs and unacceptable for unattended pipelines —
  surface this to the user when first triggered.
- JavaScript-only content. Playwright already handles this. If the
  configured wait condition never fires within the mode's hard cap
  (`reference/playwright-recipe.md` § Wait modes), fall back to
  `domcontentloaded` and capture what is rendered. Record the
  fallback in the per-page `_provenance.waitMode` and surface it in
  the wait-summary line of the final report.
- Synthesis attempt (forbidden). When the agent (or a delegated
  sub-agent) cannot run a real Playwright render for a page — whether
  due to time pressure, token budget, or a tool/network failure — the
  only correct outcome is to record the page as a Phase 2 failure
  (`errorClass: "ProvenanceMissing"` in
  `_crawl-log.json#crawl.failures[]`) and continue. **Synthesizing a
  page record from `_brand-extraction.json` plus URL patterns plus
  captured photos at "semantically matching" template positions is
  forbidden.** This was the 2026-04-30 lovesac.com failure — 20 of 25
  pages synthesized this way and the cascade ran four phases on the
  synthesized data before the gap was caught by a meta-question. The
  synthesis shortcut produces output indistinguishable from a
  successful run and propagates fabricated content through every
  downstream phase.
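The three response-validation error classes map cleanly onto a small
classifier. The empty-body threshold and parameter names below are
assumptions; the class names are the document's:

```typescript
// Illustrative response validation: HTTP status first, then content
// type, then an empty-body check so a soft-404 or blank 200 never
// lands as a silent "success".
type FailureClass = "HTTPError" | "ContentTypeError" | "EmptyPageError" | null;

function classifyResponse(status: number, contentType: string, bodyTextLength: number): FailureClass {
  if (status >= 400) return "HTTPError";
  if (!contentType.includes("text/html")) return "ContentTypeError";
  if (bodyTextLength === 0) return "EmptyPageError"; // soft-404 / empty success
  return null; // page is extractable
}
```

A `null` result is the only path that lets the page proceed to the
capture list; every non-null class lands in
`_crawl-log.json#crawl.failures[]`, never in `state.json`.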
- Mid-crawl network failure. Continue, record the failure in `_crawl-log.json`, and produce a partial state at the end. `state.json` reflects only the successfully extracted pages. The user can re-run; already-extracted pages are skipped unless `--refresh <slug>` is used.
- HTTP 4xx/5xx, non-HTML content, soft 404s. Validate explicitly per § Response validation in `reference/playwright-recipe.md`. Each case yields a distinct error class (`HTTPError`, `ContentTypeError`, `EmptyPageError`), recorded in `_crawl-log.json#crawl.failures[]`. Failed pages are never marked `extracted` in `state.json`; they appear only in the failure log. Without this validation, a 5xx page would be silently recorded as an empty success and propagate bad data into the `direct` and `prototype` commands.
- Login walls. Do not attempt authentication. If the home page redirects to a login screen, capture that page, mark the remaining pages inaccessible, and ask the user how to proceed (supply a cookie via the Playwright config, change the entry URL, or restrict the crawl to public pages).
- Bot-management blocks (Akamai / Cloudflare / F5 / Imperva). When the first navigation returns `ERR_HTTP2_PROTOCOL_ERROR` or `ERR_QUIC_PROTOCOL_ERROR`, or times out at the hard limit during TLS/H2 fingerprint checks, the problem is the JA3/H2 fingerprint of bundled Chromium's default headless mode, not authentication or the network. Follow § Bot-management fallback in `reference/playwright-recipe.md` and switch to `headless: false, channel: 'chrome'`. Do not retry headless mode: it will fail again. The headed fallback works for most enterprise/e-commerce origins; `playwright-extra` plus the stealth plugin is the non-standard solution for the remainder. A headed window will visibly pop up, which is acceptable in interactive runs but not in unattended pipelines; tell the user the first time it triggers.
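The response-validation step above can be sketched as a small classifier. This is a minimal illustration, not the recipe's actual API: the function name `classifyResponse` and its signature are assumptions; only the three error-class strings come from the document.

```typescript
// Hypothetical sketch of response validation: map HTTP status,
// content type, and rendered-body length to the error classes named
// in the text, or null for a healthy page.
type ErrorClass = "HTTPError" | "ContentTypeError" | "EmptyPageError" | null;

function classifyResponse(
  status: number,
  contentType: string,
  bodyLength: number
): ErrorClass {
  if (status >= 400) return "HTTPError"; // HTTP 4xx/5xx
  if (!contentType.includes("text/html")) return "ContentTypeError"; // non-HTML
  if (bodyLength === 0) return "EmptyPageError"; // soft 404 / empty render
  return null; // healthy page: safe to mark extracted
}
```

A non-null result would go to `_crawl-log.json#crawl.failures[]` rather than being recorded as a success.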
Prep mode (--prep)
When invoked with `--prep`, extract runs an extended pass that prepares the inventory for migration. Discovery-mode runs (without `--prep`) are unchanged: small cap, no typing, no module detection, presales-friendly. `--prep` is the gesture that says "the user is committing to migrate; build the data structure migrate consumes."

1. Lift the cap
`--prep` implies `--all`: lift the cap and extract every discovered page after junk filtering, following the discovery procedure in `reference/ia-extraction.md`.

Sub-agent prompt requirements (when delegating)
When `--prep` is heavy enough that the agent delegates extraction to a sub-agent (a presales-shaped pattern when the inventory is large), the sub-agent prompt must:

- Forbid synthesis by name. The literal sentence *"do not synthesize a page record from `_brand-extraction.json` + URL patterns + captured photos; every page must be a live Playwright render"* must appear in the prompt. The earlier wording "must actually invoke Playwright per page" was satisfiable in spirit by synthesis-with-photo-reuse and produced the lovesac failure. Naming the shortcut explicitly closes that loophole.
- Require a per-page evidence table in the return. Columns: `slug | waitMode | waitMs | fetchedAt | httpStatus`. The parent agent reads this table on completion and aborts if any row is missing or shows `waitMs: 0`.
- Require the wait-summary line in the return, formatted identically to Phase 6's wait summary, so the parent can surface it in the user-facing report without reformatting.

These three are mandatory; missing any of them in the sub-agent prompt is itself a recipe violation. The cascade-level guard in `prepare-migration` validates the resulting per-page JSONs via `validateProvenance()` regardless — but a well-formed sub-agent return makes the failure cheaper to diagnose.
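The parent-agent check on the evidence table can be sketched as follows. The row shape mirrors the required columns from the text; the helper name `checkEvidence` and its return shape are assumptions for illustration.

```typescript
// Hypothetical parent-agent guard: every expected slug must have an
// evidence row, and no row may show waitMs: 0 (the synthesis marker).
interface EvidenceRow {
  slug: string;
  waitMode: string;
  waitMs: number;
  fetchedAt: string;
  httpStatus: number;
}

function checkEvidence(expectedSlugs: string[], rows: EvidenceRow[]): string[] {
  const bySlug = new Map<string, EvidenceRow>(
    rows.map((r): [string, EvidenceRow] => [r.slug, r])
  );
  const violations: string[] = [];
  for (const slug of expectedSlugs) {
    const row = bySlug.get(slug);
    if (!row) violations.push(`${slug}: missing row`); // no Playwright evidence at all
    else if (row.waitMs === 0) violations.push(`${slug}: waitMs 0`); // suspect render
  }
  return violations; // non-empty => the parent aborts the prep run
}
```

An empty result lets the run proceed; anything else is treated as a failed synthesis guard.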
2. Page typing
For each extracted page, infer the `type` field from URL pattern and content shape (LLM judgment). Catalog from `skills/stardust/reference/state-machine.md` § Page types: `landing | article | listing | program | form | static | unique`.

Write the inferred type to `state.json.pages[].type`. The user confirms or refines during `direct --prep`. Discovery-mode runs leave `type` as `null`.
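To make the inference concrete, here is a purely illustrative URL-pattern fallback. The recipe itself uses LLM judgment over URL plus content shape; every regex below is an assumption, and unknown paths are left as `null` for confirmation in `direct --prep`.

```typescript
// Hypothetical URL-only heuristic for the type field; the catalog of
// types comes from state-machine.md § Page types.
type PageType =
  | "landing" | "article" | "listing" | "program"
  | "form" | "static" | "unique" | null;

function guessTypeFromUrl(path: string): PageType {
  if (path === "/" || path === "") return "landing"; // site root
  if (/\/(news|blog|stories)\/[^/]+$/.test(path)) return "article"; // detail page
  if (/\/(news|blog|stories)\/?$/.test(path)) return "listing"; // index page
  if (/\/(programs?|services?)\//.test(path)) return "program";
  if (/(contact|donate|signup)/.test(path)) return "form";
  return null; // leave to LLM judgment / user confirmation
}
```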
state.json.pages[].typedirect --preptypenull为每个提取的页面,从URL模式和内容形态推断字段(LLM判断)。分类来自中的「页面类型」章节:。
typeskills/stardust/reference/state-machine.mdlanding | article | listing | program | form | static | unique将推断的类型写入。用户可在期间确认或细化。发现模式运行时为。
state.json.pages[].typedirect --preptypenull3. Module candidate detection
After Phase 3 (brand-surface extraction), scan extracted pages for recurring structural patterns. A pattern that appears in N+ pages with similar shape (same sequence of elements, same `data-section` / `data-purpose`, similar text shape) is surfaced as a module candidate.
Signal-source priority
Detection consumes per-page captured fields in this priority order. Each higher signal is weighted more heavily in the match-score; lower signals are tie-breakers and corroboration, not primary evidence. The priority exists because higher-up fields are explicitly extracted and structured (no parsing ambiguity), while the bottom of the list (`landmarks[].innerText` substring search) is fragile against capture variations and was the source of the 2026-04-29 sliccy.com under-detection (0 hits for `pre-footer-shell`, 1 of 2 hits for `install-tile` — both modules genuinely present on every page, both invisible because the substrings being searched lived past the truncation boundary that has since been removed).

1. `pages/<slug>.json#headings[]` — cross-page repeats of the same heading text at the same level. Highest signal: structured, explicit, captured in full regardless of body length.
2. `pages/<slug>.json#ctas[]` labels — cross-page repeats of the same CTA label appearing on similar surfaces.
3. `pages/<slug>.json#media.cssBackgrounds[]` URLs — the same asset URL on multiple pages is a strong system-component signal (already specced as a system-component candidate in `reference/brand-surface.md` § Cross-page CSS-background reuse; module detection consumes the same signal at finer granularity).
4. `pages/<slug>.json#forms[]` actions — cross-page repeats of the same form `action` URL. Newsletter / contact / search forms are the typical hits.
5. `pages/<slug>.json#components.componentsByLandmark` when present (per a future `current-state-schema.md` extension): per-landmark counts of cards / grids / etc.
6. Substring search in `landmarks[].innerText` — lowest signal. Use only as corroboration once a candidate has already passed the higher-signal checks; never as the primary detector.

A candidate that fires on signals 1 + 2 above the threshold is high-confidence; a candidate that fires only on 6 should be treated as speculative and surfaced as such for the user to confirm in `direct --prep`.

Candidate output is a draft entry under `DESIGN.json.extensions.modules[]`:

```json
{
  "id": "candidate-<short-hash>",
  "slots": [
    { "name": "<inferred>", "type": "text|link|image|...", "required": false }
  ],
  "instances": [
    { "slug": "home", "selector": "..." },
    { "slug": "donate", "selector": "..." }
  ],
  "status": "candidate"
}
```

The `status: "candidate"` flag distinguishes draft entries from confirmed modules. `direct --prep` is where the user names them and promotes (or prunes).
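Signal 1 (cross-page heading repeats) can be sketched as a simple aggregation over the per-page JSONs. The function name, input shape, and threshold parameter below are assumptions for illustration; only the `headings[]` field and the N+ pages rule come from the text.

```typescript
// Hypothetical aggregator for signal 1: group identical heading
// text at the same level across pages, keep groups that reach the
// minPages (N+) threshold as module candidates.
interface PageHeadings {
  slug: string;
  headings: { level: number; text: string }[];
}

function headingRepeatCandidates(
  pages: PageHeadings[],
  minPages: number
): Map<string, string[]> {
  const seen = new Map<string, Set<string>>(); // "h2|Give today" -> slugs
  for (const page of pages) {
    for (const h of page.headings) {
      const key = `h${h.level}|${h.text.trim()}`;
      if (!seen.has(key)) seen.set(key, new Set());
      seen.get(key)!.add(page.slug);
    }
  }
  const candidates = new Map<string, string[]>();
  seen.forEach((slugs, key) => {
    if (slugs.size >= minPages) candidates.set(key, Array.from(slugs).sort());
  });
  return candidates;
}
```

Each surviving key would seed one `candidate-<short-hash>` entry, with the slugs becoming its `instances[]`.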
status: "candidate"direct --prep检测按以下优先级使用单页捕获字段。优先级越高的信号在匹配得分中的权重越大;低优先级信号仅用于打破平局和佐证,而非主要证据。优先级存在的原因是上方字段为显式提取的结构化字段(无解析歧义),而列表底部(子字符串搜索)易受捕获变化影响,是2026-04-29 sliccy.com检测不足的原因(命中0次,命中2次中的1次——这两个模块确实存在于每个页面,但由于搜索的子字符串超出了现已移除的截断边界而无法被检测到)。
landmarks[].innerTextpre-footer-shellinstall-tile- — 跨页面重复出现的相同层级的相同标题文本。最高信号:结构化、显式、无论正文长度如何均完整捕获。
pages/<slug>.json#headings[] - 标签 — 跨页面在相似界面上重复出现的相同CTA标签。
pages/<slug>.json#ctas[] - URL — 同一资产URL出现在多个页面上是强系统组件信号(已在
pages/<slug>.json#media.cssBackgrounds[]中的「跨页面CSS背景复用」章节中指定为系统组件候选;模块检测在更细粒度上使用相同信号)。reference/brand-surface.md - 动作 — 跨页面重复出现的相同表单
pages/<slug>.json#forms[]URL。典型命中为通讯订阅/联系/搜索表单。action - (若存在,基于未来的
pages/<slug>.json#components.componentsByLandmark扩展):每个地标的卡片/网格等组件计数。current-state-schema.md - 子字符串搜索 — 最低信号。仅在候选已通过高信号检查后用作佐证;绝不能作为主要检测器。
landmarks[].innerText
通过信号1+2且超过阈值的候选为高置信度;仅通过信号6的候选应视为推测性,在中展示给用户确认。
direct --prep候选输出为下的草稿条目:
DESIGN.json.extensions.modules[]json
{
"id": "candidate-<short-hash>",
"slots": [
{ "name": "<inferred>", "type": "text|link|image|...", "required": false }
],
"instances": [
{ "slug": "home", "selector": "..." },
{ "slug": "donate", "selector": "..." }
],
"status": "candidate"
}status: "candidate"direct --prep4. Typed content slots
Per-page JSON (`current/pages/<slug>.json`) gains a `slots` section that identifies content slots per page-type:

- `article` pages: `headline`, `deck`, `byline`, `meta`, `lead-image`, `body`, `pullquotes[]`, `related[]`
- `listing` pages: `index-headline`, `filter-controls`, `card-grid` with typed sub-slots per card
- `program` pages: `program-headline`, `summary`, `feature-grid`, `cta-band`
- `landing`, `form`, `static` — typed slots inferred per content shape

Schema additions live in `reference/current-state-schema.md` § Typed slots (extend that doc separately).
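As a purely hypothetical illustration (the authoritative shape lives in `reference/current-state-schema.md` § Typed slots), a filled `slots` section for an `article` page might look like this; all field values and sub-keys are invented:

```json
{
  "slots": {
    "headline": { "text": "Shelter beds double this winter" },
    "deck": { "text": "A new wing opens in November." },
    "byline": { "text": "Communications team" },
    "meta": { "published": "2026-01-12" },
    "lead-image": { "src": "/img/wing.jpg", "alt": "New shelter wing" },
    "body": { "html": "<p>…</p>" },
    "pullquotes": [],
    "related": ["winter-appeal", "volunteer"]
  }
}
```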
5. Prep summary
Replace Phase 6's standard report with the prep summary format:

```
extract --prep complete
=======================
Inventory: 127 pages crawled (5 prior, 122 new)
Provenance: 127/127 live (every page has Playwright evidence)
Page types: landing 1 · article 84 · listing 6 · program 12 · form 3 · static 18 · unique 3
            (LLM-inferred; refine in direct --prep)
Module candidates: 8
  hotline-211   5 instances (home, get-help, donate, news, programs)
  donate-band   12 instances (home, donate, news, all article footers)
  story-card    7 instances (home, news, programs)
  ...
Typed slots: filled per page-type (see current/pages/<slug>.json § slots)
Next: $stardust direct --prep (confirm types, name modules)
```

The `Provenance: <live>/<total> live` line is mandatory in prep-mode output. When the ratio is anything other than `<total>/<total>`, the prep run has failed the synthesis guard; list the affected slugs as a sub-bullet and treat the prep run as incomplete (the cascade-level guard in `prepare-migration` SKILL.md surfaces the same check between phases).

Default mode (no `--prep`) is unchanged. The flag is intended for the `prepare-migration` orchestrator, though direct invocation is supported.
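The ratio check on the mandatory provenance line can be sketched as a tiny parser. The helper name is an assumption; the real cascade-level guard validates the per-page JSONs via `validateProvenance()` rather than parsing report text.

```typescript
// Hypothetical guard: accept only "Provenance: <live>/<total> live"
// where live === total; a missing line also fails.
function provenanceGuardPasses(line: string): boolean {
  const m = line.match(/Provenance:\s*(\d+)\/(\d+)\s+live/);
  if (!m) return false; // the line is mandatory in prep-mode output
  return m[1] === m[2]; // any ratio below <total>/<total> fails
}
```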
References
- `reference/playwright-recipe.md` — viewport, capture list, logo locator chain.
- `reference/ia-extraction.md` — sitemap + BFS crawl + cap procedure.
- `reference/current-state-schema.md` — per-page JSON schema.
- `reference/brand-surface.md` — consolidated brand-surface schema.
- `reference/brand-review-template.md` — current-state brand-review HTML contract + Tensions detectors.
- `skills/stardust/reference/state-machine.md` — state.json contract.
- `skills/stardust/reference/artifact-map.md` — provenance shape.