Web Video Presentation

Turn an article or script into a screen-recordable "video disguised as a web page" step by step, with optional voiceover audio synthesis. Deliverables = Vite + React + TS project + chapter-split audio.

Application Scenarios

"I have a script/article, help me turn it into a video" — Voiceover-driven content
You want to create a "dynamic PPT"
16:9 landscape screen recording, with large text, white space, and animations on every screen
Cinematic feel for teaching / product demos / keynotes
Content for Bilibili / YouTube / Douyin videos

The core of this Skill is methodology + collaboration process. The scaffold template provides tokens and primitives, but every aesthetic decision (color scheme, font style, animation vibe) should be redesigned for your theme; do not copy directly.

Workflow Overview

Phase 1   Content Creation
   1.1  Identify User Input
   1.2  One-time output of script.md + outline.md
        (Voiceover script + development plan)
   ▼
[Checkpoint Plan]      ← Must pause. Align on 5 items at once:
                         Script / Outline / Theme / Assets / Development Mode
   ▼
Phase 2   Web Development
   2.1  Scaffold (based on selected theme)
   2.2  Chapter 1 = Main thread + complete version (mandatory anchor)
        ▼
        [Hard Node] User acceptance of Chapter 1 ← Cannot skip
        ▼
   2.3  Chapters 2~N (per selected mode: A Chapter-by-Chapter / B Sequential / C Parallel)
   ▼
[Checkpoint Audio]     ← Must pause. Whether to synthesize audio
   ▼
Phase 3   Audio Synthesis (Optional)
   ▼
Phase 4   Screen Recording + Post-production

Working Directory Convention (agent creates/edits in user's current directory):

my-video/
├── article.md          # Required if user provides original text — Do NOT delete! Source of screen information during development
├── script.md           # Required: Bilibili-style voiceover script (determines beats)
├── outline.md          # Required: Development plan (chapter split + content per step + information pool)
└── presentation/       # Vite + React + TS project generated by scaffold
    ├── src/chapters/<NN>-<id>/
    │   ├── <Chapter>.tsx     # Visual implementation
    │   ├── <Chapter>.css
    │   └── narrations.ts     # ★ Single source of truth for step count + voiceover text
    ├── scripts/
    │   ├── extract-narrations.ts   # Scan all narrations.ts → audio-segments.json
    │   └── synthesize-audio.sh     # Call mmx to synthesize mp3
    ├── audio-segments.json         # Output of extract (review before synthesis)
    └── public/audio/<id>/<N>.mp3   # Optional: Synthesized audio

Key: `narrations.ts` is the single source of truth for step count and audio synthesis. The maximum N in `if (step === N)` in the chapter `.tsx`, plus 1, must equal `narrations.length`. This ensures that the 5 places (script / outline / chapter code / chapters.ts / audio files) never get out of sync.
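The invariant above can be checked mechanically. The following is a minimal sketch, not part of the scaffold: the helper names are hypothetical, and it assumes that a chapter's narrations are an array of strings and that steps appear in the source as literal `step === N` comparisons.

```typescript
// Hypothetical helper: finds the largest N used in `step === N` comparisons.
// Returns -1 when the source contains no such comparison.
function maxStepIndex(chapterSource: string): number {
  const pattern = /step\s*===\s*(\d+)/g;
  let max = -1;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(chapterSource)) !== null) {
    max = Math.max(max, Number(match[1]));
  }
  return max;
}

// The rule from the text: (maximum N) + 1 must equal narrations.length.
// Assumption: narrations is an array with one string per step.
function stepsMatchNarrations(
  chapterSource: string,
  narrations: readonly string[],
): boolean {
  return maxStepIndex(chapterSource) + 1 === narrations.length;
}
```

A check like this could run before audio extraction, so a mismatch is caught before any mp3 is synthesized.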

Mandatory Self-check Protocol (Throughout the Skill)

Each of the following three deliverables must go through self-check → fix → report/proceed once it is completed:

Deliverable	Self-check Checklist Source
`script.md`	Three-layer self-check (form / essence / read-aloud) in `SCRIPT-STYLE.md`
`outline.md`	Self-check in `OUTLINE-FORMAT.md`
Single chapter implementation complete	Completion self-check in `CHAPTER-CRAFT.md`

Execution Methods (in descending order of priority, prefer more isolated methods):

Agent Teams (Optimal): Launch an independent reviewer agent, provide it with "deliverable file path + corresponding checklist + key context", and let it verify item by item and report conclusions strictly (which items pass / which fail + evidence + rewrite suggestions).
subAgent (Second-best): If no Teams capability but can launch subagent, use subagent to follow the same process.
Self-check (Fallback): If the current agent has none of the above capabilities, conduct strict item-by-item self-check — visual inspection only is not allowed.

Iron Rule: After getting the conclusion, first fix the deliverable according to the failed items, then report to the user "completed self-check conclusion + changes made". Reporting the original conclusion directly without fixing = violation.

File Reading Guidelines for Each Phase

Read different files in different phases. Agents tend to forget principles in long sessions, especially Phase 2.4 "Implement Single Chapter" which repeats N times — always review core constraints each time.

Phase	Must Read (Every Time)	Read Once / Check On Demand
Phase 1.1-1.2 Content Creation	`references/SCRIPT-STYLE.md` + `references/OUTLINE-FORMAT.md` + `article.md` (user's original text, if provided)	——
Checkpoint Plan Theme Selection	——	`themes/*/theme.json` (dynamically read all, list + `bestFor` recommendation + `descriptionZh` ); `references/THEMES.md` (when user wants to learn about theme system)
Phase 2.1 Scaffold	——	Read this section of SKILL.md once
Phase 2.4 Implement Single Chapter (×N times, called by 2.2 / 2.3)	`references/CHAPTER-CRAFT.md` Single entry point — Part 0 Ten Principles / Part 1 5 Pre-development Questions / Part 2 Relationship→Action Decision Tree / Part 3 Visual Toolkit / Part 4 Duration Reference / Part 5 Anti-AI Cliché Anti-patterns / Part 6 Code Hard Rules (including narrations.ts mandatory constraints) / Part 7 Completion Self-check / Part 8 Quick Feedback Reference + current theme's `themes/<id>/theme.json` + current chapter's outline.md paragraph + `article.md` corresponding paragraph for this chapter + asset list	`references/EXAMPLES/` (structure illustration, not a copy template); `references/THEMES.md` complete token contract
Phase 3 Audio Synthesis	`references/AUDIO.md` (including narrations.ts → segments.json → mmx workflow)	——
Phase 4 Screen Recording + Post-production	`references/RECORDING.md` (including `?auto=1` auto recording)	——
Select / Create / Switch Theme	——	`references/THEMES.md`

Only read `CHAPTER-CRAFT.md` when writing chapters. Ten principles / pre-development self-prompting / decision tree / anti-AI clichés / completion self-check are all integrated into this single entry point.
`EXAMPLES/` is not mandatory: design freely based on content first, and refer to it only when stuck (refer to the "structure", do not copy directly).

Phase 1 — Content Creation (One-time Output)

1.1 Identify User Input

User's Input	Action to Take
Original article (written language / official account / paper / blog)	One-time output of `script.md` + `outline.md` (1.2), proceed to Checkpoint Plan
Direct voiceover script / video script	Save as `script.md` , one-time output of simplified `outline.md` (1.2), proceed to Checkpoint Plan
Nothing provided, only says "help me make a video on X theme"	Ask back: Provide a piece of material or outline first. This Skill does not create content for users

1.2 One-time Output of script.md + outline.md

Complete both deliverables in one thinking process:

Generate `script.md`: Convert the article to a Bilibili-style voiceover script following the rules in `SCRIPT-STYLE.md`. Keep `article.md` intact; it is the source of details for writing information pools and chapter screen implementations (dual-source principle).
Generate `outline.md`: Split chapters + split steps + extract the information pool for the first paragraph of each chapter, following the rules in `OUTLINE-FORMAT.md`.

Outline Boundaries (Key):

Must Write in Outline	Do NOT Write in Outline
Chapter split / step count per chapter / estimated time	Specific animation types (blur clear / wipe / spring)
Screen content per step (hero / data / slogan / list item)	CSS implementation methods (filter / SVG / clip-path)
Chapter-level information pool: Numbers / quotes / cases / tags extracted from article	Duration values (do not write values like ~2.5s / 80~120ms)
Step-level relationship prefix (optional hints like "contrast comparison" / "progressive list" / "golden sentence")	Micro-rhythm like continuous micro-movement / staggered timing

Reason for not writing animations in the outline: writing animations in advance reduces the chapter agent to a translator; leaving blank space lets the chapter agent design freely according to the "content-driven decision tree" in `CHAPTER-CRAFT.md` when starting each step, which creates a true video feel. See Principle 7 in Part 0 of `CHAPTER-CRAFT.md` for details.

After saving, both files must pass self-check before entering Checkpoint Plan: run the self-check for `script.md` and `outline.md` respectively according to the "Mandatory Self-check Protocol" above (prefer Agent Teams → subAgent → self-check), and enter Checkpoint Plan only after fixing according to the conclusion.

Checkpoint Plan — Align on 5 Items at Once (Hard Node)

Must pause after writing `script.md` + `outline.md`. The user confirms 5 items at this single node.

Preparation Work for Agent at This Stage

Read all `themes/*/theme.json` to get `nameZh` / `descriptionZh` / `bestFor` / `mood`. Do NOT hardcode the list
Based on the content type / keywords / tone of `script.md`, actively select the 2~3 best-matching recommendations from the themes (match the `bestFor` field)
Scan the "Asset List" section at the end of `outline.md`
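The theme matching described in these preparation steps could be sketched as a keyword overlap score. This is only an illustration under assumptions: the real `theme.json` schema is not shown here, so `bestFor` is assumed to be a list of keywords, `id` is assumed to be the theme folder name, and `recommendThemes` is a hypothetical helper. An agent would normally also weigh tone and `mood`, which this sketch ignores.

```typescript
// Assumed subset of theme.json; the actual schema may differ.
interface ThemeMeta {
  id: string;        // assumption: the theme folder name
  nameZh: string;
  bestFor: string[]; // assumption: a list of keywords
}

// Scores each theme by how many of its bestFor keywords occur in the script,
// drops themes with no match, and returns at most `limit` themes, best first.
function recommendThemes(
  themes: readonly ThemeMeta[],
  scriptText: string,
  limit = 3,
): ThemeMeta[] {
  const text = scriptText.toLowerCase();
  return themes
    .map((theme) => ({
      theme,
      score: theme.bestFor.filter((kw) => text.includes(kw.toLowerCase())).length,
    }))
    .filter((entry) => entry.score > 0)
    .sort((a, b) => b.score - a.score)
    .slice(0, limit)
    .map((entry) => entry.theme);
}
```

Because the list is built from whatever theme files exist, adding a theme folder changes the recommendations without touching this code, which is the point of not hardcoding the list.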

Summary Template (Framework, agent fills according to situation)

Content plan completed, deliverables:
  📄 article.md     {retained if user provided original text}
  📄 script.md      {X} words / ~{T} minutes
  📄 outline.md     {N} chapters / {M} steps + per-chapter information pool + asset list at the end

Chapter Overview:
  1. <id>     <Chapter Title>    <S> steps ~<T>s
  2. ...

Next, align on 5 items at once:

  1. Do you need to modify the script (script.md)?
     You can edit the file directly, or tell me the modification direction verbally.

  2. Do you need to modify the development plan (outline.md)? Focus on:
     - Is chapter split / step count / estimated time reasonable (reasonable judgment: 30~60s per chapter)
     - Is screen content per step clear
     - Does the "information pool" in the first paragraph of each chapter have enough article details for screen design
     - Is the asset list at the end complete

  3. Which theme to choose? My recommendations:
     ★ <Recommendation 1: nameZh (id)> — Because <bestFor match>; <descriptionZh summary>
     ★ <Recommendation 2 / Recommendation 3>
     Other options: <Remaining themes, nameZh + one sentence>
     I can also help you create a new theme (see references/THEMES.md for details).

  4. How to prepare real assets? Rough list of images needed for this video: <list rough items>
     a) I will pick from <existing asset path>   b) You provide them yourself   c) Use placeholder for all

  5. Which development mode to choose?

     **Chapter 1 must be completed in main thread + user acceptance regardless of mode** (mandatory anchor).
     Differences start from Chapter 2:

     A) Default · Chapter-by-Chapter Confirmation (Recommended)
        Pause for acceptance after each chapter is completed → Low risk / most stable rhythm
     B) Sequential Development After Chapter 1 (No Parallel)
        Complete Chapters 2~N sequentially in main thread then accept uniformly → Medium speed / suitable for agents that do not support parallel tasks
     C) Parallel Development After Chapter 1 (subagent)
        Use subagent to complete Chapters 2~N in parallel → Fastest / user controls parallel count (how many chapters at once)
        ⚠️ Style differences between chapters are expected (theme constraints ensure consistency)

After receiving feedback:

If script / outline needs modification: Edit the file directly, ping after editing (or describe modifications verbally for agent to adjust)
Theme must be confirmed before entering Phase 2. If user says "you choose the theme" → Select your first recommendation, tell the user what you chose and why, and give them a chance to change their mind
After mode is confirmed → Enter Phase 2
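The "30~60s per chapter" guideline from item 2 of the summary template can be checked before the plan is presented. A minimal sketch, assuming the per-chapter estimates are already available as data; the `ChapterEstimate` shape and the function name are hypothetical and not part of the Skill:

```typescript
// Hypothetical shape for one row of the "Chapter Overview" in the summary.
interface ChapterEstimate {
  id: string;
  steps: number;
  seconds: number; // estimated duration of the chapter
}

// Returns the ids of chapters whose estimate falls outside the guideline
// range (30~60 seconds by default, per the summary template).
function chaptersOutsideRange(
  chapters: readonly ChapterEstimate[],
  minSeconds = 30,
  maxSeconds = 60,
): string[] {
  return chapters
    .filter((c) => c.seconds < minSeconds || c.seconds > maxSeconds)
    .map((c) => c.id);
}
```

Flagged chapters are candidates for merging or splitting in `outline.md` before the user reviews it.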

Phase 2 — Web Development

2.1 Scaffold

bash

bash .cursor/skills/web-video-presentation/scripts/scaffold.sh \
  ./presentation \
  --theme=<user-selected theme id>

bash .cursor/skills/web-video-presentation/scripts/scaffold.sh --list-themes

For custom themes → First create `themes/<my-theme>/` following the "Create New Theme" workflow in `references/THEMES.md`, then use `--theme=<my-theme>`.

The scaffold includes an `01-example` demo. Delete it before writing the first real chapter:

bash

rm -rf presentation/src/chapters/01-example

And remove the import and array item of `EXAMPLE_CHAPTER` from `presentation/src/registry/chapters.ts`.

2.2 Chapter 1 — Main Thread + Mandatory Acceptance

Core: Chapter 1 = Complete version delivered at once (rhythm + visuals + real assets all ready). No "skeleton version" concept — The first chapter must be a sample that users can directly accept.

Why Chapter 1 must be in main thread:

It is the first implementation of the guidelines in `CHAPTER-CRAFT.md` under the current theme + current subject matter
If there are blind spots in guidelines / insufficient theme colors / font tokens, Chapter 1 will definitely expose them — With human feedback, guidelines can be revised / theme adjusted, cost of early modification is lowest
Subsequent chapters (regardless of sequential / parallel) will reference the code pattern of Chapter 1, so Chapter 1 = "Style anchor" for the current project (does not require full consistency between chapters, but each chapter must have complete persuasiveness)

Must pause after completing Chapter 1 to wait for user acceptance:

Chapter 1 <id> is completed, dev server is running at localhost:5173.

Acceptance Focus:
  □ Is the visual vibe correct? Does it meet the expectation of <theme nameZh>?
  □ Is the rhythm correct? Are some steps too fast / too slow / too thin on information?
  □ Is content-driven animation in place? Or are some steps just mindless entrance animations?
  □ Dual-source principle: Does the screen have details that "are not mentioned in voiceover but can be linked to article"?
  □ Anti-AI check: Are there purple-pink gradients / rounded colored borders / fake illustrations / emojis?

Tell me any issues, and I will modify accordingly. Let me know "continue" when it's OK, and I will proceed with Chapters 2 and beyond according to the selected mode.

2.3 Chapters 2~N — Per Selected Mode

Common Rules for All Modes: Each chapter is developed independently following `CHAPTER-CRAFT.md`. Full style consistency between chapters is not required: theme color / font tokens ensure visual unity, and free play of animation / rhythm / visual demonstration within chapters is a design expectation.

Mode A · Default · Chapter-by-Chapter Confirmation

Complete Chapter 2 → Pause for acceptance → OK → Chapter 3 → Pause → ... → Chapter N. Accept each chapter independently, modify issues anytime, lowest risk, most stable rhythm. Default to this mode if user does not specify.

Mode B · Sequential Development After Chapter 1

Complete Chapter 2 → Chapter 3 → ... → Chapter N sequentially in main thread, then accept uniformly. Medium speed, suitable for environments where agents do not support parallel tasks.

Mode C · Parallel Development After Chapter 1 (subagent)

Use subagent to complete Chapters 2~N in parallel, maximum parallel count controlled by user ("4 chapters at once" / "2 chapters at once"). Fastest, but style differences between chapters are expected — This is intentional because:

Each subagent cannot see outputs of other subagents, so mechanical alignment is impossible
Chapter code is physically separated (one folder per chapter / own CSS prefix), no mutual interference
Theme tokens ensure visual unity (colors / fonts / hero numbers / cards / divider style / decorations), vibe won't deviate
Style inconsistency = breathing feel of manually created videos (multiple voices / perspectives)
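The user-controlled parallel count ("4 chapters at once" / "2 chapters at once") amounts to splitting Chapters 2~N into batches. A minimal sketch of that batching, with a hypothetical function name; how each batch is actually dispatched to subagents depends on the agent environment and is not shown:

```typescript
// Splits the remaining chapters into consecutive batches of at most
// `parallelCount` chapters. Each batch would be handed to subagents together.
function batchChapters<T>(chapters: readonly T[], parallelCount: number): T[][] {
  if (!Number.isInteger(parallelCount) || parallelCount < 1) {
    throw new RangeError("parallelCount must be a positive integer");
  }
  const batches: T[][] = [];
  for (let i = 0; i < chapters.length; i += parallelCount) {
    batches.push(chapters.slice(i, i + parallelCount));
  }
  return batches;
}
```

For example, five remaining chapters with a parallel count of 2 yield three batches, the last one holding a single chapter.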

The prompt for parallel subagents must include:

Current chapter's outline paragraph (including information pool)
Path of `references/CHAPTER-CRAFT.md` (Single mandatory read: all requirements for visual demonstration + gradual reveal + dual-source principle + anti-AI clichés + code red lines + completion self-check are in this single file)
`descriptionZh` / `mood` / `bestFor` of the current theme's `theme.json` (only as a reference for vibe; animation / duration / font size / emojis are decided freely by the chapter agent)
Chapter 1 code as "code style" reference (not "visual copy object")
Hard rules: Independent CSS prefix per chapter (`.cd-` / `.mg-` / `.pm-` / ...); Do not modify `chapters.ts`; Run `npx tsc --noEmit` after completion

Important: Regardless of the selected mode, users can switch modes at any time. After Chapter 2 is OK, user can say "do the rest in parallel" / "do the rest chapter-by-chapter" and it will be accommodated.

2.4 Implement Single Chapter (Mandatory for Each Chapter)

Detailed guidelines are in `CHAPTER-CRAFT.md`, the single mandatory entry point, covering: Visual demonstration requirements / gradual reveal / content selection / dual-source principle / basic video demonstration aesthetics / anti-AI clichés / code red lines / completion self-check.

Core Points (Detailed in CHAPTER-CRAFT.md):

Each chapter must have CSS / SVG / Canvas / JS visual demonstration, pure text chapters are prohibited
Gradual reveal: Lists must have 1 item = 1 step, full display at once is prohibited
Dual-source principle: Rhythm follows voiceover script (order cannot be changed), details are extracted from original article (information pool + corresponding article paragraph for this chapter)
Go through completion self-check item by item, modify if not up to standard — Execute according to the "Mandatory Self-check Protocol" above (prefer Agent Teams → subAgent → self-check), report chapter delivery to user only after modification
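The "1 item = 1 step" rule for gradual reveal can be expressed as a pure function of the step. This is a sketch under assumptions: `visibleItems` and `firstItemStep` are hypothetical names, and steps are assumed to be zero-based as in `if (step === N)`.

```typescript
// Returns the list items that should be on screen at a given step.
// `firstItemStep` is the step at which the first item appears; each later
// step reveals exactly one more item, and earlier items stay visible.
function visibleItems<T>(
  items: readonly T[],
  step: number,
  firstItemStep = 0,
): T[] {
  const count = Math.min(items.length, Math.max(0, step - firstItemStep + 1));
  return items.slice(0, count);
}
```

Since the result depends only on `step`, stepping backward hides the last item again without any timer or animation state.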

2.5 Bump STORAGE_KEY After Major Changes

After modifying `chapters.ts` (adding / deleting / reordering chapters, or changing the length of `narrations.ts` in any chapter), bump the `STORAGE_KEY` in `presentation/src/hooks/useStepper.ts` (e.g., `v4` → `v5`), to avoid the persisted cursor landing on non-existent steps.
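Why bumping the key helps can be illustrated with a small sketch. It is not the scaffold's `useStepper.ts`, whose contents are not shown here: the key strings, the `KeyValueStore` interface, and `loadStep` are all hypothetical, and the range check is an extra guard added for illustration.

```typescript
// Minimal storage interface so the sketch runs outside a browser;
// the browser's localStorage satisfies it.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

// Hypothetical key name; bumping the suffix (v4 → v5) orphans the old value.
const STORAGE_KEY = "presentation-step-v5";

// Reads the persisted step under `key`. A missing, non-numeric, or
// out-of-range value falls back to step 0.
function loadStep(store: KeyValueStore, key: string, totalSteps: number): number {
  const raw = store.getItem(key);
  const step = raw === null ? 0 : Number.parseInt(raw, 10);
  return Number.isInteger(step) && step >= 0 && step < totalSteps ? step : 0;
}
```

A cursor saved under the old key is simply never read after the bump, so the presentation restarts from the first step instead of an index that no longer exists.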

Checkpoint Audio — Whether to Synthesize Audio (Hard Node)

Must pause after Phase 2 ends, ask user:

Web page completed, {N} chapters {M} steps, dev server is running at localhost:5173.

Do you want to synthesize audio for "auto-play screen recording"?
  ✓ Synthesize → Scan narrations.ts of all chapters to generate audio-segments.json,
           call mmx-cli to synthesize each step into an mp3 file in public/audio/.
           After synthesis, use ?auto=1 mode for one-take screen recording (audio and video are naturally synchronized).
           If mmx is not installed locally, you will be asked which TTS to use.
  ✗ Do not synthesize → Skip Phase 3, proceed directly to Phase 4 for manual screen recording + post-production dubbing.

If synthesize → Phase 3. If not → directly to Phase 4.
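The "scan narrations → one mp3 per step" mapping could look roughly like the sketch below. It is not the scaffold's `extract-narrations.ts`: the `AudioSegment` shape is an assumed format for `audio-segments.json`, and zero-based file numbering is an assumption, since the source only gives the pattern `public/audio/<id>/<N>.mp3`.

```typescript
// Assumed shape of one entry in audio-segments.json (not confirmed by the source).
interface AudioSegment {
  chapterId: string;
  step: number;
  text: string;
  file: string;
}

// Builds one segment per narration entry, keyed by chapter id.
// Assumption: file numbering is zero-based, matching `step === N`.
function buildSegments(
  chapters: Record<string, readonly string[]>,
): AudioSegment[] {
  const segments: AudioSegment[] = [];
  for (const chapterId of Object.keys(chapters)) {
    chapters[chapterId].forEach((text, index) => {
      segments.push({
        chapterId,
        step: index,
        text,
        file: `public/audio/${chapterId}/${index}.mp3`,
      });
    });
  }
  return segments;
}
```

Keeping the text and the target file in the same record makes the review before synthesis straightforward: the user reads exactly what each mp3 will say.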

Phase 3 — Audio Synthesis (Optional)

The detailed workflow is in `references/AUDIO.md`. Simplified version:

bash

cd presentation
npm run extract-narrations   # Scan all narrations.ts → audio-segments.json
# Let user scan audio-segments.json to confirm text is correct
npm run synthesize-audio     # Call mmx for serial synthesis; incremental, skip existing files

After synthesis, tell user: Output location / total number of segments / which segments have abnormal duration (too long = split the step; too short = thin copy) — Give one last chance to calibrate rhythm. Then enter Phase 4.
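The duration report described above could be produced with a simple threshold pass. The source gives no concrete thresholds, so the 2s and 15s defaults below are placeholders chosen only for illustration, and the types and function name are hypothetical:

```typescript
interface SegmentDuration {
  file: string;
  seconds: number;
}

interface FlaggedSegment {
  file: string;
  flag: "too-long" | "too-short";
}

// Flags segments outside [minSeconds, maxSeconds].
// The default thresholds are illustrative assumptions, not values from the Skill.
function flagAbnormalDurations(
  segments: readonly SegmentDuration[],
  minSeconds = 2,
  maxSeconds = 15,
): FlaggedSegment[] {
  const flagged: FlaggedSegment[] = [];
  for (const segment of segments) {
    if (segment.seconds > maxSeconds) {
      flagged.push({ file: segment.file, flag: "too-long" });  // candidate for splitting the step
    } else if (segment.seconds < minSeconds) {
      flagged.push({ file: segment.file, flag: "too-short" }); // copy may be too thin
    }
  }
  return flagged;
}
```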

Phase 4 — Screen Recording + Post-production

Details are in `references/RECORDING.md`. Two paths:

Scenario	Recommended Path
Audio synthesized in Phase 3	Auto mode one-take: Open `localhost:5173/?auto=1` in browser → Press SPACE → The whole video plays automatically → Stop recording → Trim head and tail to get the final video, no need to sync audio track in post-production
Phase 3 skipped	Default Manual mode: Click manually to advance → Dub with any post-production editing tool

Agent should actively tell users the suitable screen recording path after Phase 3 / Checkpoint Audio.

Ten Principles (One-sentence List)

The full expansion is in Part 0 of `references/CHAPTER-CRAFT.md`. Refer to it when writing chapters; the following is just an index.

#	Principle	One Sentence
1	16:9 Fixed Stage	Content 1920×1080 + transform scale, no responsiveness
2	Global Step Counter	Chapters are pure functions of steps, no timers
3	Full Screen per Step	`if (step === N) return <FullScene />`
4	Voiceover Beat = Step	One beat = one step = one focused idea
5	Hidden Corner Controls	Progress bar / page turner default opacity 0
6	No Chrome on Stage	No header / footer / page number / brand bar
7	Content-driven Animation	Find internal action first, use entrance animation only as fallback; use continuous micro-movement cautiously
8	Gradual Reveal of Multiple Points	1 item = 1 step, synchronous stagger of N items is prohibited
9	Same Theme for Whole Video	No color flip between chapters; colors / fonts follow tokens, other dimensions are free per chapter
10	Dual-source Principle	Script determines beats, article determines screen density (reflected in information pool)
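Principles 1 to 3 lend themselves to a short illustration. The sketch below is an assumption-laden reading, not the scaffold's code: it assumes the stage is fitted inside the viewport ("contain" behavior) by taking the smaller of the two ratios, and it returns scene names as strings instead of React elements so it stays self-contained. All names are hypothetical.

```typescript
const STAGE_WIDTH = 1920;
const STAGE_HEIGHT = 1080;

// Principle 1: one scale factor for the fixed 1920×1080 stage.
// Assumption: the stage is fitted inside the viewport, so the smaller ratio wins.
function stageScale(viewportWidth: number, viewportHeight: number): number {
  return Math.min(viewportWidth / STAGE_WIDTH, viewportHeight / STAGE_HEIGHT);
}

// Principles 2 and 3: the scene is chosen purely from the step, with no timers.
// In a real chapter each branch would return a full-screen component.
function sceneForStep(step: number): string {
  if (step === 0) return "title";
  if (step === 1) return "hero-number";
  return "closing";
}
```

The resulting factor would typically be applied as a CSS `transform: scale(...)` on the stage container, which keeps every pixel value in chapter CSS relative to the 1920×1080 design size.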

Quick Reference for Common User Feedback

A simplified table is in Part 8 "Quick Reference for Common Feedback" of `references/CHAPTER-CRAFT.md`. Key: first locate which layer the issue belongs to (rhythm / visual / content / code), then modify the smallest slice; do not redo the whole chapter.

Related Resources

Marked by "when to read" to avoid reading all at once:

File	When to Read	Content
`references/SCRIPT-STYLE.md`	Mandatory in Phase 1.2	Rules for converting article to voiceover script, platform variants
`references/OUTLINE-FORMAT.md`	Mandatory in Phase 1.2	outline.md field spec, naming conventions, chapter split, information pool
`references/CHAPTER-CRAFT.md`	Single mandatory entry point for each chapter in Phase 2.4	Part 0 Ten Principles / Part 1 5 Pre-development Questions / Part 2 Relationship→Action Decision Tree / Part 3 Visual Toolkit / Part 4 Duration / Part 5 Anti-AI Cliché Anti-patterns / Part 6 Code Hard Rules / Part 7 Completion Self-check / Part 8 Quick Feedback Reference
`references/EXAMPLES/`	Optional — View structure	Chapter structure illustration (hook / list-reveal / case-tech-review); Not a copy template
`references/THEMES.md`	When selecting / creating / switching themes	Complete token contract + built-in theme list + creation workflow
`references/AUDIO.md`	Read only in Phase 3	MiniMax CLI, TTS fallback path, troubleshooting
`references/RECORDING.md`	Read only in Phase 4	Screen recording tools + post-production synthesis
`themes/`	Refer to during Checkpoint Plan / Phase 1.2	Built-in themes (each includes `theme.json` + `tokens.css` )
`scripts/scaffold.sh`	Run once in Phase 2.1	One-click project scaffold
