Web Video Presentation
Turn an article or script into a screen-recordable "video disguised as a web page" step by step, with optional voiceover audio synthesis. Deliverables = Vite + React + TS project + chapter-split audio.
Application Scenarios
- "I have a script/article, help me turn it into a video" — Voiceover-driven content
- Want to create "dynamic PPT"
- 16:9 landscape screen recording, with large text, white space, and animations on every screen
- Cinematic feel for teaching / product demos / keynotes
- Content for Bilibili / YouTube / Douyin videos
This Skill focuses on methodology + collaboration process as its core. The scaffold template provides tokens and primitives, but every aesthetic decision (color scheme, font style, animation vibe) should be redesigned for your theme — do not copy directly.
Workflow Overview
Phase 1 Content Creation
1.1 Identify User Input
1.2 One-time output of script.md + outline.md
(Voiceover script + development plan)
▼
[Checkpoint Plan] ← Must pause. Align on 5 items at once:
Script / Outline / Theme / Assets / Development Mode
▼
Phase 2 Web Development
2.1 Scaffold (based on selected theme)
2.2 Chapter 1 = Main thread + complete version (mandatory anchor)
▼
[Hard Node] User acceptance of Chapter 1 ← Cannot skip
▼
2.3 Chapters 2~N (per selected mode: A Chapter-by-Chapter / B Sequential / C Parallel)
▼
[Checkpoint Audio] ← Must pause. Whether to synthesize audio
▼
Phase 3 Audio Synthesis (Optional)
▼
Phase 4 Screen Recording + Post-production
Working Directory Convention (agent creates/edits in user's current directory):
my-video/
├── article.md # Required if user provides original text — Do NOT delete! Source of screen information during development
├── script.md # Required: Bilibili-style voiceover script (determines beats)
├── outline.md # Required: Development plan (chapter split + content per step + information pool)
└── presentation/ # Vite + React + TS project generated by scaffold
├── src/chapters/<NN>-<id>/
│ ├── <Chapter>.tsx # Visual implementation
│ ├── <Chapter>.css
│ └── narrations.ts # ★ Single source of truth for step count + voiceover text
├── scripts/
│ ├── extract-narrations.ts # Scan all narrations.ts → audio-segments.json
│ └── synthesize-audio.sh # Call mmx to synthesize mp3
├── audio-segments.json # Output of extract (review before synthesis)
└── public/audio/<id>/<N>.mp3 # Optional: Synthesized audio
Key: narrations.ts is the single source of truth for step count and audio synthesis.
The maximum N in
in chapter
plus 1 must equal
. This ensures that 5 places (script / outline / chapter code /
chapters.ts / audio files) never get out of sync.
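The "maximum N plus 1" invariant above can be sketched as a small check. This is a minimal sketch, assuming step indices are zero-based (which is what "plus 1" suggests) and that a chapter's narrations are a plain string array. `isStepCountConsistent` and the sample data are hypothetical names, not part of the scaffold.

```typescript
// Hypothetical narration list for one chapter: one entry per step.
const sampleNarrations: string[] = [
  "Opening hook.",
  "First key number.",
  "Closing line.",
];

// Returns true when the largest zero-based step index used elsewhere
// (chapter code, audio file names) matches the narration count.
function isStepCountConsistent(lines: string[], usedStepIndices: number[]): boolean {
  if (usedStepIndices.length === 0) return lines.length === 0;
  const maxIndex = Math.max(...usedStepIndices);
  return maxIndex + 1 === lines.length;
}

console.log(isStepCountConsistent(sampleNarrations, [0, 1, 2])); // true
console.log(isStepCountConsistent(sampleNarrations, [0, 1]));    // false
```

Running a check like this before synthesis would surface a missing or extra step early, while it is still cheap to fix.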
Mandatory Self-check Protocol (Throughout the Skill)
For each of the following three deliverables, run self-check → fix → report (or proceed) as soon as it is completed:
| Deliverable | Self-check Checklist Source |
|---|---|
| script.md | Three-layer self-check (form / essence / read-aloud) in references/SCRIPT-STYLE.md |
| outline.md | Self-check in references/OUTLINE-FORMAT.md |
| Single chapter implementation complete | Completion self-check in Part 7 of references/CHAPTER-CRAFT.md |
Execution Methods (in descending order of priority, prefer more isolated methods):
- Agent Teams (Optimal): Launch an independent reviewer agent, provide it with "deliverable file
path + corresponding checklist + key context", and let it verify item by item and report conclusions strictly
(which items pass / which fail + evidence + rewrite suggestions).
- subAgent (Second-best): If no Teams capability but can launch subagent, use subagent
to follow the same process.
- Self-check (Fallback): If the current agent has none of the above capabilities, conduct strict item-by-item
self-check — visual inspection only is not allowed.
Iron Rule: After getting the conclusion, first fix the deliverable according to the failed items, then report to the user: "completed + self-check conclusion + changes made". Reporting the original conclusion without fixing is a violation.
File Reading Guidelines for Each Phase
Read different files in different phases. Agents tend to forget principles in long sessions, especially
Phase 2.4 "Implement Single Chapter" which repeats N times — always review core constraints each time.
| Phase | Must Read (Every Time) | Read Once / Check On Demand |
|---|---|---|
| Phase 1.1-1.2 Content Creation | references/SCRIPT-STYLE.md + references/OUTLINE-FORMAT.md + (user's original text, if provided) | - |
| Checkpoint Plan Theme Selection | - | (dynamically read all, list + recommendation + ); (when user wants to learn about theme system) |
| Phase 2.1 Scaffold | - | Read this section of SKILL.md once |
| Phase 2.4 Implement Single Chapter (×N times, called by 2.2 / 2.3) | references/CHAPTER-CRAFT.md (single entry point: Part 0 Ten Principles / Part 1 5 Pre-development Questions / Part 2 Relationship→Action Decision Tree / Part 3 Visual Toolkit / Part 4 Duration Reference / Part 5 Anti-AI Cliché Anti-patterns / Part 6 Code Hard Rules, including narrations.ts mandatory constraints / Part 7 Completion Self-check / Part 8 Quick Feedback Reference) + current theme's + current chapter's outline.md paragraph + corresponding article paragraph for this chapter + asset list | (structure illustration, not a copy template); complete token contract |
| Phase 3 Audio Synthesis | (including narrations.ts → segments.json → mmx workflow) | - |
| Phase 4 Screen Recording + Post-production | (including auto recording) | - |
| Select / Create / Switch Theme | - | themes/THEMES.md |
When writing chapters, references/CHAPTER-CRAFT.md is the only required read: the ten principles, pre-development self-prompting, decision tree, anti-AI clichés, and completion self-check are all integrated into this single entry point.
is not mandatory — design freely based on content first, refer to it only when stuck (refer to "structure", do not copy directly).
Phase 1 — Content Creation (One-time Output)
1.1 Identify User Input
| User's Input | Action to Take |
|---|---|
| Original article (written language / official account / paper / blog) | One-time output of script.md + outline.md (1.2), proceed to Checkpoint Plan |
| Direct voiceover script / video script | Save as script.md, one-time output of a simplified outline.md (1.2), proceed to Checkpoint Plan |
| Nothing provided, only says "help me make a video on X theme" | Ask back: Provide a piece of material or outline first. This Skill does not create content for users |
1.2 One-time Output of script.md + outline.md
Complete both deliverables in one thinking process:
- Generate script.md: Convert the article to a Bilibili-style voiceover script following the rules in references/SCRIPT-STYLE.md. Keep article.md intact: it is the source of details for writing information pools and chapter screen implementations (dual-source principle).
- Generate outline.md: Split chapters, split steps, and extract an information pool into the first paragraph of each chapter, following the rules in references/OUTLINE-FORMAT.md.
Outline Boundaries (Key):
| Must Write in Outline | Do NOT Write in Outline |
|---|---|
| Chapter split / step count per chapter / estimated time | Specific animation types (blur clear / wipe / spring) |
| Screen content per step (hero / data / slogan / list item) | CSS implementation methods (filter / SVG / clip-path) |
| Chapter-level information pool: numbers / quotes / cases / tags extracted from the article | Duration values (do not write 2.5s / 80~120ms) |
| Step-level relationship prefix (optional hints like "contrast comparison" / "progressive list" / "golden sentence") | Micro-rhythm such as continuous micro-movement / staggered timing |
Reason for not writing animations in the outline: specifying animations in advance reduces the chapter agent to a translator. Leaving them open lets the chapter agent design each step freely, using the "content-driven decision tree" in references/CHAPTER-CRAFT.md, which is what creates a true video feel. See Principle 7 in Part 0 of references/CHAPTER-CRAFT.md for details.
After saving, both files must pass self-check before entering Checkpoint Plan: run the self-check for script.md and outline.md respectively according to the "Mandatory Self-check Protocol" above (prefer Agent Teams → subAgent → self-check), and enter Checkpoint Plan only after fixing the issues it reports.
Checkpoint Plan — Align on 5 Items at Once (Hard Node)
Must pause after writing script.md + outline.md.
The user confirms all 5 items at this single node.
Preparation Work for Agent at This Stage
- Read all to get / /
/ — Do NOT hardcode the list
- Based on the content type / keywords / tone of , actively select 2~3
most matching recommendations from themes (match field)
- Scan the "Asset List" section at the end of outline.md
Summary Template (Framework, agent fills according to situation)
Content plan completed, deliverables:
📄 article.md {retained if user provided original text}
📄 script.md {X} words / ~{T} minutes
📄 outline.md {N} chapters / {M} steps + per-chapter information pool + asset list at the end
Chapter Overview:
1. <id> <Chapter Title> <S> steps ~<T>s
2. ...
Next, align on 5 items at once:
1. Do you need to modify the script (script.md)?
You can edit the file directly, or tell me the modification direction verbally.
2. Do you need to modify the development plan (outline.md)? Focus on:
- Is chapter split / step count / estimated time reasonable (reasonable judgment: 30~60s per chapter)
- Is screen content per step clear
- Does the "information pool" in the first paragraph of each chapter have enough article details for screen design
- Is the asset list at the end complete
3. Which theme to choose? My recommendations:
★ <Recommendation 1: nameZh (id)> — Because <bestFor match>; <descriptionZh summary>
★ <Recommendation 2 / Recommendation 3>
Other options: <Remaining themes, nameZh + one sentence>
I can also help you create a new theme (see themes/THEMES.md for details).
4. How to prepare real assets? Rough list of images needed for this video: <list rough items>
a) I will pick from <existing asset path> b) You provide them yourself c) Use placeholder for all
5. Which development mode to choose?
**Chapter 1 must be completed in main thread + user acceptance regardless of mode** (mandatory anchor).
Differences start from Chapter 2:
A) Default · Chapter-by-Chapter Confirmation (Recommended)
Pause for acceptance after each chapter is completed → Low risk / most stable rhythm
B) Sequential Development After Chapter 1 (No Parallel)
Complete Chapters 2~N sequentially in main thread then accept uniformly → Medium speed / suitable for agents that do not support parallel tasks
C) Parallel Development After Chapter 1 (subagent)
Use subagent to complete Chapters 2~N in parallel → Fastest / user controls parallel count (how many chapters at once)
⚠️ Style differences between chapters are expected (theme constraints ensure consistency)
After receiving feedback:
- If script / outline needs modification: Edit the file directly, ping after editing (or describe modifications verbally for agent to adjust)
- Theme must be confirmed before entering Phase 2. If user says "you choose the theme" → Select your first recommendation,
tell the user what you chose and why, and give them a chance to change their mind
- After mode is confirmed → Enter Phase 2
Phase 2 — Web Development
2.1 Scaffold
bash
bash .cursor/skills/web-video-presentation/scripts/scaffold.sh \
./presentation \
--theme=<user-selected theme id>
bash .cursor/skills/web-video-presentation/scripts/scaffold.sh --list-themes
For custom themes → First create a
following the "Create New Theme" workflow in
,
then use
.
The scaffold includes an 01-example demo chapter.
Delete it before writing the first real chapter:
bash
rm -rf presentation/src/chapters/01-example
Then remove its import and array item from presentation/src/registry/chapters.ts.
2.2 Chapter 1 — Main Thread + Mandatory Acceptance
Core: Chapter 1 = Complete version delivered at once (rhythm + visuals + real assets all ready).
No "skeleton version" concept — The first chapter must be a sample that users can directly accept.
Why Chapter 1 must be in main thread:
- It is the first implementation of the guidelines in under the current
theme + current subject matter
- If there are blind spots in guidelines / insufficient theme colors / font tokens, Chapter 1 will definitely expose them —
With human feedback, guidelines can be revised / theme adjusted, cost of early modification is lowest
- Subsequent chapters (regardless of sequential / parallel) will reference the code pattern of Chapter 1, so Chapter 1 =
"Style anchor" for the current project (does not require full consistency between chapters, but each chapter must have complete persuasiveness)
Must pause after completing Chapter 1 to wait for user acceptance:
Chapter 1 <id> is completed, dev server is running at localhost:5173.
Acceptance Focus:
□ Is the visual vibe correct? Does it meet the expectation of <theme nameZh>?
□ Is the rhythm correct? Are some steps too fast / too slow / too thin on information?
□ Is content-driven animation in place? Or are some steps just mindless entrance animations?
□ Dual-source principle: Does the screen have details that "are not mentioned in voiceover but can be linked to article"?
□ Anti-AI check: Are there purple-pink gradients / rounded colored borders / fake illustrations / emojis?
Tell me any issues, and I will modify accordingly. Let me know "continue" when it's OK, and I will proceed with Chapters 2 and beyond according to the selected mode.
2.3 Chapters 2~N — Per Selected Mode
Common Rules for All Modes: Each chapter is developed independently following
.
Full style consistency between chapters is not required — Theme color / font tokens ensure visual
unity, and free play of animation / rhythm / visual demonstration in chapters is a design expectation.
Mode A · Default · Chapter-by-Chapter Confirmation
Complete Chapter 2 → Pause for acceptance → OK → Chapter 3 → Pause → ... → Chapter N. Accept each chapter
independently, modify issues anytime, lowest risk, most stable rhythm. Default to this mode if user does not specify.
Mode B · Sequential Development After Chapter 1
Complete Chapter 2 → Chapter 3 → ... → Chapter N sequentially in main thread, then accept uniformly.
Medium speed, suitable for environments where agents do not support parallel tasks.
Mode C · Parallel Development After Chapter 1 (subagent)
Use subagent to complete Chapters 2~N in parallel, maximum parallel count controlled by user ("4 chapters at once"
/ "2 chapters at once"). Fastest, but style differences between chapters are expected — This is intentional because:
- Each subagent cannot see outputs of other subagents, so mechanical alignment is impossible
- Chapter code is physically separated (one folder per chapter / own CSS prefix), no mutual interference
- Theme tokens ensure visual unity (colors / fonts / hero numbers / cards / divider style / decorations), vibe won't deviate
- Style inconsistency = breathing feel of manually created videos (multiple voices / perspectives)
The prompt for parallel subagents must include:
- Current chapter's outline paragraph (including information pool)
- Path of references/CHAPTER-CRAFT.md (single mandatory read: all requirements for visual demonstration + gradual reveal + dual-source principle + anti-AI clichés + code red lines + completion self-check are in this one file)
- / / of current theme's (only for reference of vibe,
animation / duration / font size / emojis are decided freely by chapter agent)
- Chapter 1 code as "code style" reference (not "visual copy object")
- Hard rules: Independent CSS prefix per chapter ( / / / ...);
Do not modify ; Run after completion
Important: Regardless of the selected mode, users can switch modes at any time. After Chapter 2 is OK,
user can say "do the rest in parallel" / "do the rest chapter-by-chapter" and it will be accommodated.
2.4 Implement Single Chapter (Mandatory for Each Chapter)
Detailed guidelines are in references/CHAPTER-CRAFT.md, the single mandatory entry point. It covers: visual demonstration requirements / gradual reveal / content selection / dual-source principle / basic video demonstration aesthetics / anti-AI clichés / code red lines / completion self-check.
Core Points (Detailed in CHAPTER-CRAFT.md):
- Each chapter must have CSS / SVG / Canvas / JS visual demonstration, pure text chapters are prohibited
- Gradual reveal: Lists must have 1 item = 1 step, full display at once is prohibited
- Dual-source principle: Rhythm follows voiceover script (order cannot be changed), details are extracted from original article (information pool +
corresponding article paragraph for this chapter)
- Go through completion self-check item by item, modify if not up to standard — Execute according to the "Mandatory Self-check Protocol" above
(prefer Agent Teams → subAgent → self-check), report chapter delivery to user only after modification
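The gradual-reveal rule (1 item = 1 step) can be sketched as a pure function of the step counter. This is a minimal sketch, assuming zero-based steps local to the list; `visibleItems` is a hypothetical helper, not something the scaffold is known to export.

```typescript
// Items visible at a given step: step 0 shows the first item,
// step 1 the first two, and so on. No timers are involved.
function visibleItems<T>(items: readonly T[], step: number): T[] {
  const count = Math.min(Math.max(step + 1, 0), items.length);
  return items.slice(0, count);
}

const points = ["Plan", "Build", "Record"];
console.log(visibleItems(points, 0)); // one item: "Plan"
console.log(visibleItems(points, 1)); // two items: "Plan", "Build"
console.log(visibleItems(points, 9)); // clamped to all three items
```

Because the result depends only on `step`, stepping backward hides items again without any extra state.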
2.5 Bump STORAGE_KEY After Major Changes
After changing the chapter structure (adding / deleting / reordering chapters, or changing the step count of any chapter), bump STORAGE_KEY in presentation/src/hooks/useStepper.ts so that the persisted cursor does not land on a step that no longer exists.
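Why bumping STORAGE_KEY helps can be seen in a small sketch. It assumes the stepper persists a `{ chapter, step }` cursor under a versioned key. The key format, `restoreCursor`, and the in-memory `Map` standing in for browser storage are all hypothetical and may differ from the scaffold's actual useStepper.ts.

```typescript
type Cursor = { chapter: number; step: number };

// Hypothetical versioned key; changing the suffix orphans old cursors.
const EXAMPLE_STORAGE_KEY = "presentation-cursor-v2";

// Reads a cursor saved under `key`; falls back to the start when nothing
// was stored under this exact key or the stored JSON is malformed.
function restoreCursor(store: Map<string, string>, key: string): Cursor {
  const raw = store.get(key);
  if (raw === undefined) return { chapter: 0, step: 0 };
  try {
    const parsed = JSON.parse(raw) as Partial<Cursor>;
    if (typeof parsed.chapter === "number" && typeof parsed.step === "number") {
      return { chapter: parsed.chapter, step: parsed.step };
    }
  } catch {
    // Malformed data: fall through to the default cursor.
  }
  return { chapter: 0, step: 0 };
}

// A cursor saved before the restructure lives under the old key...
const oldStore = new Map<string, string>([
  ["presentation-cursor-v1", '{"chapter":3,"step":7}'],
]);
// ...so after the bump it is ignored and playback starts from the top.
console.log(restoreCursor(oldStore, EXAMPLE_STORAGE_KEY));
```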
Checkpoint Audio — Whether to Synthesize Audio (Hard Node)
Must pause after Phase 2 ends, ask user:
Web page completed, {N} chapters {M} steps, dev server is running at localhost:5173.
Do you want to synthesize audio for "auto-play screen recording"?
✓ Synthesize → Scan narrations.ts of all chapters to generate audio-segments.json,
call mmx-cli to synthesize each step into an mp3 file in public/audio/.
After synthesis, use ?auto=1 mode for one-take screen recording (audio and video are naturally synchronized).
If mmx is not installed locally, you will be asked which TTS to use.
✗ Do not synthesize → Skip Phase 3, proceed directly to Phase 4 for manual screen recording + post-production dubbing.
If synthesize → Phase 3. If not → directly to Phase 4.
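The extract step described above amounts to flattening every chapter's narrations into one list of segments. This is a minimal sketch, assuming zero-based file numbering and a `{ id, narrations }` shape per chapter. `buildSegments`, the `Segment` fields, and the sample chapter are hypothetical and may differ from what scripts/extract-narrations.ts actually emits.

```typescript
type ChapterNarrations = { id: string; narrations: string[] };
type Segment = { chapterId: string; step: number; text: string; file: string };

// Flattens all chapters into one ordered segment list, one per step.
function buildSegments(chapters: ChapterNarrations[]): Segment[] {
  const segments: Segment[] = [];
  for (const chapter of chapters) {
    chapter.narrations.forEach((text, step) => {
      segments.push({
        chapterId: chapter.id,
        step,
        text,
        // Assumed layout: public/audio/<id>/<N>.mp3 with N starting at 0.
        file: `public/audio/${chapter.id}/${step}.mp3`,
      });
    });
  }
  return segments;
}

const demoSegments = buildSegments([
  { id: "intro", narrations: ["Hello.", "Here is the plan."] },
]);
console.log(JSON.stringify(demoSegments, null, 2));
```

Keeping the text next to the target file path is what makes the list easy to review before any audio is synthesized.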
Phase 3 — Audio Synthesis (Optional)
Detailed workflow is in
. Simplified version:
bash
cd presentation
npm run extract-narrations # Scan all narrations.ts → audio-segments.json
# Let user scan audio-segments.json to confirm text is correct
npm run synthesize-audio # Call mmx for serial synthesis; incremental, skip existing files
After synthesis, tell user: Output location / total number of segments / which segments have abnormal duration (too long = split the step;
too short = thin copy) — Give one last chance to calibrate rhythm. Then enter Phase 4.
Phase 4 — Screen Recording + Post-production
Details are in
. Two paths:
| Scenario | Recommended Path |
|---|---|
| Audio synthesized in Phase 3 | Auto mode, one take: open the page in ?auto=1 mode in the browser → press SPACE → the whole video plays automatically → stop recording → trim head and tail to get the final video. No audio-track sync is needed in post-production |
| Phase 3 skipped | Default manual mode: click manually to advance → dub with any post-production editing tool |
Agent should actively tell users the suitable screen recording path after Phase 3 / Checkpoint Audio.
Ten Principles (One-sentence List)
Full expansion is in Part 0 of references/CHAPTER-CRAFT.md. Refer to it when writing chapters; the following is just an index.
| # | Principle | One Sentence |
|---|---|---|
| 1 | 16:9 Fixed Stage | Content 1920×1080 + transform scale, no responsiveness |
| 2 | Global Step Counter | Chapters are pure functions of steps, no timers |
| 3 | Full Screen per Step | if (step === N) return <FullScene /> |
| 4 | Voiceover Beat = Step | One beat = one step = one focused idea |
| 5 | Hidden Corner Controls | Progress bar / page turner default opacity 0 |
| 6 | No Chrome on Stage | No header / footer / page number / brand bar |
| 7 | Content-driven Animation | Find internal action first, use entrance animation only as fallback; use continuous micro-movement cautiously |
| 8 | Gradual Reveal of Multiple Points | 1 item = 1 step, synchronous stagger of N items is prohibited |
| 9 | Same Theme for Whole Video | No color flip between chapters; colors / fonts follow tokens, other dimensions are free per chapter |
| 10 | Dual-source Principle | Script determines beats, article determines screen density (reflected in information pool) |
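Principle 1 (fixed 1920×1080 stage + transform scale) reduces to one calculation. This is a minimal sketch, assuming the stage is scaled uniformly so that it fits inside the viewport (letterboxed when the aspect ratios differ); `stageScale` is a hypothetical helper name.

```typescript
const STAGE_WIDTH = 1920;
const STAGE_HEIGHT = 1080;

// Uniform scale factor that fits the fixed stage inside the viewport.
// The smaller ratio wins so that neither dimension overflows.
function stageScale(viewportWidth: number, viewportHeight: number): number {
  return Math.min(viewportWidth / STAGE_WIDTH, viewportHeight / STAGE_HEIGHT);
}

console.log(stageScale(1920, 1080)); // 1
console.log(stageScale(960, 540));   // 0.5
console.log(stageScale(960, 1080));  // 0.5 (width is the limiting side)
```

The result would typically be applied as a CSS `transform: scale(...)` on the stage element, so chapter code can keep using absolute 1920×1080 coordinates.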
Quick Reference for Common User Feedback
The simplified table is in Part 8 "Quick Reference for Common Feedback" of references/CHAPTER-CRAFT.md.
Key: first locate which layer the issue belongs to (rhythm / visual / content / code), then modify the smallest slice; do not redo the whole chapter.
Related Resources
Marked by "when to read" to avoid reading all at once:
| File | When to Read | Content |
|---|---|---|
| references/SCRIPT-STYLE.md | Mandatory in Phase 1.2 | Rules for converting article to voiceover script, platform variants |
| references/OUTLINE-FORMAT.md | Mandatory in Phase 1.2 | outline.md field spec, naming conventions, chapter split, information pool |
| references/CHAPTER-CRAFT.md | Single mandatory entry point for each chapter in Phase 2.4 | Part 0 Ten Principles / Part 1 5 Pre-development Questions / Part 2 Relationship→Action Decision Tree / Part 3 Visual Toolkit / Part 4 Duration / Part 5 Anti-AI Cliché Anti-patterns / Part 6 Code Hard Rules / Part 7 Completion Self-check / Part 8 Quick Feedback Reference |
|  | Optional: view structure | Chapter structure illustration (hook / list-reveal / case-tech-review); not a copy template |
| themes/THEMES.md | When selecting / creating / switching themes | Complete token contract + built-in theme list + creation workflow |
|  | Read only in Phase 3 | MiniMax CLI, TTS fallback path, troubleshooting |
|  | Read only in Phase 4 | Screen recording tools + post-production synthesis |
|  | Refer to during Checkpoint Plan / Phase 1.2 | Built-in themes (each includes + ) |
| scripts/scaffold.sh | Run once in Phase 2.1 | One-click project scaffold |