music-to-video
music-to-video: one music-grounded, beat-synced video workflow
Use this skill to turn a music track into a beat-synced HyperFrames video. You analyze the track once, lay out the frames, fill in a per-frame plan, and build each frame as a composition. The input is a music track plus optional user images or videos; there is no narration and no website capture. Typography and templates are the floor (a complete video needs zero assets); any media the user supplies is cut in on the same beat grid.
You are the orchestrator. Work in `videos/<project>/`. Run the steps in order and pass each Gate before moving on. Two steps need the user: Step 3 (plan approval) and Step 6 (render approval). Do every step yourself except Step 4, where you dispatch one sub-agent per frame. Keep design and motion rules out of this file; they live in `references/` and the `frame-worker` sub-agent.
`SKILL_DIR` and `PROJECT_DIR` are used in the commands below; `PROJECT_DIR` is `videos/<project-name>/`.
Workflow: Step 0 setup → `hyperframes.json` + `assets/bgm.mp3`; Step 1 analyze → `audiomap.json`; Step 2 skeleton → `STORYBOARD.md` (frames, groups `TBD`); Step 3 plan → complete `STORYBOARD.md` + `frame.md`; Step 4 build → `compositions/frames/NN-*.html`; Step 5 assemble → `index.html`; Step 6 render → `renders/video.mp4`.
Two ideas that shape everything
- One analyzer, and you trust it. `analyze-beatgrid.py` is the only beat analyzer; never re-measure beats with another tool or by ear. Its energy / density / rolls / onsets / silences are always reliable. Its `bpm` and `beats_sec` are reliable only when the music is genuinely rhythmic; on calm music the grid is a metronome the tracker imposed, so pace by phrases and energy instead and never hard-cut to it. Deciding which case you're in is each frame's `pacing` (Step 2).
- One frame = one file; groups live inside. Step 2 cuts the track into frames, and each frame becomes one composition file `compositions/frames/NN-<frame_id>.html`, built by one frame-worker. A frame can subdivide into groups (each a template or a motion-primitives combo). Extra density goes inside a group, so frame count tracks distinct treatments, not beats: a fast track does not blow up the number of sub-agents.
Step 0: Setup, BGM, and inputs
Goal: Establish the music source, create the HyperFrames project, and note any user-supplied media.
The music is the spine: establish one track before anything else. This skill is tuned for **fast, high-energy BGM**: a strong beat grid drives the cuts (calm tracks work, but pace by phrase rather than beat). If the user gave you audio (a music file, or a video to pull the audio from), use it. If not, generate one: choose the mood from the user's description (e.g. "driving synthwave", "trap beat", "upbeat corporate") and produce a track via `/hyperframes-media` (`references/bgm.md`: HeyGen retrieval when credentialed, else local Lyria / MusicGen; ElevenLabs or another generator also works). Either way the track lands at `assets/bgm.mp3`. Stage any user-supplied images or videos so frames can weave them in on the beat grid; otherwise typography carries the whole video.
Initialize only if `hyperframes.json` is missing. Name `<project>` from the brief in kebab-case, such as `midnight-drive-loop`, never a timestamp.
```bash
npx hyperframes init "videos/<project>" --non-interactive --skip-skills --example=blank
mkdir -p "$PROJECT_DIR/assets" "$PROJECT_DIR/renders"
cp "<user-music>" "$PROJECT_DIR/assets/bgm.mp3"   # extract from a video first if needed
# only if the user gave you images/videos:
node <SKILL_DIR>/scripts/stage-assets.mjs --from <dir> --hyperframes "$PROJECT_DIR" --into public
```
The **brand** (font + palette) is chosen at Step 3, not here. Don't pick a genre or a track type up front: assets are just an optional ingredient, and the genre emerges from the per-frame choices.
**Gate:** `hyperframes.json` + `assets/bgm.mp3` exist; aspect / length / fps and (if any) the asset inventory are noted.
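The naming rule and the Step 0 gate can be sketched in a few lines of Python. `project_slug` and `step0_gate` are hypothetical helpers, not part of the skill's scripts; the only facts taken from the text are the kebab-case rule and the two files the gate requires.

```python
import re
from pathlib import Path


def project_slug(brief_title: str) -> str:
    """Hypothetical helper: derive a kebab-case project name from the brief."""
    # Lowercase, then collapse every run of non-alphanumerics into one hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", brief_title.lower()).strip("-")
    if not slug:
        raise ValueError("brief title has no usable characters")
    return slug


def step0_gate(project_dir: str) -> list[str]:
    """Return the Step 0 files still missing; an empty list means the gate passes."""
    required = ["hyperframes.json", "assets/bgm.mp3"]
    root = Path(project_dir)
    return [rel for rel in required if not (root / rel).is_file()]


print(project_slug("Midnight Drive Loop"))  # midnight-drive-loop
```

Because the slug comes only from the brief's words, it can never turn into a timestamp-style name.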
Step 1: Analyze the music
Goal: Produce the one canonical timing analysis the whole video is built on.
`analyze-beatgrid.py` writes `audiomap.json`, which carries `onset_rate`, `hard_stops`, `key_moments`, `audio.duration_sec`, `bpm`, and `beats_sec`.
Prerequisites: Python 3 with `librosa`, `numpy`, and `soundfile` available. If import fails, install them into the active Python environment before running the analyzer:
```bash
python3 -m pip install librosa numpy soundfile
```
```bash
python3 <SKILL_DIR>/scripts/analyze-beatgrid.py "$PROJECT_DIR/assets/bgm.mp3" \
  -o "$PROJECT_DIR/audiomap.json" --print
```
Gate: `audiomap.json` exists; `audio.duration_sec` is known.
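A minimal sketch of the Step 1 gate, assuming `audiomap.json` nests the duration as `{"audio": {"duration_sec": ...}}` (inferred from the dotted name `audio.duration_sec`; the real layout is defined by `analyze-beatgrid.py`). Both function names are hypothetical.

```python
import json
from pathlib import Path


def duration_from_audiomap(audiomap: dict) -> float:
    """Return audio.duration_sec, or raise if the Step 1 gate is not met."""
    # Assumed layout: {"audio": {"duration_sec": <seconds>}, ...}
    duration = audiomap.get("audio", {}).get("duration_sec")
    is_number = isinstance(duration, (int, float)) and not isinstance(duration, bool)
    if not is_number or duration <= 0:
        raise ValueError("audiomap has no usable audio.duration_sec")
    return float(duration)


def load_duration(audiomap_path: str) -> float:
    """Read audiomap.json from disk and return the track duration in seconds."""
    text = Path(audiomap_path).read_text(encoding="utf-8")
    return duration_from_audiomap(json.loads(text))
```

Calling `load_duration("videos/<project>/audiomap.json")` after the analyzer runs would either return the duration or fail loudly, which is the behaviour a gate needs.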
Step 2: Frame skeleton
Goal: Read the music and lay out the frames, producing the skeleton of `STORYBOARD.md`.
Read `references/frame-skeleton.md`. Turn `audiomap.json` into the skeleton of `STORYBOARD.md` yourself; there is no intermediate JSON. Cut the track into frames at real musical changes (`hard_stops`, SURGE / DROP `key_moments`, the edges of a roll, a stretch with no onsets, a big energy jump), snapping every boundary to an audiomap anchor. For each frame set `span_sec`, `pacing` (the verdict from Step 1's trust call: `beat_cut` when the grid is real, `phrase_flow` when it's a metronome imposed on calm music), `mood`, and a one-line `feel` (the plain music situation Step 3 matches a template against). Only classify and lay out here: leave every frame's `### Groups` as `TBD (Step 3)` and the frontmatter `style` blank, with no templates, copy, color, or fonts. Expect ~1–6 frames.
Gate: frames tile the track (first at 0, last at `duration_s`); each carries `span_sec` + `pacing` + `mood` + `feel`; every `### Groups` is `TBD`; no content anywhere.
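The snapping rule and the tiling gate can be illustrated with a short sketch. It assumes each frame's `span_sec` is a `(start, end)` pair in track seconds and that the anchors are a plain list of times; `snap_to_anchor` and `tiling_errors` are hypothetical names, not functions from the skill's scripts.

```python
def snap_to_anchor(t: float, anchors: list[float]) -> float:
    """Snap a proposed boundary to the nearest audiomap anchor time (seconds)."""
    return min(anchors, key=lambda a: abs(a - t))


def tiling_errors(spans: list[tuple[float, float]], duration: float,
                  tol: float = 1e-6) -> list[str]:
    """List the ways the frame spans fail to tile [0, duration]; empty means OK."""
    if not spans:
        return ["no frames"]
    errors = []
    if abs(spans[0][0]) > tol:
        errors.append("first frame does not start at 0")
    # Each frame must begin exactly where the previous one ended.
    for (_, prev_end), (next_start, _) in zip(spans, spans[1:]):
        if abs(prev_end - next_start) > tol:
            errors.append(f"gap or overlap between {prev_end} and {next_start}")
    if abs(spans[-1][1] - duration) > tol:
        errors.append("last frame does not end at the track duration")
    return errors


anchors = [0.0, 4.0, 8.0, 12.0]
print(snap_to_anchor(7.9, anchors))                       # 8.0
print(tiling_errors([(0.0, 8.0), (8.0, 20.5)], 20.5))     # []
```

Checking the skeleton here is cheap, since a gap or overlap at this stage would otherwise only surface as a hard error in Step 3.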
Step 3: Plan (user-gated)
Goal: Turn the skeleton into an approved, complete `STORYBOARD.md`.
Read `references/planning.md`, `storyboard-format.md`, `template-catalog.md`, `motion-primitive-catalog.md`, and `montage.md` (only if the user supplied assets). Editing the same file in place, do two things:
- Pick the brand. Choose one preset from `../hyperframes-creative/frame-presets/` using the table in `../hyperframes-creative/references/design-spec.md` (match the track's mood; only its fonts and colors matter, since templates own composition). Copy it into `frame.md` unmodified and fill the frontmatter `style` (font + a ≤4–6 swatch palette) from it.
- Fill every frame. Decide its groups and give each a treatment: a matched template from the catalog (with bound params and real audiomap anchors), a free-compose from the primitive catalog, or an asset treatment that obeys `pacing`. Write the copy. You own WHAT (template / primitives + content + anchors); the frame-worker owns HOW, so never write millisecond tweens into the storyboard.
```bash
node <SKILL_DIR>/scripts/validate-plan.mjs --storyboard "$PROJECT_DIR/STORYBOARD.md" \
  --audiomap "$PROJECT_DIR/audiomap.json" --templates <SKILL_DIR>/references/templates
```
Fix every `✗` (hard errors: duration mismatch, frames not tiling the track, a missing `src`); warnings are best-effort. Then show the user a frame-by-frame summary and iterate until they approve.
Gate: `frame.md` is a verbatim preset copy; `validate-plan.mjs` exits 0; the user approved the plan.
Step 4: Build frames
Goal: Build every frame as a self-contained composition file.
Create `compositions/frames/`. Read `sub-agents/frame-worker.md` and `../hyperframes-core/references/subagent-dispatch.md`. Dispatch one frame-worker per frame, in parallel where possible (otherwise in waves). Each worker gets exactly one frame and this context:
```text
PROJECT_DIR: <abs path>
frame_id:    <NN-frame_id>   # = the frame file stem, e.g. 02-f2; the composition id
Your block:  the `## Frame N — <frame_id>` block in PROJECT_DIR/STORYBOARD.md
audiomap:    PROJECT_DIR/audiomap.json
frame.md:    PROJECT_DIR/frame.md
Materials:   for each group, <SKILL_DIR>/references/templates/<id>/index.html (templates) and
             <SKILL_DIR>/references/motion-primitives/<id>/ (free); staged assets/ (asset groups)
Contracts:   ../hyperframes-core/references/sub-compositions.md + determinism-rules.md
Canvas:      <w>×<h>   Pacing: <beat_cut|phrase_flow>
Write to:    PROJECT_DIR/compositions/frames/<frame_id>.html
```
The worker forks the cited materials, converts every anchor to frame-local seconds (`local_t = track_t − span_sec[0]`), gates its groups with 0ms cuts, and writes one seek-safe frame file. The worker never runs the `hyperframes` CLI: those commands operate on the assembled project, which doesn't exist yet, so they'd report on the wrong files. The worker just writes to the contract and stops; you verify after assembly (Step 6). As each worker returns, you can confirm its file landed on disk.
Gate: every frame has its `compositions/frames/NN-*.html` on disk.
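As a worked instance of `local_t = track_t − span_sec[0]`, here is a small sketch. It assumes `span_sec` is a `(start, end)` pair in track seconds; `to_local` is a hypothetical name, and treating the end boundary as inclusive is a choice made for this example only.

```python
def to_local(track_t: float, span_sec: tuple[float, float]) -> float:
    """Convert a track-time anchor to frame-local seconds."""
    start, end = span_sec
    # An anchor outside the frame's span belongs to a different frame file.
    if not (start <= track_t <= end):
        raise ValueError(f"anchor {track_t} is outside span {span_sec}")
    return round(track_t - start, 6)


# A frame spanning 8.0 s to 20.5 s with an anchor at 12.5 s on the track:
print(to_local(12.5, (8.0, 20.5)))  # 4.5
```

The range check matters because a frame file only knows its own timeline; an anchor that converts to a negative local time would never fire.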
Step 5: Assemble
Goal: Wire the built frames + BGM into the playable `index.html`.
Run `assemble-index.mjs`; it handles each frame's `data-start` and the `assets/bgm.mp3` track.
```bash
node <SKILL_DIR>/scripts/assemble-index.mjs --storyboard "$PROJECT_DIR/STORYBOARD.md" \
  --hyperframes "$PROJECT_DIR" --audiomap "$PROJECT_DIR/audiomap.json"
```
Fix any `✗` it reports. A missing or blank frame file means that worker wrote a partial file; re-dispatch it (Step 4) and re-assemble.
Gate: `index.html` exists; total duration == `audiomap.audio.duration_sec`.
Step 6: Verify and render
Goal: Verify the assembled video, get user approval, and render the final MP4.
Run the CLI on the assembled project; that's the correct unit (the per-frame workers couldn't run it). `lint` checks structure, `validate` runs headless Chrome (catching JS errors and missing assets), `inspect` snapshots frames.
```bash
( cd "$PROJECT_DIR" && npx hyperframes lint . && npx hyperframes validate . && npx hyperframes inspect . )
```
Inspect at `t=0`, each frame start, the strongest DROP / SURGE, every `hard_stops[].t`, and the final frame. On failure, make the cheapest safe fix: edit the offending `compositions/frames/NN-*.html` for a local issue; re-dispatch that one frame-worker only when a whole frame must be rebuilt; go back to Step 3 only if the plan is creatively wrong. Never change duration or audio timing to hide a sync issue. Once the gates pass, pause for user review, then render only on approval:
```bash
( cd "$PROJECT_DIR" && npx hyperframes render . -q draft -o renders/video.mp4 --fps 30 )
```
Gate: `lint` / `validate` / `inspect` passed; the user approved; `renders/video.mp4` exists with audio, duration == `audiomap.audio.duration_sec`. The final reply states the MP4 path and duration.
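The list of inspection points can be collected mechanically. The sketch below assumes frame spans are `(start, end)` pairs, that `hard_stops` is a list of objects with a `t` field (as `hard_stops[].t` suggests), and that "the final frame" means the last rendered video frame at `duration − 1/fps`. `inspect_times` is a hypothetical helper, and the strongest DROP / SURGE time is passed in by the caller because the shape of `key_moments` is not specified here.

```python
def inspect_times(frame_spans, hard_stops, duration, fps=30, extra_times=()):
    """Collect the seek times worth inspecting, sorted and de-duplicated."""
    last_frame_t = max(0.0, duration - 1.0 / fps)  # start of the last video frame
    times = {0.0, last_frame_t}
    times.update(start for start, _ in frame_spans)   # each frame start
    times.update(stop["t"] for stop in hard_stops)    # every hard_stops[].t
    times.update(extra_times)                         # e.g. the strongest DROP / SURGE
    return sorted(times)


spans = [(0.0, 8.0), (8.0, 20.0)]
stops = [{"t": 7.5}]
print(inspect_times(spans, stops, duration=20.0, fps=4, extra_times=[12.0]))
# [0.0, 7.5, 8.0, 12.0, 19.75]
```

Using a set means a hard stop that coincides with a frame boundary is inspected once rather than twice.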
Resume table
| You have | Continue from |
|---|---|
|  | Step 1 |
|  | Step 2 |
|  | Step 3 |
|  | Step 4 |
| all frame files | Step 5 |
|  | Step 6 |
Quick Reference
Formats: landscape `1920x1080` by default; portrait `1080x1920`; square `1080x1080`. Set the canvas once in the storyboard frontmatter (`canvas: { w, h, fps }`).
Scripts under `scripts/`: `analyze-beatgrid.py` (the one analyzer), `validate-plan.mjs` (plan check), `assemble-index.mjs` (index assembly), `stage-assets.mjs` (stage user media), `lib/storyboard.mjs` (vendored parser). Everything else is the `hyperframes` CLI.
| Read | When |
|---|---|
|  | Step 2: read the music, lay out the frames, set pacing |
|  | Step 3: pick the brand, fill each frame, write the plan |
|  | Step 3: pick a template per group |
|  | Step 3/4: L0 recipes for free-compose |
|  | Step 3/4: asset treatments (beat-cut / ken-burns) |
|  | Step 4: dispatch + build one frame |
|  | Step 4: dispatch sub-agents safely |
|  | Step 3: pick the preset (the brand) |
Directory layout
```
music-to-video/
  SKILL.md
  references/   frame-skeleton.md · planning.md · storyboard-format.md
                template-catalog.md · motion-primitive-catalog.md · montage.md
                templates/<id>/ { index.html (+ assets/ · program.json) }   ← L1 catalog impls
                motion-primitives/<id>/ { index.html } (+ ../assets/gsap.min.js shared by recipes)   ← L0 catalog impls
  scripts/      analyze-beatgrid.py · assemble-index.mjs · validate-plan.mjs · stage-assets.mjs · lib/storyboard.mjs
  sub-agents/   frame-worker.md   ← the one subagent (one per frame)
```