ugc-ads

/pika:ugc-ads

Parameters

| Param | Default | Notes |
|---|---|---|
| `url` | required | product URL; drives category detection and beat substitution |
| `avatar_url` | built-in fallback | persona portrait URL; fed as `@Image1` reference. When omitted, the skill uses a pre-generated Pixar-style female creator portrait |
| `provider` | `seedance` | seedance: strong at UGC selfie / talking-head POV with native lip-sync, multi-segment in single prompt, supports 3:4. kling: explicit `shots[]`, 9:16/16:9 only |
| `aspect_ratio` | `9:16` | `3:4` is seedance-only (kling rejects 3:4) |
| `category` | auto | `HAUL` / `APP` / `FOOD` / `BEAUTY` / `FITNESS` / `TECH`; auto-picked from URL |
| `captions` | `true` | TikTok-style word-chunked captions burned on top of the final video |
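
The provider / aspect-ratio constraint in the table can be checked before any generation call is made. The following is a minimal sketch, not the skill's actual code: `check_aspect_ratio` is a hypothetical helper, and it encodes only the one rule the table states explicitly (kling rejects 3:4).

```python
def check_aspect_ratio(provider: str = "seedance", aspect_ratio: str = "9:16") -> None:
    """Raise ValueError for the combination the parameter table rules out."""
    if provider not in ("seedance", "kling"):
        raise ValueError(f"unknown provider: {provider!r}")
    # The table says 3:4 is seedance-only; kling takes 9:16 or 16:9.
    if provider == "kling" and aspect_ratio == "3:4":
        raise ValueError("3:4 is seedance-only; kling accepts 9:16 or 16:9")


check_aspect_ratio("seedance", "3:4")  # passes silently
```

Failing early here avoids spending several minutes on a generation call that the provider would reject anyway.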

Runtime expectations

Typical end-to-end run: 6–12 minutes. Breakdown:

Step 1 (WebFetch) + Step 3 (`capture_website` screenshot): ~10–30s
Step 7 (`generate_reference_video`): ~3–5 min for seedance, ~5–7 min for kling
Step 7b/c (cartoonize + retry): adds ~1–2 min if seedance moderation rejects the avatar
Step 8 (captions): single `add_captions` call, ~30s–5 min (transcribe + burn in one shot)

If the run exceeds 15 min without progress, something is wrong — inspect the tool-reported generation status and error message.

Engine choice: Seedance default, Kling fallback

Default to Seedance for UGC selfie/talking-head ads because it handles native lip-sync, single-prompt multi-beat pacing, and optional 3:4 output well. Use Kling when the caller explicitly passes `provider=kling` or when Seedance moderation keeps rejecting the avatar after the cartoonized retry. Kling's tradeoff is stricter aspect-ratio support but a separate moderation path and explicit shot segmentation.

Steps

0. Resolve input (empty-args menu)

Strip flags and `key=value` parameters from `$ARGUMENTS`. If no product URL remains and there is no usable product URL in prior context, print this menu and stop:

Which product should the UGC ad promote?

Required: Product URL (page to fetch for product name, category, visual references, and language)
Optional: `avatar_url=`, `provider=seedance|kling`, `aspect_ratio=9:16|3:4`, `category=auto|HAUL|APP|FOOD|BEAUTY|FITNESS|TECH`, `captions=true|false`.

If the product URL is present, skip this step silently.
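
One way to picture the input resolution described above is the sketch below. It assumes `$ARGUMENTS` arrives as a single whitespace-separated string and that flags start with `-`; `parse_arguments` is a hypothetical helper name, not part of the skill.

```python
from typing import Dict, Optional, Tuple


def parse_arguments(arguments: str) -> Tuple[Optional[str], Dict[str, str]]:
    """Split an argument string into (product_url, key=value params)."""
    url: Optional[str] = None
    params: Dict[str, str] = {}
    for token in arguments.split():
        if token.startswith(("http://", "https://")):
            # The first bare URL is treated as the product URL.
            url = url or token
        elif token.startswith("-"):
            continue  # flags are stripped, as the step describes
        elif "=" in token:
            key, _, value = token.partition("=")
            params[key] = value
    return url, params


url, params = parse_arguments("https://pika.me provider=kling captions=false")
if url is None:
    print("Which product should the UGC ad promote?")
```

Checking for the URL prefix before looking for `=` keeps product URLs that carry query strings from being mistaken for parameters.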

1. Fetch + categorize

`WebFetch` the URL: pull `product_name`, value prop, brand color, product form, packaging, hero copy, target user, category, and the primary language of the page. Use `category=` if passed; else trust the WebFetch signal; fall back to HAUL for physical, APP for digital.
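
The category precedence above (explicit parameter, then WebFetch signal, then physical/digital fallback) can be sketched as follows. `resolve_category` and its `is_physical` argument are hypothetical; how the skill actually decides whether a product is physical is not specified here.

```python
from typing import Optional

CATEGORIES = ("HAUL", "APP", "FOOD", "BEAUTY", "FITNESS", "TECH")


def resolve_category(explicit: Optional[str] = None,
                     detected: Optional[str] = None,
                     is_physical: bool = True) -> str:
    """Apply the precedence: category= param, then page signal, then fallback."""
    if explicit and explicit != "auto":
        if explicit not in CATEGORIES:
            raise ValueError(f"unknown category: {explicit!r}")
        return explicit
    if detected in CATEGORIES:
        return detected
    # Nothing usable was detected: HAUL for physical goods, APP for digital.
    return "HAUL" if is_physical else "APP"
```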

2. Resolve the avatar (fallback to built-in if missing)

If `avatar_url` was passed → use it as-is.
If NOT passed → use this built-in fallback:
```
https://cdn.pika.art/v2/files/agent/17d62bf9-0edb-49e4-9ba9-2c5419fa518f/seedream-1777624057811.jpeg
```
Pre-generated 3D animated Pixar-style portrait of a young female creator — pre-cartoonized so seedance moderation accepts it directly, neutral enough to fit any category. Note in the final summary that the fallback was used so the caller knows to supply their own portrait for persona consistency next time.

3. Capture the product screenshot (best-effort)

Call `capture_website` with `mode: "screenshot"`. Use `mobile=true` for handheld-product categories (APP / FITNESS / BEAUTY) so the captured page renders as a portrait phone screen; `mobile=false` for desktop-context categories (HAUL / TECH / FOOD).

If the call fails (timeout, browser pool down), retry once. If still failing, proceed without the screenshot: the skill is degraded but functional. The close-up beat then describes the page from prose only and Beat 2's `reference_images` is just `[avatar_url]`.

Capture URL → `screenshot_url` (or null).
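
The retry-once, then degrade behaviour can be sketched like this. The `capture` argument is a stand-in callable for the `capture_website` tool, and the keyword names passed to it are assumptions taken from the prose above rather than a confirmed signature.

```python
from typing import Callable, Optional

MOBILE_CATEGORIES = {"APP", "FITNESS", "BEAUTY"}


def capture_screenshot(url: str, category: str,
                       capture: Callable[..., str]) -> Optional[str]:
    """Try the capture twice; return None so the caller can continue without it."""
    mobile = category in MOBILE_CATEGORIES
    for _attempt in range(2):  # first try plus one retry
        try:
            return capture(url=url, mode="screenshot", mobile=mobile)
        except Exception:
            continue
    return None  # best-effort: the ad is still generated from the avatar alone
```

Returning `None` instead of raising mirrors the "degraded but functional" intent: later steps only need to drop `screenshot_url` from `reference_images`.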

4. Compose the prompt

The full prompt is a single multi-beat string passed to one `generate_reference_video` call. Write it as structural prose, not markdown bullets. Every beat has a `Says: "..."` line for lip-sync. Pacing target: ~5.5–6 words per second across the whole 15-second ad (≈85–90 words total). `@Image1` is the avatar; `@Image2` is the screenshot when available.

Write all `Says: "..."` lines in the language detected from step 1's WebFetch. Both seedance and kling lip-sync handle multilingual dialogue; if the product page is Chinese / Japanese / Spanish / etc., the dialogue should be in that language. Hook archetypes from step 5 are language-agnostic: adapt the rhetorical move to the language's natural register.
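
The pacing target is simple arithmetic: 15 s × 5.5 words/s = 82.5 words and 15 s × 6 words/s = 90 words, which is where the ≈85–90 figure comes from. A rough sketch for checking a drafted prompt against that budget follows; both helpers are hypothetical, and the word count assumes straight double quotes and a space-delimited language (it would not be meaningful for Chinese or Japanese dialogue).

```python
import re
from typing import Tuple


def word_budget(duration_s: float = 15, wps_low: float = 5.5,
                wps_high: float = 6.0) -> Tuple[float, float]:
    """Return the (low, high) total word count for the whole ad."""
    return duration_s * wps_low, duration_s * wps_high


def count_dialogue_words(prompt: str) -> int:
    """Count words inside double-quoted dialogue segments of the prompt."""
    spoken = re.findall(r'"([^"]*)"', prompt)
    return sum(len(line.split()) for line in spoken)


low, high = word_budget()
draft = 'HOOK. Says to camera: "this is wild". JUMP CUT 1. Says fast: "you just talk to it".'
print(count_dialogue_words(draft), "words against a budget of", low, "to", high)
```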

HOOK (0–3 sec) <visual setting + creator framing + face/body cue>. Says to camera, fast and energetic: "<hook line>". <style anchor — POV handheld, authentic, raw>.

JUMP CUT 1 (3–6 sec) <wide POV — creator's body language, product partially in frame edge>. <face cue>, says fast: "<setup line>".

JUMP CUT 2 (6–9 sec) <next visual beat — could be the screen close-up showing @Image2 OR another reaction beat, depending on which beat the dialogue arc puts the reveal>. Says (or voice continues over the shot if it's a screen close-up), fast and confident: "<reveal line>".

JUMP CUT 3 (9–12 sec) <next visual beat — same logic; one of the JUMP CUTs is the screen close-up, the others are wide-POV reaction shots>. Says, fast: "<insight twist line>".

OUTRO (12–15 sec) <selfie POV, mid-chest framing, same setting>. Says to camera, fast: "<punchline line>".

avatar is image 1, asset is image 2

Screen-close-up beat — exactly one across the ad, position is dialogue-driven:

Place the screen close-up on whichever JUMP CUT (1, 2, or 3) the reveal line lands on. Most ads put it on JUMP CUT 2; if the narrative needs it earlier or later, JUMP CUT 1 or JUMP CUT 3 is fine. Pick by content, not by slot number.
The screen close-up beat shows `@Image2` exactly as-is and includes ONE finger-point gesture (a single finger entering from the frame edge, pointing at the hero text or product; no tap, no swipe, no scroll, no hover-on-CTA). The point gesture is the only screen interaction in the entire ad.
The other JUMP CUTs are wide-POV reaction beats: hands stay on knees, on the bed, or at sides.

Trust `@Image2`: when the product page is shown, reference the image; do NOT describe its UI in prose. Describing UI triggers the model to invent extra panels / dropdowns / sidebars / animations. Reference the image; trust it.

5. Category essences

Each essence is the brief you read before composing the 5 beats. Pick the one matching the category from step 1 and write the actual `Says: "..."` lines tailored to the real product.

HAUL_UNBOX

When to use & why: fashion, handbags, jewelry, shoes, designer drops, streetwear, luxury cosmetics with packaging story, accessories — anything where brand packaging + texture/material is the value prop. Viewers convert on vicarious-unboxing dopamine + "I just got this" social proof; texture and hardware ARE what the customer pays for, so the close-up lands on materials, not function. Not TECH (→ TECH_UNBOX), not skincare/makeup application (→ BEAUTY_APPLY).
Sensory anchors: tissue rustle, fabric slide, hardware clinks (chains / clasps / buckles), leather/fabric grain under fingertips, foil glint.
Setting: white unmade bed in natural window light; bathroom mirror in background for the outro held-up reveal; streetwear drops may use desk/floor.
Close-up beat device: NOT a screen; product close-up. `@Image2` is a product photo (or brand-site mobile view); the single finger-point lands on a hardware detail (chain, clasp, embossed logo).
Dialogue character: hook is mystery tease — frame the unboxing as something the viewer doesn't yet know the contents of; do NOT name the product in the hook line. Arc: hook the unboxing mystery → brand name + drop context → reveal the material/silhouette while close-up holds on hardware → tactile/wearability insight (how it feels on the body) → punchline that invites the viewer to imagine themselves with the artifact.

APP_REVEAL

When to use & why: SaaS, AI tools, mobile/web apps, agent-style products, dev tools, productivity tools — anything where the screen IS the product. Viewers convert when they see live UI doing the thing in <5 seconds; the close-up beat is the demo, the bookends are the social proof. Not pure hardware (→ TECH_UNBOX).
Sensory anchors: micro-thumb gesture, brand-color highlight, UI alive with small motion, ambient room tone.
Setting: cozy bedroom or couch POV; jeans/joggers at frame edges; warm window light.
Close-up beat device: laptop on bed (desktop screenshot) or phone in hand (mobile screenshot; set `mobile=true` in step 3).
Dialogue character: hook is bewildered curiosity — the creator can't categorize the thing yet, that's the point. Do NOT use feature lists or marketing language in the hook; lean into "I don't know what to call this" / "this is wild" register that makes the viewer wait for the name. Arc: bewildered hook → name the product + interaction model in human terms ("you just talk to it", "it builds X from Y") → reveal what it produces (concrete comma-separated examples) while close-up shows the page → personal-insight twist (what it replaces / changes in the user's workflow) → punchline + implicit/explicit "go try it" CTA.

FOOD_ASMR

When to use & why: food brands, drinks, kitchen tools, snacks, restaurants with a takeout product — anything where the sensory peak (pour / sizzle / steam / first bite) carries the value prop. Viewers convert on hunger response — show the sensory peak, don't describe it.
Sensory anchors: packaging rustle, knife-on-board, sizzle, pour stream, steam rising, satisfied exhale on the first bite.
Setting: marble counter or warm wood kitchen, top-down framing.
Close-up beat device: a product/dish close-up rather than a screen; phone in hand on the counter only if the brand has a delivery/recipe app.
Dialogue character: hook is show-don't-tell — frame as a demonstration the viewer is watching unfold, not a description. The hook line lands while a hand or first ingredient is already in motion; the visual carries the curiosity. Arc: demonstration hook → name the product + first impression → narrate the sensory peak as it happens (pour / sizzle / steam) → satisfaction insight ("this is the new default") → punchline that hands off the recipe or shop link.

BEAUTY_APPLY

When to use & why: skincare, makeup, cosmetics, fragrance, hair products, body care — anything where before/after + application ritual is the value prop. Viewers convert on visual transformation under matched lighting; symmetry between hook and outro is what sells the result as real. Not packaging-heavy luxury (→ HAUL_UNBOX).
Sensory anchors: pump press, squeeze, glide on skin, glow lift, droplet beading, brush sweep.
Setting: bathroom mirror, natural daylight or vanity lighting; same angle for the hook and the outro.
Close-up beat device: a product close-up (bottle / tube / compact held in hand), not a screen.
Dialogue character: hook is time-stamped social proof — name the duration ("X days in", "morning of week 3", "after one tube") to signal real use rather than paid promo. The hook plants the symmetry payoff that arrives in the after-shot. Arc: timed-claim hook → name + key ingredient or claim → narrate the application as it happens (close-up of fingers/brush on skin) → after-shot reveal (same angle as hook) → punchline that signals exclusivity or repurchase intent.

FITNESS_TRANSFORM

When to use & why: workout equipment, supplements, recovery tools, activewear, fitness apps with tracking — anything where the work-to-result transformation is the value prop. Viewers convert on relatable struggle followed by earned payoff — showing the protein-shake bottle is not enough, you have to show the workout.
Sensory anchors: heavy breathing, scoop hitting powder, equipment click, sweat catching light, post-workout exhale.
Setting: gym or home-gym; workout gear at frame edges; floor or bench level.
Close-up beat device: phone in hand showing app stats / heart rate / time elapsed, OR product packaging close-up (scoop in jar, bottle pour).
Dialogue character: hook is relatable resistance — name the struggle / friction / not-wanting-to ("I did NOT want to do this", "almost skipped today", "this was supposed to be a rest day"); earns trust by sharing the tired feeling before showing the work. Arc: resistance hook → name the product + protocol ("I'm on day X of Y") → narrate mid-work moment while close-up shows the device or scoop → satisfaction insight that earns trust → punchline that frames continued use.

TECH_UNBOX

When to use & why: gadgets, hardware, electronics, smart-home devices, wearables, peripherals, AI hardware (Framework laptop, AirPods, Whoop, Rabbit r1, Friend pendant, mechanical keyboards, ergonomic gear) — anything where the device + first-use moment is the value prop. The box ceremony signals premium positioning; viewers convert on seeing "does it actually work / what does it do" — the first-use beat is the conversion moment. Not HAUL (→ HAUL_UNBOX), not pure software/SaaS (→ APP_REVEAL).
Sensory anchors: utility-knife slice, plastic peel, foam slide-out, power-on chime, tactile button press, haptic click, fan spin-up.
Setting: wood desk, top-down framing during unbox; handheld during first-use; desk/lap context for ongoing use.
Close-up beat device: the device itself once unboxed and powered on. `@Image2` is typically a real photo of the device's screen at its key UI moment (first measurement, paired status, hero feature open); if the device has no screen, a clean hero photo of it mid-use.
Dialogue character: hook is arrival ceremony — name that this is happening now ("just got this", "opening it"). Anticipation > description; the hook plants the question "what does it do?" that the first-use beat answers. Do NOT lead with specs. Arc: arrival hook → name + one-line spec headline → first-use reveal while close-up is on the device doing its thing → workflow-change insight ("this replaces / changes / fixes my X") → punchline that hands off urgency (price, where to find, time-limited).

6. Voice input — fetch the user's voice sample (best-effort)

Right before the generate call, fetch the user's voice sample URL: call `identity_voice_sample_url`. This returns a short-lived download URL (mp3/wav) backing the user's registered voice, OR `null` if no voice is on file.

If non-null → capture as `voice_sample_url` and pass it in the next call's `reference_audio` array. Both seedance and kling accept `reference_audio` (seedance up to 3, ≤15s combined; kling up to 8). The model uses the sample to clone the speaker's timbre for the lip-sync.
If null → skip; the model uses its default voice.

Always get this URL fresh right before step 7 — do NOT cache or reuse a stale URL across runs.

7. Generate — first attempt with the avatar, cartoonize on rejection, retry

Always attempt the call first with the avatar resolved in step 2 (caller-supplied or built-in fallback) exactly as-is. The skill does not pre-process or pre-judge it. Only when seedance rejects the call do we restyle.

7a. First attempt — avatar as-is

Call `generate_reference_video`:

`provider`: `seedance` (default) or `kling` if user passed `provider=kling`
`aspect_ratio`: `9:16` (default); `3:4` allowed only on seedance
`resolution`: `720p` (seedance only)
`duration`: 15
`reference_images`: `[avatar_url, screenshot_url]` (drop `screenshot_url` if step 3 failed)
`reference_audio`: `[voice_sample_url]` (omit the param entirely if step 6 returned null)
`prompt`: the multi-beat string from step 4
`sound`: true (default; ambient + lip-sync produced by the model)
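
To make the conditional parameters concrete, here is a sketch that assembles the argument dictionary for the first attempt. `build_generate_args` is a hypothetical helper; the keys follow the list above, but the real tool schema may differ.

```python
from typing import Any, Dict, Optional


def build_generate_args(prompt: str, avatar_url: str,
                        screenshot_url: Optional[str] = None,
                        voice_sample_url: Optional[str] = None,
                        provider: str = "seedance",
                        aspect_ratio: str = "9:16") -> Dict[str, Any]:
    """Assemble arguments, leaving out anything the earlier steps did not produce."""
    refs = [avatar_url]
    if screenshot_url:  # step 3 may have failed
        refs.append(screenshot_url)
    args: Dict[str, Any] = {
        "provider": provider,
        "aspect_ratio": aspect_ratio,
        "duration": 15,
        "reference_images": refs,
        "prompt": prompt,
        "sound": True,
    }
    if provider == "seedance":
        args["resolution"] = "720p"  # listed as seedance-only
    if voice_sample_url:
        # The key is omitted entirely when step 6 returned null.
        args["reference_audio"] = [voice_sample_url]
    return args
```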

For `provider=kling`: convert the multi-beat prose into `shots: [{prompt, duration}, ...]` (5 shots × 3s = 15s sum), plus a top-level `prompt` summarizing the ad. References use `<<<image_1>>>` / `<<<image_2>>>` instead of `@Image1` / `@Image2`.
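
The kling conversion amounts to two mechanical steps: splitting the ad into equal-length shots and rewriting the image references. A minimal sketch, assuming the five beats are already available as separate strings (`to_kling_refs` and `to_kling_shots` are hypothetical names):

```python
import re
from typing import Dict, List, Union


def to_kling_refs(text: str) -> str:
    """Rewrite @ImageN references into the <<<image_n>>> form."""
    return re.sub(r"@Image(\d+)", r"<<<image_\1>>>", text)


def to_kling_shots(beats: List[str], total_s: int = 15) -> List[Dict[str, Union[str, int]]]:
    """Give every beat an equal share of the total duration."""
    if not beats or total_s % len(beats) != 0:
        raise ValueError("beats must divide the total duration evenly")
    per_shot = total_s // len(beats)  # 15 s / 5 beats = 3 s each
    return [{"prompt": to_kling_refs(beat), "duration": per_shot} for beat in beats]
```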

If the call returns `{ task_id, status: "queued" }`, poll `task_status(task_id)` in a tight loop (no Bash, no sleep) until terminal (`completed | failed | cancelled`). On `completed`, capture `result.url` → `video_url` and proceed to step 8.
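
The polling rule can be illustrated with the sketch below. The `task_status` argument is a stand-in callable for the tool of the same name, the response is assumed to be a dict shaped like the payload quoted above, and the `max_polls` cap is an added safety assumption rather than something the skill specifies.

```python
from typing import Any, Callable, Dict

TERMINAL_STATES = {"completed", "failed", "cancelled"}


def wait_for_task(task_id: str,
                  task_status: Callable[[str], Dict[str, Any]],
                  max_polls: int = 10_000) -> Dict[str, Any]:
    """Poll back-to-back (no sleep) until the task reaches a terminal state."""
    for _ in range(max_polls):
        status = task_status(task_id)
        if status["status"] in TERMINAL_STATES:
            return status
    raise TimeoutError(f"task {task_id} did not finish after {max_polls} polls")
```

On a `completed` result the caller would read `status["result"]["url"]` as `video_url`; `failed` and `cancelled` are returned as-is so the caller can report the error message.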

7b. On rejection — auto-cartoonize the avatar

If 7a returns `422 content_policy_violation` on `image_urls` / `reference_images` (seedance + fal-queue moderation flags portraits that read as too photorealistic; even some Pixar-style 3D avatars get flagged), restyle the avatar in-place:

Call `generate_image`:

`provider: "seedream"` (native Pixar/3D-animated look)
`reference_images: [avatar_url]` (the new plural form; `reference_image: <url>` is still accepted as a deprecated single-image alias for back-compat; see [pika-mcp-server BACK-339, 2026-05-10])
`aspect_ratio`: same as the ad's aspect ratio
`resolution: "1K"`
`watermark: false` (seedream-only knob added by BACK-339; keeps the restyled avatar clean of provider watermark for the downstream lip-sync re-render)

prompt: "Stylized 3D game character render — Unreal Engine 5 / Overwatch / Valorant / Apex Legends visual style. Anatomically grounded facial proportions with subtle stylization: slightly larger expressive eyes, defined sculpted cheekbone planes, smooth skin shader (smoother than photoreal, no micropore detail), idealized but believable features. PBR materials with subtle subsurface scattering, strand-based hair simulation, crisp cloth shader. Cinematic three-point studio lighting with strong rim light. Clearly a stylized AAA-game-character render — NOT photorealistic person, NOT Pixar plastic-toy cartoon, NOT exaggerated big-head proportions. Same person, same glasses, same outfit, same accessories. Centered medium portrait, neutral indoor background."

Capture returned URL → `avatar_url_cartoon`.

7c. Retry seedance with the cartoonized avatar

Re-run the exact same `generate_reference_video` call from 7a, swapping the avatar reference: `reference_images: [avatar_url_cartoon, screenshot_url]` (or `[avatar_url_cartoon]` if step 3 failed). All other params unchanged. Capture `result.url` → `video_url`.

7d. Final fallback — still rejected

If 7c also returns `content_policy_violation`, stop. Tell the user: the avatar reads as too realistic for seedance moderation even after auto-restyling; ask them to either supply a more stylized portrait themselves or rerun with `provider=kling` (kling has a separate moderation pipeline that accepts realistic avatars).
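
Taken together, 7a through 7d form a try, restyle, retry-once flow. The sketch below shows that control flow only; `generate` and `cartoonize` are stand-in callables for `generate_reference_video` and `generate_image`, and `ContentPolicyViolation` is a hypothetical exception representing the `422 content_policy_violation` response.

```python
from typing import Callable, List, Optional, Tuple


class ContentPolicyViolation(Exception):
    """Stand-in for a 422 content_policy_violation response."""


def generate_with_fallback(generate: Callable[[List[str]], str],
                           cartoonize: Callable[[str], str],
                           avatar_url: str,
                           screenshot_url: Optional[str] = None) -> Tuple[str, str]:
    """Return (video_url, avatar_mode); a second rejection propagates to the caller."""
    def refs(avatar: str) -> List[str]:
        return [avatar] + ([screenshot_url] if screenshot_url else [])

    try:
        return generate(refs(avatar_url)), "as-is"  # 7a: avatar untouched
    except ContentPolicyViolation:
        cartoon_url = cartoonize(avatar_url)  # 7b: restyle only after a rejection
    # 7c: same call with the restyled avatar. If this raises again (7d),
    # the exception reaches the caller, who stops and informs the user.
    return generate(refs(cartoon_url)), "cartoonize-recovered"
```

The second element of the returned tuple is meant to feed the final summary, which reports whether the avatar was used as-is or recovered by cartoonizing.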

8. Captions — single-shot styled burn (default on)

Skip if `captions=false`. Use one `add_captions` call instead of chaining `edit_text_overlay` per chunk: much faster (≤5 min single call vs 5–8 min sequential), and the styles position captions correctly out of the box.

Call `add_captions`:

`video_url`: `video_url` from step 7
`style`: `"tiktok"` (default; word-by-word purple highlight, Bebas Neue, all caps, rendered at the bottom of the frame; classic TikTok-creator look that keeps the face and screen clear). Alternatives: `"hormozi"` (lower-middle yellow highlight, more aggressive; overlays part of the phone-in-hand close-up beat), `"classic"` (plain bottom subtitle bar, safest), `"karaoke"` (progressive color fill, also bottom).
`font_size`: `60` (overrides the per-style default; tuned for 9:16 readability without dominating the frame)
`language`: pass the BCP-47 code for the page language detected in step 1 (`"en"`, `"zh"`, `"ja"`, `"es"`, etc.); this skips auto-detect and avoids misrouting CJK to a Latin-only font path.

Capture the returned URL → `final_url`.

9. Return

Return `final_url` on one line, plus a one-line summary: which category ran, whether the avatar was caller-supplied / built-in fallback / cartoonize-recovered, whether the screenshot was used or fell back to prose, whether the user's voice sample was used or default, the provider chosen, the language detected for dialogue, and whether captions were burned on.

Load-bearing phrases

These anchors keep the ad from drifting into a generic product demo:

| Phrase | Where | Why load-bearing |
|---|---|---|
| `HOOK + 3 JUMP CUTs + OUTRO` | Prompt skeleton | Forces the TikTok-style multi-cut rhythm instead of one continuous presenter shot. |
| `Every beat has a Says: "..." line` | Prompt skeleton | Gives the video engine explicit lip-sync material across all beats. |
| `Trust @Image2` | Screen close-up rule | Prevents invented product UI when a real screenshot is already supplied. |
| `exactly one` screen-close-up beat | Prompt composition | Keeps the ad from becoming a screen recording instead of a creator-style reveal. |
| `Write all Says lines in the language detected from step 1` | Dialogue rule | Keeps localized product pages from getting English dialogue by default. |
| `single add_captions call` | Caption step | Avoids quality loss and drift from chained text overlays. |

Examples

/pika:ugc-ads https://pika.me avatar_url=https://cdn/face.png

→ APP_REVEAL, 9:16, seedance, real screenshot, captions on

/pika:ugc-ads https://maisonbrune.com avatar_url=https://cdn/face.png aspect_ratio=3:4

→ HAUL_UNBOX, 3:4, seedance

/pika:ugc-ads https://pika.me avatar_url=https://cdn/face.png provider=kling captions=false

→ APP_REVEAL, 9:16, kling shots[], no captions

/pika:ugc-ads https://pika.me

→ no `avatar_url` → uses the built-in fallback Pixar-style female creator portrait, runs end-to-end
