gen-ai-persona-creation


AI Influencer Persona


Turn one sentence into a head-to-toe 4-angle casting card in signature wardrobe, a persona profile, platform-tuned captions, and (optionally) a reel with ambient audio. Output: `./<persona-slug>/`.

When to Use

See the description above.

Prerequisites

```bash
gen-ai whoami            # auth + gen-ai install + Node v22+ check
command -v curl          # ships with macOS / Linux / Git-Bash
```
If `gen-ai whoami` fails, run `gen-ai login` or set `PICSART_ACCESS_TOKEN` + `PICSART_USER_ID`. No extra media tools are needed.
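The prerequisite checks can be folded into one preflight helper. This is a minimal sketch, not part of the skill: the function names `have_cmd`, `token_env_ready`, and `preflight` are hypothetical, and it only inspects the PATH and the two environment variables named above. It does not replace running `gen-ai whoami`.

```shell
# Hypothetical helpers; the names are illustrative, not part of gen-ai.

# Succeeds when the named command is on PATH.
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

# Succeeds when both token variables are set and non-empty.
token_env_ready() {
  [ -n "${PICSART_ACCESS_TOKEN:-}" ] && [ -n "${PICSART_USER_ID:-}" ]
}

# Prints what is missing; returns non-zero if a required command is absent.
preflight() {
  status=0
  for cmd in gen-ai curl; do
    if ! have_cmd "$cmd"; then
      echo "missing: $cmd"
      status=1
    fi
  done
  if ! token_env_ready; then
    echo "note: token variables unset; rely on 'gen-ai whoami' or 'gen-ai login'"
  fi
  return "$status"
}
```

Calling `preflight` before the first generation surfaces a missing tool early, before any credits are spent.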

How to Run

Use the agent's `terminal` tool to invoke `gen-ai` commands as described in the Procedure below.

Quick Reference

See the Procedure for canonical commands.

Procedure

See sections below for the detailed walkthrough.

Pitfalls

See Common Pitfalls below.

Verification

Run `gen-ai whoami` to confirm authentication, then re-run the failed command with `--debug`.

How the skill calls `gen-ai`

```bash
URL=$(gen-ai generate -m <model> -p "<prompt>" --json --no-input | grep -oE 'https?://[^"]+' | head -1)
curl -sSL -o ./<persona-slug>/<file>.<ext> "$URL"
```
`--download` doesn't work with `--json --no-input`, so URL + curl is the canonical path.
Bash footguns: never add `2>&1` or other stderr redirects between `--json --no-input` and the closing `)`. Doing so causes a shell parse error before the command runs (verified). Keep the inner pipe strictly `--json --no-input | grep -oE 'https?://[^"]+' | head -1`. One generation per `URL=$(...)`.
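To see what the URL filter extracts, and to avoid calling `curl` with an empty URL when a generation fails, something like the following may help. It is a sketch under assumptions: the helper names `first_url` and `fetch_to` are hypothetical, and the sample JSON in the usage note is an invented shape, since the real `gen-ai` output format is not shown here. `first_url` exists only for checking the filter offline against captured output; real calls should keep the pipe inline exactly as written above.

```shell
# Hypothetical helper: reads captured gen-ai JSON on stdin and prints the
# first http(s) URL, using the same filter as the canonical inner pipe.
first_url() {
  grep -oE 'https?://[^"]+' | head -1
}

# Hypothetical helper: downloads URL "$1" to path "$2", refusing to run
# curl when the URL is empty (for example after a failed generation).
fetch_to() {
  if [ -z "$1" ]; then
    echo "no URL in gen-ai output; nothing to download" >&2
    return 1
  fi
  curl -sSL -o "$2" "$1"
}
```

For example, `printf '%s' '{"url":"https://example.com/a.png"}' | first_url` prints `https://example.com/a.png`, and `fetch_to "$URL" ./<persona-slug>/casting.png` returns non-zero instead of writing an empty file when `$URL` is blank.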

Style routing

| Style | Model | For | Cost |
|---|---|---|---|
| `realistic` (default) | `gemini-3.1-flash-image` | photoreal humans + photoreal common pets / anthropomorphic animals | ~3 cr |
| `stylized` | `grok-imagine` | anime, 3D-animated fruit/object/character, illustration | ~1 cr |

Cross-provider fallback: primary fails → retry with `flux-2-max` (~3 cr, supports `imageUrls`). Both fail → surface the error.
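The routing table and the single fallback retry can be sketched as two small functions. This is an illustration only: `model_for_style` and `with_fallback` are hypothetical names, and the generator passed to `with_fallback` is assumed to be a command that takes the model name as its last argument and prints a URL on success (for instance a wrapper around the canonical `URL=$(...)` call).

```shell
# Maps a style to its primary model, per the routing table.
model_for_style() {
  case "$1" in
    realistic) echo "gemini-3.1-flash-image" ;;
    stylized)  echo "grok-imagine" ;;
    *)         echo "unknown style: $1" >&2; return 1 ;;
  esac
}

# Cross-provider fallback model, tried once when the primary fails.
FALLBACK_MODEL="flux-2-max"

# with_fallback <style> <generator...>
# Runs the generator with the primary model, then with the fallback.
# Prints the first non-empty URL; returns non-zero if both attempts fail.
with_fallback() {
  style=$1
  shift
  primary=$(model_for_style "$style") || return 1
  for model in "$primary" "$FALLBACK_MODEL"; do
    url=$("$@" "$model") && [ -n "$url" ] && { echo "$url"; return 0; }
  done
  echo "both $primary and $FALLBACK_MODEL failed" >&2
  return 1
}
```

Keeping the fallback to exactly one retry mirrors the rule above: if both models fail, the error is surfaced rather than retried further.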

Style inference (read the brief)

| Brief contains | Style → opening |
|---|---|
| Fruit/veggie/food + "character" / "anthropomorphic" / "brainrot" | stylized → fruit/object |
| Animal name + "pet" / "influencer" / "creator" / breed (NOT "real form" / "four-legged") | realistic → anthropomorphic humanoid pet (default for animal/pet briefs: fluffy biped in cute clothes, matches the project's `pets` category vibe) |
| Animal name + explicit "real form" / "four-legged" / "on all fours" / "real cat / dog / animal" | realistic → real-form quadruped pet (opt-in) |
| "Anime", "manga", "magical girl", "kawaii", "shoujo", "shonen" | stylized → anime |
| "3D rendered", "stylized 3D", "claymation", "feature-film animation" | stylized → 3D character |
| "Illustrated", "painted", "watercolor", "comic book" | stylized → illustration |
| Human profession + demographic, no style cue | realistic → photoreal human |

Both anthropomorphic-humanoid and real-form-quadruped pets are supported, but anthropomorphic is the default for pet briefs. That matches the Picsart project's `pets` category: fluffy biped influencers in cute clothes (tiny sweaters, mini hoodies, bow ties), with food-themed names (Biscuit, Mochi, Nugget, Bean, Waffles, Tofu, Pickle) and gen-z bios ("professional napper | treat negotiator | certified good boy/girl"). Real-form four-legged is the opt-in for creators who explicitly ask for it. Style conflict (e.g. "anime fitness coach") → prefer the stylistic cue.
Most creators want stylized. Don't blindly default to realistic.
IP-safe wording (mandatory): never name studios / franchises in prompts sent to the model. That means no "Pixar", "Disney", "Toy Story", "Studio Ghibli", "Marvel", etc. Recognize creator phrasing like "Pixar-style" as a 3D-animated intent (route to stylized 3D) but use generic descriptors in the actual prompt: "3D-animated", "feature-film animation aesthetic", "stylized 3D rendering", "anime cel-shaded illustration". Studio names trigger content policies and create downstream IP risk.
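A rough keyword version of the inference and of the IP-safe check might look like the following. It is a deliberately simplified sketch: `infer_style` and `names_studio` are hypothetical names, only the stylistic keywords from the table are modeled (the fruit and pet rows, and "Pixar-style" intent detection, are left to the agent's reading of the brief), and the studio list is just the five examples given above.

```shell
# Prints "stylized" when the brief carries a stylistic cue, else "realistic".
# Keyword matching only; a stylistic cue wins over anything else in the brief.
infer_style() {
  brief=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case "$brief" in
    *anime*|*manga*|*"magical girl"*|*kawaii*|*shoujo*|*shonen*) echo stylized ;;
    *3d*|*claymation*|*"feature-film animation"*|*brainrot*)     echo stylized ;;
    *illustrated*|*painted*|*watercolor*|*"comic book"*)         echo stylized ;;
    *)                                                           echo realistic ;;
  esac
}

# Succeeds when a prompt still names one of the example studios/franchises,
# meaning it should be reworded before being sent to the model.
names_studio() {
  prompt=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case "$prompt" in
    *pixar*|*disney*|*"toy story"*|*"studio ghibli"*|*marvel*) return 0 ;;
    *) return 1 ;;
  esac
}
```

With this sketch, "anime fitness coach" resolves to `stylized`, which matches the conflict rule above.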

What creators express in their brief (natural language)

The agent extracts intent; there are no CLI flags to learn:
  • Reference image ("from /path/photo.png") → adds `-i` to the casting-card call
  • Reel ("add a tiktok reel", "with motion") → triggers Step 4 (~11 extra cr)
  • Platform ("for tiktok", "instagram reel", "linkedin") → drives reel AR + caption tuning
  • Style ("anime", "3D", "painted", "photoreal") → routes realistic / stylized
  • Name ("named Nova") → sets persona name
  • Character type ("strawberry character", "golden retriever pet", "magical girl") → picks subject opening

Quick start

Plain English. Examples:
  • "Create a persona for: fitness coach, gen-z, neon vibe" (realistic human)
  • "Create a fluffy golden puppy pet influencer, sassy queen energy, mini hoodie" (anthropomorphic pet, the DEFAULT for pet briefs: fluffy biped in cute clothes)
  • "Create a calico kitten content creator, sleepy baby vibe, tiny knitted sweater" (anthropomorphic pet)
  • "Create a real four-legged tortoiseshell cat in a sunlit Tokyo apartment" (real-form pet, opt-in only with an explicit "real form / four-legged" cue)
  • "Make me an anime magical-girl librarian" (stylized)
  • "Create a strawberry character, brainrot 3D-animated vibe" (stylized fruit)
  • "Create a persona based on /path/photo.png, indie folk musician" (reference)
  • "Create a persona for: fitness coach, and add a tiktok reel" (with reel)
Output: `casting.png`, `persona.md`, `_meta.json` (+ `reel-hero.png` + `reel.mp4` if a reel was requested).
Cost: ~3 cr lean / ~14 cr with reel.

Pipeline


Step 1 — Intent

Extract: persona seed | style | reference image | reel + platform requested | name | slug.
Bias hard toward "infer and proceed." Only ask if the brief is truly thin (1–2 words). Invent missing details (gender, age, ethnicity, vibe), note them in `persona.md`, and let the creator re-roll.
If you must ask, ask exactly ONE direct question. Never enumerate A/B/C/D menus. Never stack multiple questions.
GOOD response (only when the brief is too thin):
Give me a one-liner: vibe / type / niche.
Examples:
- anthropomorphic pet (default for pet briefs): "fluffy golden puppy influencer, sassy queen, mini hoodie" / "calico kitten creator, sleepy baby, tiny sweater"
- real-form pet (opt-in): "real four-legged tortie cat in a sunlit apartment"
- realistic human: "Berlin art curator, dark academia, mid-thirties"
- stylized: "anime magical-girl librarian" / "anthropomorphic strawberry, brainrot 3D"
Add-ons: "from /path/photo.png" / "add a tiktok reel" / "named Mochi"
BAD: A/B/C/D menus + multiple questions stacked. Don't.
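The text does not spell out how the slug is derived from the persona name. One plausible rule, consistent with the persona "Lena" landing in `./lena/` in the delivery summary, is lowercase plus hyphens. The `slugify` function below is a hypothetical sketch of that assumed rule, not the skill's documented behavior.

```shell
# Hypothetical slug rule: lowercase, runs of non-alphanumerics become a
# single hyphen, leading and trailing hyphens are trimmed.
slugify() {
  printf '%s' "$1" \
    | tr '[:upper:]' '[:lower:]' \
    | sed -E 's/[^a-z0-9]+/-/g; s/^-+//; s/-+$//'
}
```

For example, `slugify "Nova Star"` prints `nova-star`, which can then be used as `./<persona-slug>/`.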

Step 2 — Identity → `persona.md`

Write: name | bio (2–3 sentences) | voice/tone | frozen appearance block (verbatim, reused in every prompt). The block contains identity DNA only (face geometry, eye/hair/skin, body type, distinguishing marks, wardrobe aesthetic baseline), NOT per-shot deltas (expression, pose, lighting, scene, specific outfits).
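One way to keep the frozen appearance block truly verbatim is to store it once in `persona.md` and read it back for every prompt instead of retyping it. The sketch below assumes a section layout (`## Bio`, `## Voice`, `## Frozen appearance block`) that is invented for illustration; the actual layout of `persona.md` is not specified here, and both function names are hypothetical.

```shell
# Writes a persona.md skeleton. The section headings are an assumed layout.
# Args: <dir> <name> <bio> <voice> <frozen appearance block>
write_persona_md() {
  dir=$1
  name=$2
  bio=$3
  voice=$4
  appearance=$5
  mkdir -p "$dir"
  cat > "$dir/persona.md" <<EOF
# $name

## Bio
$bio

## Voice
$voice

## Frozen appearance block
$appearance
EOF
}

# Prints the frozen block exactly as stored, for reuse in every prompt.
frozen_block() {
  awk '/^## Frozen appearance block$/ {found=1; next}
       /^## / {found=0}
       found' "$1/persona.md"
}
```

A prompt can then embed `$(frozen_block ./<persona-slug>)` so the casting card and any reel calls share identical identity text.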

Step 3 — Casting card

One call. Four head-to-toe angles on plain seamless gray, in signature wardrobe, with a neutral expression. The canvas is a 9:16 portrait holding a 2×2 grid, so each panel is also ≈ 9:16 and fits a full body.
```bash
URL=$(gen-ai generate -m <style-model> -p "<subject-opening> The image shows the same exact character from four camera angles in a 2x2 portrait grid (9:16 canvas). ALL FOUR PANELS share: identical plain seamless studio gray background, flat uniform fill, no gradient/texture/scene. Identical signature wardrobe, same complete outfit head to feet (or, for real-form quadruped pets, identical simple accessories like collar/bandana/harness, never humanoid clothing). Identical neutral expression, relaxed mouth. Identical even soft frontal softbox key + subtle fill + soft ground shadow, no rim lights, no colored gels. Identical hair/fur. Same identity in every panel: <frozen appearance block>. Differs only in angle: TOP-LEFT front-facing full body eyes at camera; TOP-RIGHT 3/4 facing camera-right; BOTTOM-LEFT full left profile looking off-left; BOTTOM-RIGHT 3/4 from behind over-the-shoulder. Magazine fashion model sheet composition, thin clean grid lines. The four panels MUST look like consecutive shots from one session: same wardrobe, backdrop, lighting, character; only angle differs. Absolutely no text, no captions, no watermarks, no logos, no UI elements, no phone, no device, no screen, no social media overlays in any panel." --aspect-ratio 9:16 --json --no-input | grep -oE 'https?://[^"]+' | head -1)
curl -sSL -o ./<persona-slug>/casting.png "$URL"
```
`<style-model>` = `gemini-3.1-flash-image` (realistic) or `grok-imagine` (stylized). Apply the fallback wrapper to `flux-2-max`.
Reference image: add `-i <reference-path>` to this same call. Identity comes via i2i, with the same prompt and cost.

Subject openings (replace `<subject-opening>` above)

  • Photoreal human (default): "Professional fashion photograph head-to-toe casting card / model sheet, shot on 85mm lens, RAW photo, 8k UHD, crisp focus, photorealistic, natural skin texture with visible pores, no AI smoothing."
  • Anthropomorphic humanoid pet (DEFAULT for pet/animal briefs: fluffy biped in cute clothes, the project's `pets` category): "An anthropomorphic [puppy / kitten / bunny / hamster / duckling / fox cub / baby panda / hedgehog / penguin / monkey] character standing upright on two legs like a human, full body visible head to toe, humanoid body proportions, expressive face, [coat detail, e.g. warm golden honey-colored fur / pure snow white fluffy fur / deep midnight black sleek fur / warm ginger orange fur / chocolate brown fur / shimmering silver grey fur / patchy calico orange-white-black fur / soft cream colored fur], adorable, looking directly at camera, professional fashion photograph, shot on 85mm lens, shallow depth of field, cinematic studio lighting with soft key light, photorealistic, RAW photo, 8k ultra high definition, crisp focus."
    Wardrobe options the agent can pick from when composing the casting card outfit: tiny knitted sweater | mini oversized hoodie | dapper bow tie + collar | flower crown of daisies and roses | tiny stylish sunglasses | flowing superhero cape | stylish bandana around neck | au naturel (no clothing, just fluffy fur).
    Vibe options for expression / pose: Sassy Queen (hand on hip, serving looks, unbothered) | Silly King (goofy, tongue out, awkward funny pose) | Sleepy Baby (drowsy half-asleep, leaning) | Zoomies Mode (excited, arms up, chaotic joy) | Distinguished (regal, arms crossed, noble) | Mischief Maker (sneaky, hands behind back, guilty-not-sorry).
    Suggested name (food-themed, project's pool): Biscuit, Mochi, Nugget, Bean, Waffles, Tofu, Dumpling, Peanut, Pickle, Noodle, Churro, Pretzel, Taco, Maple, Truffle, Sesame, Crouton, Muffin, Cupcake, Boba.
    Suggested bio style (gen-z internet humor, pipe-separated): "professional napper | treat negotiator | certified good boy/girl" / "fluffy & unbothered | snack motivated | full-time cuddle bug" / "chaos gremlin | zoomies champion | will boop for treats".
  • Real-form quadruped pet (opt-in only, when the creator explicitly said "real form / four-legged / real cat / on all fours"): "Professional pet portrait photograph head-to-toe model sheet of a [breed] [animal] in their natural anatomical form (four-legged / quadruped, NOT humanoid), full body nose-to-tail visible, shot on 85mm with shallow depth of field, RAW, 8k UHD, photorealistic natural fur with visible individual hairs, no AI smoothing. Pet may wear simple accessories (collar, bandana, harness) but never humanoid clothing; the character is the animal in real anatomical form."
  • 3D-animated anthropomorphic fruit / object: "High quality 3D-animated head-to-toe character sheet of an anthropomorphic [fruit/object] character, feature-film animation aesthetic, [fruit/object] serves as the head on a full human-proportioned athletic body, [skin/surface] texture extending naturally to arms and hands, ultra-high resolution, brainrot character-drama vibe, dramatic cinematic studio lighting with soft fill + subtle ground shadow."
  • Anime / manga: "High quality anime / manga style head-to-toe character sheet, cel-shaded illustration, clean line art, vibrant saturated colors, soft anime lighting, expressive eyes, [shoujo/shonen/kawaii] aesthetic, magazine character reference sheet composition."
  • Stylized 3D-animated human / fantasy: "High quality stylized 3D-animated head-to-toe character sheet, feature-film animation aesthetic, soft global illumination, slightly exaggerated proportions, expressive features, character-animation art direction."
  • Painted / illustrated: "Hand-painted editorial illustration head-to-toe character sheet, [watercolor/gouache/digital painting] aesthetic, painterly brushwork, layered soft light, magazine illustration composition."
Casting-card rules (non-negotiable):
  • Identical bg / wardrobe-or-accessories / lighting / expression / hair-fur across all 4 panels; only the angle differs.
  • Background is flat plain gray.
  • Full body: head-to-toe for humans and bipeds, INCLUDING anthropomorphic humanoid pets; nose-to-tail for quadrupeds in the real-form pet opt-in.
  • Wardrobe stays the same in all panels. Humans and anthropomorphic pets wear the same outfit throughout (yes, anthropomorphic pets wear humanoid clothing like tiny sweaters / mini hoodies / bow ties); only real-form quadruped pets are limited to simple accessories like collar / bandana / harness.
  • Expression and pose match the chosen vibe (Sassy Queen / Silly King / etc. for anthropomorphic pets). Humans default to neutral, eyes at camera (or off-camera per the profile/back angle).

Step 4 — Reel (only if requested)

Two sub-calls. Seedance treats `imageUrls` as the first frame (verified), so passing the casting-card grid would open the reel on the grid. Instead, generate a single-frame reel hero first, then animate it.

Pick the concept first

Don't auto-default to "slow contemplative push-in"; most creator content rewards confident energy.
Concepts: Hook reveal | Power pose | Attitude flick (look-away → snap-back smirk) | Walk-by | Outfit reveal | Vibe drop (lighting shift mid-clip) | Establish-and-hold | Calm narrative beat (only for genuinely calm-niche personas).
Hook rule: the first second must arrest attention.
Platform sensitivity: TikTok / IG Reel / Shorts → punchy; LinkedIn / YouTube → professional / calm; fruit / 3D / anime → lean stylized + confident (calm beats fall flat for them).
Camera move (pick ONE): slow push-in | slow pull-out | partial orbit | slow track left/right | static | tilt up | whip-in.
Action (pick ONE), punchy default: confident camera-direct stare with attitude shift | power-pose hold | hair flip | smile breaking through | walking confidently toward camera | outfit-reveal turn | hand gesture | rhythmic vibe | look over shoulder | lighting drop. Quieter (calm-niche only): looking up from book + soft smile | slow head tilt | hair lift in light wind | eyes opening | lip part.
Environment / lighting: atmospheric specifics > generic. Replace "in a cafe" with "neon-pink Tokyo coffee shop interior, signage reflections" (punchy) OR "rain-streaked window with candlelight, steam from teacup" (calm). Match energy to concept.

Sub-step 4-i: Reel hero (gemini i2i, target AR, single full-body frame)

```bash
URL=$(gen-ai generate -m gemini-3.1-flash-image -i ./<persona-slug>/casting.png -p "<subject-opening from Step 3> Single full-body photograph of the same character from the casting-card reference, head-to-toe in frame. <frozen appearance block>. Wearing the same signature wardrobe shown in casting card. <opening pose / framing for chosen concept>. <atmospheric environment + lighting>. Composition: full body head to toe, framed for video animation in <platform-AR>. Real photograph quality (or stylized rendering per opening). No text, no captions, no watermarks, no logos, no UI, no phone, no device, no screen, no social media overlays." --aspect-ratio <platform-AR> --json --no-input | grep -oE 'https?://[^"]+' | head -1)
curl -sSL -o ./<persona-slug>/reel-hero.png "$URL"
```
Cost: ~3 cr. Apply fallback to `flux-2-max`.
Reel-hero ≠ final action pose. Gemini tends to preserve the casting card's neutral stance even when prompted for power-pose / mid-action (verified). That's fine: the action lands in the Seedance prompt at 4-ii. Don't re-roll the hero just because the pose looks calmer than expected.

Sub-step 4-ii: Animation (Seedance i2v, audio enabled)

Platform → AR + duration:

| Platform | AR | Duration |
|---|---|---|
| tiktok / instagram-reel / instagram-story / youtube-shorts | 9:16 | 8s |
| instagram-feed | 1:1 (Seedance has no 4:5; closest universal) | 6s |
| youtube / linkedin / x / twitter | 16:9 | 8–10s |

```bash
URL=$(gen-ai generate -m seedance-2.0 -i ./<persona-slug>/reel-hero.png -p "<subject-opening>. <frozen appearance block>. Wearing same signature wardrobe. <single action from vocabulary matching the concept; strong language here, this is where action actually lands>. <same atmospheric environment + lighting as hero>. <single camera move from vocabulary>. Audio: <ambient soundscape matching scene: environmental sounds, mood-appropriate underscore; no spoken dialogue, no voiceover, no music vocals>. Single continuous moment, no scene changes, no multiple sequential actions, no fast or chaotic movement. No text, no captions, no watermarks, no logos, no UI, no phone, no device, no screen, no social media overlays." --aspect-ratio <platform-AR> --duration <platform-duration> --generate-audio --json --no-input | grep -oE 'https?://[^"]+' | head -1)
curl -sSL -o ./<persona-slug>/reel.mp4 "$URL"
```
Cost: 1 cr/sec × duration, so ~6–10 cr for the animation. Total reel including the ~3 cr reel hero: ~9–13 cr.
Seedance prompt order (verified KLING_RULES): Subject → Action → Environment → Camera → Lighting → Audio. One continuous camera move, one primary action; never chain.
Models we DON'T use for the reel: any `startFrame`-only i2v (`seedance-i2v`, `hailuo-2.3-fast`, `runway-gen3a-turbo`, `wan-2.7-i2v`, `luma-flash2-i2v`, `pika-frames`) drifts across the clip; `runway-gen4-ref` returns a still PNG (verified, not a video); `kling-3.0-pro` / `veo-3.1` / `veo-3.1-fast` are `startFrame`-only, and their multi-image char-ref modes (Kling element / Veo Ingredients) aren't surfaced in the CLI today (roadmap).
Honest constraint: Seedance's `imageUrls` behaves as first frame, not pure char-ref. Single-frame hero + i2v = a clean character image opens the reel and animates from there.
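The platform table can be encoded as a lookup so that `<platform-AR>` and `<platform-duration>` are filled consistently. This is a sketch with hypothetical function names; for the 16:9 platforms the table allows 8–10s, and the code assumes the lower bound of 8.

```shell
# Prints "<aspect-ratio> <duration-seconds>" for a platform, per the table.
# Assumption: 16:9 platforms use 8s, the low end of the 8-10s range.
reel_params() {
  case "$1" in
    tiktok|instagram-reel|instagram-story|youtube-shorts) echo "9:16 8" ;;
    instagram-feed)                                       echo "1:1 6" ;;
    youtube|linkedin|x|twitter)                           echo "16:9 8" ;;
    *) echo "unknown platform: $1" >&2; return 1 ;;
  esac
}

# Aspect ratio only (text before the space).
reel_aspect() {
  params=$(reel_params "$1") || return 1
  echo "${params% *}"
}

# Duration in seconds only (text after the space). At 1 cr/sec this is
# also the animation cost in credits.
reel_duration() {
  params=$(reel_params "$1") || return 1
  echo "${params#* }"
}
```

For instance, `--aspect-ratio "$(reel_aspect tiktok)" --duration "$(reel_duration tiktok)"` would expand to `--aspect-ratio 9:16 --duration 8`.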

Step 5 — Captions, deliver

Append captions to `persona.md`: 3 by default, in the persona's voice. The hashtag block ALWAYS leads with `#picsart #picsartcreator`, followed by platform-specific niche tags.

| Platform | Length | Niche tags after Picsart pair |
|---|---|---|
| tiktok / youtube-shorts | 80–150 chars, single hook | 4–6 trending |
| instagram (reel/story/feed) | 150–300 chars, hook + story | 6–10 |
| youtube standard | 300–500 chars, keyword-dense | 3–5 keyword |
| linkedin | 500–1000 chars, professional | 3–5 industry |
| x / twitter | ≤280 chars total (incl. tags) | 1–2 |
| (no platform) | ~150 chars, balanced | 4–6 generic |

Print the final summary: `✓ Persona "Lena" delivered. Local: ./lena/. Spent: ~3 credits. Files: casting.png, persona.md (+ _meta.json)`. Add `reel-hero.png` + `reel.mp4` to the file list if a reel was generated.
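The hashtag ordering and the upper caption bounds can be checked mechanically. The helpers below are a hypothetical sketch: the platform keys reuse the reel platform names, only the maximum lengths are encoded, and treating "~150 chars" as a hard cap for the no-platform case is an assumption.

```shell
# Builds the hashtag block: the Picsart pair first, then the niche tags
# passed as arguments.
hashtag_block() {
  block="#picsart #picsartcreator"
  for tag in "$@"; do
    block="$block $tag"
  done
  echo "$block"
}

# Prints the maximum caption length in characters for a platform.
# Assumption: with no recognized platform, ~150 is used as the cap.
caption_max() {
  case "$1" in
    tiktok|youtube-shorts) echo 150 ;;
    instagram*)            echo 300 ;;
    youtube)               echo 500 ;;
    linkedin)              echo 1000 ;;
    x|twitter)             echo 280 ;;
    *)                     echo 150 ;;
  esac
}

# Succeeds when the caption "$2" fits the platform "$1".
fits_caption() {
  max=$(caption_max "$1")
  [ "${#2}" -le "$max" ]
}
```

For x / twitter the limit covers the tags too, so the text passed to `fits_caption` should be the caption with its hashtag block already appended.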

Cost transparency

Show the plan before spending. Pull live rates with `gen-ai pricing <model>`; never hardcode them. After each step, print `✓ <step> (<credits>)`.
```text
Plan:
  Casting card (gemini-3.1-flash-image, 1 image)         ~3 cr
[ Reel hero (gemini-3.1-flash-image, 1 image)            ~3 cr ]   reel only
[ Reel animation (seedance-2.0, 8s @ 9:16)               ~8 cr ]
  ────────────────────────────────────────────────────────
  Estimated total                                       ~3 or ~14 cr
Continue? [Y/n]
```
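The total in the plan is simple arithmetic over the per-step rates. A minimal sketch follows, with a hypothetical `estimate_total` function that takes the rates as arguments so they can come from `gen-ai pricing <model>` rather than being hardcoded; parsing that command's output is not shown because its format is not documented here. Integer credit values are assumed.

```shell
# Prints the estimated total in credits (integers only).
#   $1 casting-card cost       $2 reel-hero cost
#   $3 video cost per second   $4 reel duration in seconds (0 = no reel)
estimate_total() {
  card=$1
  hero=$2
  per_sec=$3
  seconds=$4
  if [ "$seconds" -eq 0 ]; then
    echo "$card"
  else
    echo $(( $card + $hero + $per_sec * $seconds ))
  fi
}
```

With the example rates from the plan, `estimate_total 3 3 1 8` gives 14 and `estimate_total 3 3 1 0` gives 3, matching the "~3 or ~14 cr" line.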

Output

```text
./<persona-slug>/
├── persona.md       # name, bio, voice, frozen appearance block, captions
├── casting.png      # head-to-toe 4-angle casting card
├── reel-hero.png    # only if reel requested
├── reel.mp4         # only if reel requested (includes ambient audio)
└── _meta.json       # step parameters
```

Re-rolls

Natural language. The agent reads `_meta.json` and reruns the right step:
  • "Regenerate Lena with darker hair" (~3 cr)
  • "Redo the reel with a slow camera push instead of static" (~8 cr)
Confirm spend before re-running.

Limitations (today)

  • Local-only output (Drive integration tracked for v1.1, once the new CLI Drive API ships)
  • One persona per run (multi-persona via `gen-ai generate -m kling-multi-image-v2-1 -i nova/casting.png -i lena/casting.png -p "<scene>"` or a future Scene Composer skill)
  • No premium photoreal tier (`gemini-3-pro-image` deferred)
  • No premium motion-control reel (Kling Motion Control V3 + creator motion-ref deferred)
  • No voice / talking-head reel (Picsart-Eleven gender unreliable). The reel ships with Seedance ambient audio: environmental + atmospheric underscore, no synthesized speech
  • No bespoke music (Seedance underscore via `--generate-audio`; dedicated music pass deferred)
  • No Kling-element / Veo-Ingredients char-ref video (not surfaced in CLI)
  • No built-in scene variations (casting card is the character; downstream tools handle scenes)