kling-3-prompting

Overview

Kling 3.0 is a unified multimodal video model. It understands cinematic direction, not keyword lists. Write prompts like a director — describe what the audience sees, hears, and feels over time.

Core shift: Description → Direction. Think "direct a scene," not "describe an image."

Interactive Builder Workflow

When invoked, guide the user through these steps using `AskUserQuestion`:

```dot
digraph builder {
  "1. Generation mode?" [shape=diamond];
  "Text-to-Video" [shape=box];
  "Image-to-Video" [shape=box];
  "Multi-Shot Sequence" [shape=box];
  "Keyframe Transition" [shape=box];
  "2. Gather scene details" [shape=box];
  "3. Assemble prompt" [shape=box];
  "4. Present & refine" [shape=box];

  "1. Generation mode?" -> "Text-to-Video";
  "1. Generation mode?" -> "Image-to-Video";
  "1. Generation mode?" -> "Multi-Shot Sequence";
  "1. Generation mode?" -> "Keyframe Transition";
  "Text-to-Video" -> "2. Gather scene details";
  "Image-to-Video" -> "2. Gather scene details";
  "Multi-Shot Sequence" -> "2. Gather scene details";
  "Keyframe Transition" -> "2. Gather scene details";
  "2. Gather scene details" -> "3. Assemble prompt";
  "3. Assemble prompt" -> "4. Present & refine";
}
```

Step 1: Determine Generation Mode

Ask the user which mode:

Text-to-Video — prompt from scratch
Image-to-Video — animate a reference image
Multi-Shot Sequence — 2-6 shot storyboard (up to 15s)
Keyframe Transition — start frame → end frame with interpolated motion

Step 2: Gather Scene Details

Ask about each element (adapt questions to mode):

Element	Question	Why it matters
Subject	Who/what is the focus? Specific appearance details?	Anchors consistency — define distinguishing traits early
Action	What happens? Describe the timeline (first → then → finally)	Kling 3.0 excels at sequential action across arcs of up to 15s
Environment	Where? Be specific (not "a street" but "narrow Tokyo alley, steam from grates")	Grounds the scene physically
Camera	Shot type and movement? (See camera reference below)	Cinematic language produces far better results
Lighting	What light sources? Name them specifically	"Flickering neon" beats "dramatic lighting"
Mood/Emotion	What should the audience feel?	Drives color grade, pacing, music
Audio	Dialogue? Ambient sound? Music?	Kling 3.0 generates native audio + lip-sync
Duration	How long? (3-15s)	Longer clips need more progression described over time
Aspect Ratio	16:9 / 9:16 / 1:1 / 21:9?	16:9 cinematic, 9:16 social, 21:9 ultra-wide

Image-to-Video: Focus on how the scene evolves from the image — movement, camera motion, environmental change. The model preserves identity/layout from the source.

Keyframes: Ask for start and end frame descriptions. Frames should match in color, style, and lighting. Prompt sparingly — Kling infers motion well.

Multi-Shot: Define each shot separately with its own framing, subject, action, and duration. Label shots explicitly.
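The Step 2 answers can be collected and sanity-checked before you assemble the prompt. The following is a minimal Python sketch under stated assumptions. The `SceneDetails` class, its field names, and the choice of which fields count as required are hypothetical and do not come from Kling. Only the 3-15s duration range and the four aspect ratios come from the table above.

```python
from dataclasses import dataclass

# Values taken from the Step 2 table; the class and field names are hypothetical.
ALLOWED_ASPECT_RATIOS = {"16:9", "9:16", "1:1", "21:9"}
MIN_DURATION_S, MAX_DURATION_S = 3, 15
REQUIRED_FIELDS = ("subject", "action", "environment", "camera", "lighting")


@dataclass
class SceneDetails:
    """Hypothetical container for the answers gathered in Step 2."""
    subject: str
    action: str        # timeline: first -> then -> finally
    environment: str
    camera: str
    lighting: str      # named light sources, not adjectives
    mood: str = ""
    audio: str = ""
    duration_s: int = 5
    aspect_ratio: str = "16:9"

    def problems(self) -> list[str]:
        """Return readable problems; an empty list means the answers look usable."""
        issues = [f"{name} is empty" for name in REQUIRED_FIELDS
                  if not getattr(self, name).strip()]
        if not MIN_DURATION_S <= self.duration_s <= MAX_DURATION_S:
            issues.append(
                f"duration {self.duration_s}s is outside {MIN_DURATION_S}-{MAX_DURATION_S}s")
        if self.aspect_ratio not in ALLOWED_ASPECT_RATIOS:
            issues.append(f"aspect ratio {self.aspect_ratio} is not one of the listed options")
        return issues


# Usage: a complete answer set passes the checks.
scene = SceneDetails(
    subject="Woman in red dress",
    action="She pauses, turns slowly, then disappears around the corner",
    environment="Narrow Tokyo alley, steam from grates",
    camera="Handheld shoulder-cam drifting behind her",
    lighting="Flickering neon casting magenta/cyan across wet pavement",
    duration_s=10,
)
print(scene.problems())  # []
```

A 20s request at 4:3 would be flagged twice, once for duration and once for aspect ratio. The prompt is never rejected outright, so the user can still decide.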

Step 3: Assemble the Prompt

Use the Master Formula:

[Scene/Environment] + [Subject & Appearance] + [Action Timeline] + [Camera Movement] + [Audio & Atmosphere] + [Technical Specs]

Writing rules:

Use cinematic motion verbs: dolly push, whip-pan, crash zoom, rack focus, tracking shot — NOT "moves" or "goes"
Name real light sources: neon signs, candlelight, golden hour, LED panels — NOT "dramatic lighting"
Include texture for credibility: grain, lens flares, condensation, fabric sheen, smoke, sweat
Describe temporal flow: beginning → middle → end
Keep to 1-3 rich sentences per shot (specificity > length)
For dialogue: use character labels, assign voice tone/emotion, use transitional words ("Immediately," "Pause")
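Mechanically, the Master Formula concatenates six parts in a fixed order. Here is a minimal Python sketch. It assumes the parts arrive as a dict keyed by hypothetical names (`scene`, `subject`, `action_timeline`, `camera`, `audio_atmosphere`, `technical_specs`). The joining style, with one sentence per part and empty parts skipped, is an illustrative choice rather than a Kling requirement. The writing rules above still govern what goes into each part.

```python
# Hypothetical keys for the six Master Formula slots, in formula order.
FORMULA_ORDER = (
    "scene",             # [Scene/Environment]
    "subject",           # [Subject & Appearance]
    "action_timeline",   # [Action Timeline]
    "camera",            # [Camera Movement]
    "audio_atmosphere",  # [Audio & Atmosphere]
    "technical_specs",   # [Technical Specs]
)


def assemble_prompt(parts: dict[str, str]) -> str:
    """Join the formula parts in order, one sentence each, skipping empty slots."""
    unknown = set(parts) - set(FORMULA_ORDER)
    if unknown:
        raise KeyError(f"unknown formula parts: {sorted(unknown)}")
    sentences = []
    for key in FORMULA_ORDER:
        text = parts.get(key, "").strip()
        if text:
            # Terminate each part so adjacent parts do not run together.
            sentences.append(text if text.endswith((".", "!", "?")) else text + ".")
    return " ".join(sentences)


prompt = assemble_prompt({
    "scene": "Narrow Tokyo alley at night, steam rising from grates",
    "subject": "A woman in a red dress, heels clicking on wet cobblestone",
    "action_timeline": "She pauses, turns slowly, then disappears around the corner",
    "camera": "Handheld shoulder-cam drifts behind her with subtle sway",
    "audio_atmosphere": "Rain patter and a low synth hum",
    "technical_specs": "Shot on 35mm film, 16:9",
})
print(prompt)
```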

Step 4: Present & Refine

Present the assembled prompt. Ask if they want to:

Adjust any element
Add a negative prompt
Generate variations (different duration, different camera, different mood)
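Variations can be enumerated as combinations of the settings the user wants to change. This is a minimal Python sketch. `PromptSettings` and `variations` are hypothetical names, and the sketch only produces the settings records. Rewriting the prompt text for each new camera or mood is still done by hand or with the Master Formula.

```python
from dataclasses import dataclass, replace
from itertools import product


@dataclass(frozen=True)
class PromptSettings:
    """Hypothetical record of the knobs Step 4 suggests varying."""
    prompt: str
    duration_s: int
    camera: str
    mood: str


def variations(base: PromptSettings,
               durations: tuple[int, ...] = (),
               cameras: tuple[str, ...] = (),
               moods: tuple[str, ...] = ()) -> list[PromptSettings]:
    """Return every combination of the given options, excluding the base itself.

    An empty option tuple keeps the base value for that field.
    """
    combos = product(durations or (base.duration_s,),
                     cameras or (base.camera,),
                     moods or (base.mood,))
    results = []
    for duration_s, camera, mood in combos:
        variant = replace(base, duration_s=duration_s, camera=camera, mood=mood)
        if variant != base:
            results.append(variant)
    return results


base = PromptSettings("Woman in red dress in a neon alley", 10, "slow dolly push-in", "tense")
for v in variations(base, durations=(5, 10), cameras=("slow dolly push-in", "whip-pan")):
    print(v.duration_s, v.camera, v.mood)
```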

Quick Reference

Camera Movements

Movement	Effect	Example phrase
Dolly push-in	Builds intimacy/tension	"slow dolly push-in toward her face"
Dolly zoom	Vertigo/dramatic reveal	"dolly zoom creating disorienting depth shift"
Tracking shot	Follows subject laterally	"camera tracks alongside as she walks"
Whip-pan	Energy/surprise	"whip-pan to reveal the door"
Crash zoom	Shock/emphasis	"sudden crash zoom on the object"
Rack focus	Shift attention	"rack focus from foreground hand to background figure"
Handheld/shoulder-cam	Raw/documentary feel	"handheld shoulder-cam with subtle sway"
Static tripod	Composed/observational	"locked-off static tripod, wide shot"
FPV drone	High-energy immersion	"dynamic FPV drone shot chasing through corridor"
Low-angle tracking	Heroic/imposing	"low-angle tracking shot, subject towers above"
Truck left/right	Lateral reveal	"camera trucks right revealing the cityscape"
Tilt up/down	Vertical reveal	"slow tilt up from boots to face"

Lens & Film Stock

Phrase	Effect
"Shot on 35mm film"	Warm grain, organic texture
"Macro 85mm lens"	Tight detail, shallow depth of field
"Wide-angle steadicam"	Smooth, immersive, spatial
"Handheld camcorder"	Raw VHS energy, nostalgic
"Anamorphic lens flare"	Cinematic horizontal streaks

Lighting

Use specific sources, not adjectives:

"Golden hour sun cutting through dusty warehouse windows"
"Flickering neon casting magenta/cyan across wet pavement"
"Single bare bulb swinging, casting moving shadows"
"Cool blue LED panels reflecting off glass surfaces"
"Candlelight warming skin tones, deep shadows beyond"

Color & Grade

"Desaturated teal grade, crushed blacks"
"Amber nightclub strobe cutting through smoke"
"Cool blue haze filling the corridor"
"Magenta neon reflecting off wet asphalt"
"Overexposed highlights, blown-out whites"

Multi-Character Dialogue

Rule	Do	Don't
Name characters	`[Character A: Silver-haired CEO]`	`[Man] says...`
Anchor to action	`Agent slams table. [Agent, angrily]: "Where is it?"`	Dialogue with no accompanying visual action
Assign voice tone	`[CEO, deep authoritative gravelly voice]`	Generic "says"
Control timing	"Immediately," "Pause," "After a beat"	Back-to-back dialogue without transitions
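The dialogue rules can be applied consistently with a small formatter. This is a minimal Python sketch. `Character`, `introduce`, and `dialogue_line` are hypothetical helpers, and the bracket layout copies the "Do" examples in the table above. Each character is introduced once with their distinguishing traits, and then every line is tagged with a name and a voice direction.

```python
from dataclasses import dataclass


@dataclass
class Character:
    """Hypothetical speaker record: short name, look, and voice direction."""
    name: str         # label used in dialogue tags, e.g. "CEO"
    description: str  # distinguishing traits, e.g. "Silver-haired CEO"
    voice: str        # tone/emotion, e.g. "deep authoritative gravelly voice"


def introduce(character: Character, slot: str) -> str:
    """Name the character once, e.g. [Character A: Silver-haired CEO]."""
    return f"[{slot}: {character.description}]"


def dialogue_line(speaker: Character, line: str, action: str = "", transition: str = "") -> str:
    """Optional timing word, then the visual action, then the labelled quote."""
    tag = f'[{speaker.name}, {speaker.voice}]: "{line}"'
    return " ".join(part for part in (transition, action, tag) if part)


ceo = Character("CEO", "Silver-haired CEO", "deep authoritative gravelly voice")
agent = Character("Agent", "Agent in a rain-soaked trench coat", "angrily")
print(introduce(ceo, "Character A"))
print(dialogue_line(agent, "Where is it?", action="Agent slams table."))
print(dialogue_line(ceo, "Not here.", transition="After a beat,"))
```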

Multi-Shot Structure

Shot 1 (0-5s): [Wide establishing shot description]
Shot 2 (5-10s): [Medium/close-up with action progression]
Shot 3 (10-15s): [Resolution/reaction with camera payoff]

Atmosphere: [Overall mood, color grade]
Audio: [Sound design, music, dialogue]

Label every shot. Assign durations. Describe framing + subject + motion per shot.
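Every shot needs a label and a time range, so the timestamps can be computed from per-shot durations instead of being typed by hand. This is a minimal Python sketch. The `Shot` class and `build_sequence` function are hypothetical. The 2-6 shot and 15s limits come from the Multi-Shot Sequence mode description in Step 1.

```python
from dataclasses import dataclass

MIN_SHOTS, MAX_SHOTS = 2, 6  # "2-6 shot storyboard"
MAX_TOTAL_S = 15             # "up to 15s"


@dataclass
class Shot:
    duration_s: int
    description: str  # framing + subject + motion for this shot


def build_sequence(shots: list[Shot], atmosphere: str, audio: str) -> str:
    """Render labelled, timed shots followed by Atmosphere and Audio lines."""
    if not MIN_SHOTS <= len(shots) <= MAX_SHOTS:
        raise ValueError(f"expected {MIN_SHOTS}-{MAX_SHOTS} shots, got {len(shots)}")
    if any(shot.duration_s <= 0 for shot in shots):
        raise ValueError("every shot needs a positive duration")
    total = sum(shot.duration_s for shot in shots)
    if total > MAX_TOTAL_S:
        raise ValueError(f"total duration {total}s exceeds {MAX_TOTAL_S}s")
    lines, start = [], 0
    for number, shot in enumerate(shots, start=1):
        end = start + shot.duration_s
        lines.append(f"Shot {number} ({start}-{end}s): {shot.description}")
        start = end
    lines += ["", f"Atmosphere: {atmosphere}", f"Audio: {audio}"]
    return "\n".join(lines)


print(build_sequence(
    [
        Shot(5, "Wide establishing shot of a rain-soaked neon alley, camera tracks forward"),
        Shot(5, "Medium shot, she turns toward the camera as steam drifts past"),
        Shot(5, "Close-up, slow dolly push-in as her expression hardens"),
    ],
    atmosphere="Tense, desaturated teal grade, crushed blacks",
    audio="Rain patter, distant sirens, low synth drone",
))
```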

Start & End Frame Tips

Frames should match in color palette, style, and lighting
Identical start/end frames = seamless loop
Prompt sparingly — Kling infers motion between frames well
Keep camera directions simple: zoom in/out, pan left/right, tilt up/down
Use 5s for dynamic transitions and 10s for complex transformations
The start frame's aspect ratio sets the aspect ratio of the whole clip
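Several of these tips are mechanical enough to check before you upload frames. This is a minimal Python sketch, assuming you already know each frame's pixel size. `Frame` and `plan_keyframes` are hypothetical names. Treating "identical frames" as "same file path" is a simplifying assumption; a byte or pixel comparison would be stricter. The 5s/10s choice mirrors the tip above.

```python
from dataclasses import dataclass
from math import gcd


@dataclass
class Frame:
    """Hypothetical keyframe record; width and height are supplied by you."""
    path: str
    width: int
    height: int


def aspect_ratio(width: int, height: int) -> str:
    """Reduce pixel dimensions to a ratio string, e.g. 1920x1080 -> '16:9'."""
    g = gcd(width, height)
    return f"{width // g}:{height // g}"


def plan_keyframes(start: Frame, end: Frame, complex_transformation: bool) -> dict:
    """Derive clip settings from the start/end frames per the tips above."""
    ratio = aspect_ratio(start.width, start.height)  # the start frame sets the clip's ratio
    warnings = []
    if aspect_ratio(end.width, end.height) != ratio:
        warnings.append("end frame aspect ratio differs from start frame")
    return {
        "aspect_ratio": ratio,
        "duration_s": 10 if complex_transformation else 5,
        "seamless_loop": start.path == end.path,  # same image at both ends -> loop
        "warnings": warnings,
    }


start = Frame("alley_start.png", 1920, 1080)
end = Frame("alley_end.png", 1920, 1080)
print(plan_keyframes(start, end, complex_transformation=False))
# {'aspect_ratio': '16:9', 'duration_s': 5, 'seamless_loop': False, 'warnings': []}
```

Color palette, style, and lighting matching still needs a human eye. The sketch only covers the parts that reduce to arithmetic or equality.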

Negative Prompts

Use a negative prompt to suppress common AI defaults:

smiling, laughing, cartoonish, bright saturated colors, low resolution,
morphing, blurry text, disfigured hands, extra fingers, static pose,
frozen expression, stock photo aesthetic

Customize based on scene — remove items that conflict with your intent.
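Customizing the default list amounts to dropping the terms your scene actually wants and appending scene-specific ones. This is a minimal Python sketch. `customize_negative` is a hypothetical helper, and the extra term `watermark` is only an illustrative addition, not part of the list above.

```python
DEFAULT_NEGATIVE = (
    "smiling, laughing, cartoonish, bright saturated colors, low resolution, "
    "morphing, blurry text, disfigured hands, extra fingers, static pose, "
    "frozen expression, stock photo aesthetic"
)


def customize_negative(default: str, conflicts=(), extra=()) -> str:
    """Remove default terms the scene actually wants, then append scene-specific extras."""
    conflicts_lower = {term.strip().lower() for term in conflicts}
    terms = [t.strip() for t in default.split(",") if t.strip()]
    kept = [t for t in terms if t.lower() not in conflicts_lower]
    for term in extra:
        term = term.strip()
        if term and term not in kept:
            kept.append(term)
    return ", ".join(kept)


# A joyful, neon-soaked party scene wants smiles, laughter, and saturated color.
print(customize_negative(
    DEFAULT_NEGATIVE,
    conflicts=["smiling", "laughing", "bright saturated colors"],
    extra=["watermark"],
))
```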

Weak → Strong

Element	Weak	Strong
Camera	"Camera follows person"	"Handheld shoulder-cam drifts behind subject with subtle sway"
Subject	"A woman walking"	"Woman in red dress, heels clicking wet cobblestone"
Environment	"In a city"	"Narrow Tokyo alley, steam from grates, glowing vending machines"
Lighting	"Dramatic lighting"	"Flickering neon casting magenta/cyan across wet pavement"
Texture	"It looks realistic"	"Rain beading on leather jacket, condensation on glass, visible breath"
Motion	"She walks away"	"She turns slowly, hair catches light, disappears around corner"

Common Mistakes

Mistake	Fix
Keyword lists instead of scene direction	Write as if directing a shot: subject + action + camera + environment
Vague motion ("moves," "goes")	Use cinematic verbs: dolly, track, whip-pan, crash zoom
Generic lighting ("dramatic")	Name the source: neon, candle, golden hour, LED panel
Overlong prompts	1-3 rich sentences per shot; specificity > length
No temporal progression	Describe beginning → middle → end of the shot
Mismatched keyframes	Match color, lighting, and style between start/end frames
Unattributed dialogue	Label every speaker with name, tone, and emotion
Cramming multi-shot into one paragraph	Separate and label each shot with duration
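A few of these mistakes can be caught with simple text checks before generating. This is a minimal Python sketch. `lint_prompt` is a hypothetical helper, and its phrase lists and keyword-list threshold are rough heuristics drawn from the tables above, not an official or exhaustive check. A clean result does not guarantee a strong prompt.

```python
import re

# Heuristic patterns based on the Weak -> Strong and Common Mistakes tables.
VAGUE_MOTION = re.compile(r"\b(moves?|moving|goes|going)\b", re.IGNORECASE)
GENERIC_LIGHTING = re.compile(r"\bdramatic lighting\b", re.IGNORECASE)
SHOT_LABEL = re.compile(r"^Shot \d+ \(\d+-\d+s\):", re.MULTILINE)


def lint_prompt(prompt: str, multi_shot: bool = False) -> list[str]:
    """Return warnings for common mistakes; an empty list means none were detected."""
    issues = []
    if VAGUE_MOTION.search(prompt):
        issues.append("vague motion verb; use dolly, track, whip-pan, crash zoom")
    if GENERIC_LIGHTING.search(prompt):
        issues.append("generic lighting; name the source (neon, candle, golden hour, LED panel)")
    if multi_shot and len(SHOT_LABEL.findall(prompt)) < 2:
        issues.append("multi-shot prompt without labelled, timed shots")
    # Keyword-list heuristic: many comma-separated fragments and no sentence ending.
    fragments = [f for f in prompt.split(",") if f.strip()]
    if len(fragments) >= 6 and "." not in prompt:
        issues.append("looks like a keyword list; write it as scene direction")
    return issues


print(lint_prompt("Camera moves toward a woman, dramatic lighting"))
print(lint_prompt("Slow dolly push-in toward her face as flickering neon casts magenta light across wet pavement."))
```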
