Veo 3.1 Prompting
Core Capabilities
Veo 3.1 generates video with native audio synthesis and advanced physical realism. Key specifications:
- Resolutions: 720p, 1080p, or 4K
- Aspect Ratios: 16:9 (landscape) or 9:16 (portrait)
- Duration: 4, 6, or 8 seconds per generation (billed as 8s minimum)
- Framerate: 24 FPS (cinematic standard)
- Native Audio: Dialogue, ambient sound, and music synchronized to video
- Reference Images: Up to 3 images for character/style consistency ("Ingredients to Video")
- Advanced Controls: First/Last Frame interpolation, Timestamp Prompting for multi-shot narratives
The Five-Part Formula
Structure every prompt in this order:
[Cinematography] + [Subject] + [Action] + [Context] + [Style & Audio]
Example:
Tracking shot following a weathered fisherman mending nets on a wooden dock. Golden hour sun flares through rigging. Salt spray visible in air. Cinematic 35mm film grain. Sound: seagulls crying, rope creaking, gentle waves.
Critical: Lead with camera movement. Veo 3.1 weights early tokens heavily.
1. Cinematography Specifications
Always specify shot type, camera movement, and lens characteristics first.
Shot Framing
Camera Movement (See camera-techniques.md)
- Linear: , , , ,
- Dynamic: , , ,
- Aerial: , ,
Lens & Focus
Shallow depth of field with creamy bokeh
Deep focus everything sharp
Anamorphic lens with horizontal flares
Macro lens for extreme detail
2. Subject Specification
Describe subjects with exhaustive visual detail for consistency:
Character Anchors (Essential for multi-clip consistency)
- Physical: "Elderly man, silver hair cropped short, weathered face with scar above left eyebrow"
- Clothing: "Faded navy pea coat, brass buttons tarnished, wool scarf with fringe"
- Distinctive markers: "Red-rimmed glasses", "tattoo of anchor on forearm", "gold pocket watch chain"
Object Details
- Material and condition: "Brushed titanium, fingerprint-smudged", "distressed leather with patina"
- Interaction points: "Steam rising from ceramic rim", "condensation beads on cold glass"
3. Action & Physics
Use dynamic verbs demonstrating physical interaction:
- Motion quality: , , ,
- Physics indicators:
realistic water displacement
, , ,
- Temporal markers: , ,
Physics Strengths: Fluid dynamics, fire/smoke, fabric movement, particle systems (dust, snow, sparks).
4. Context & Atmosphere
Establish environment and mood:
Lighting
- Time: , , ,
- Quality: , , ,
- Effects: , ,
Environment
- Spatial: , ,
- Weather: , ,
5. Style & Audio Synthesis
Visual Style (See style-reference.md)
- Genre: , , ,
- Color:
teal and orange blockbuster grade
, bleach bypass desaturated
,
- Texture: , ,
Native Audio (See audio-synthesis.md)
- Dialogue: "Smooth baritone voice: 'The package has arrived.'"
- Ambient: "Coffee shop murmur, ceramic cup placed on saucer"
- Effects: "Satisfying mechanical keyboard clack", "metallic slide and lock"
- Music: "Tense strings building to crescendo", "lo-fi hip hop beat 80bpm"
Audio Sync Tips
- Time dialogue: "At 2-second mark, character speaks"
- Match motion: "Music swells as camera pushes in"
- Layer sounds: "Base ambiance, then specific Foley, then dialogue"
Advanced Workflows
Ingredients to Video (Character Consistency)
Use when maintaining characters across multiple clips:
- Upload 1-3 reference images (character face, outfit, environment)
- Prompt structure: "Same woman from reference image, now walking through rainy Tokyo street. Maintaining red leather jacket and bob haircut."
- Keep anchors consistent: "Same scar on cheek, same gold hoop earrings"
First & Last Frame Interpolation
Create specific transitions:
- Describe starting state: "Sealed envelope on mahogany desk"
- Describe ending state: "Open envelope, letter partially pulled out, handwritten text visible"
- Veo generates smooth 8-second transition between states
- Use for: reveals, transformations, camera moves through portals
Timestamp Prompting (Multi-Shot Narrative)
Break 8 seconds into precise segments for narrative control:
- 0-2s: Wide shot of empty desert highway, heat shimmer rising
- 2-4s: Close-up of motorcycle speedometer needle climbing
- 4-6s: Tracking shot of rider leaning into turn, dust cloud trailing
- 6-8s: Wide shot motorcycle disappearing into sunset, engine roar fading
Model Selection
| Model | Speed | Use Case |
|---|
| Standard | High-quality commercial content, final delivery |
veo-3.1-fast-generate-001
| Fast | Rapid iteration, social media drafts, testing compositions |
Structured Prompting (JSON)
For complex commercial workflows, use JSON structure:
json
{
"project_meta": {
"aspect_ratio": "16:9",
"resolution": "1080p",
"model": "veo-3.1-generate-001"
},
"scene": {
"cinematography": "Macro lens, slow orbit 360 degrees",
"subject": "Artisan coffee cup, steam rising in spiral patterns",
"action": "Hand enters frame, gentle grip, lifting slowly",
"context": "Minimalist white studio, softbox lighting from above",
"style": "Premium commercial aesthetic, shallow depth of field",
"audio": "Ceramic on ceramic sound, gentle sip, ambient cafe hum"
},
"timeline": [
{"time": "0-3s", "focus": "Product detail, steam physics"},
{"time": "3-6s", "focus": "Hand interaction, human element"},
{"time": "6-8s", "focus": "Logo reveal, satisfying audio punctuate"}
]
}
Technical Constraints
- Prompt limit: 2,000 characters
- Image input: Maximum 20MB (JPEG/PNG) for Ingredients to Video
- Audio: Generate natively or add post-production; Veo audio works best with clear, brief dialogue (1-2 sentences max)
- Text rendering: Veo struggles with legible text; avoid critical text overlays or specify "stylized unreadable text"
- Concurrent requests: 50 per minute per region (free tier)
Troubleshooting
| Issue | Solution |
|---|
| Blurry motion | Add "sharp motion capture, 1/1000s shutter freeze" or "slow-motion 240fps" |
| Character inconsistency | Use Ingredients to Video with reference images; anchor with unique clothing/jewelry |
| Unwanted morphing | Specify rigidity: "maintains solid form", "no deformation"; use negative prompting |
| Physics look artificial | Explicitly state "realistic physics", "accurate weight and momentum" |
| Audio out of sync | Use timestamp markers: "dialogue starts at 2s", "sound matches impact at 4s" |
| Flat lighting | Specify contrast: "dramatic side lighting", "deep shadows", "rim light separation" |
| Boring composition | Lead with dynamic camera: "whip pan", "aggressive tracking shot", "rapid dolly in" |
When to Load References
| Task | Resource |
|---|
| Camera movement vocabulary | |
| Visual styles and color grading | |
| Genre-specific templates | |
| Audio generation techniques | |
| Copy-paste templates | |
Camera Techniques for Veo 3.1
Veo 3.1 excels at film grammar and camera movement. Precise terminology yields better motion control.
Movement Types
Linear Motion
| Term | Description | Best For |
|---|
| Physical camera movement toward/away subject | Emotional reveals, focus shifts |
| Camera moves parallel to subject | Following subjects, dynamic action |
| Horizontal lateral movement | Revealing environments |
| Vertical movement without angle change | Elevating reveals |
| Sweeping vertical arc | Epic scale reveals |
| Zoom while maintaining perspective | Intensifying moments |
Rotational Motion
| Term | Description | Best For |
|---|
| Horizontal rotation, fixed position | Scanning landscapes, following movement |
| Vertical rotation, fixed position | Revealing height, vertical subjects |
Orbit clockwise/counter-clockwise
| 360° circular path around subject | Product showcases, dramatic encirclement |
| Tilted horizon | Disorientation, tension, stylized action |
Dynamic Techniques
| Term | Description | Best For |
|---|
| Slight organic jitter | Documentary realism, urgency |
| Stabilized fluid motion | Cinematic tracking, luxury aesthetic |
| Fast blur transition between subjects | Energy, scene transitions |
| Shifts focal plane between foreground/background | Narrative focus shifts, depth emphasis |
| Gradual camera advance | Contemplative, intimate moments |
Shot Framing Hierarchy
Prefix camera moves with framing:
- — Detail only (eyes, texture)
- — Head and shoulders
- — Waist up
- — Full subject with environment
Extreme wide/Establishing
— Environment dominant
- — Dialogue scenes
- — Immersive perspective
- — Top-down, detachment
- — Looking up, empowerment/intimidation
Speed and Time Modifiers
- — Smooth, detailed action
- — Natural motion
- — Compressed time, clouds, construction
- — Moving time-lapse
- — Uneven, mechanical, horror
Lens Characteristics
Include to control optical personality:
- — Horizontal flares, oval bokeh, cinematic width
- — Distortion, expansive space
- — Compressed space, portrait perspective
- — Extreme close focus, shallow depth
- — Spherical distortion, extreme wide
- — Softness, chromatic aberration, character
Example Combinations
Extreme close-up of eye, slow push-in, rack focus to reflection in iris, anamorphic flares
Wide establishing shot, crane down through clouds to reveal mountain valley, golden hour lighting
Handheld medium shot following subject through crowded market, whip pan to street sign