varg sdk - declarative ai video orchestration

jsx-based ai video generation. describe scenes declaratively, render videos automatically.

what is this?

varg sdk is a declarative video orchestration framework. instead of manually calling apis, stitching clips, and managing async workflows, you describe what you want in jsx and the runtime handles:
  • parallel generation of images/videos/audio
  • automatic caching (re-renders reuse cached assets)
  • ffmpeg composition under the hood
  • provider abstraction (fal, elevenlabs, replicate)
think of it like react for video - you declare the structure, the engine figures out the execution.
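a toy illustration of why re-renders can reuse assets, assuming each asset is stored under a key derived from the element kind and its props. this is not varg's actual cache; `cacheKey`, `generateOnce`, `generationCount`, and `fakeImage` are hypothetical names used only for this sketch:

```typescript
// toy sketch only: this is not how varg implements its cache.
type Props = Record<string, string | number>;

const cache = new Map<string, string>();
let generationCalls = 0;

// sort keys so { a, b } and { b, a } map to the same entry (flat props only).
function cacheKey(kind: string, props: Props): string {
  const sortedKeys = Object.keys(props).sort();
  return `${kind}:${JSON.stringify(props, sortedKeys)}`;
}

// run the expensive generator only when the key has not been seen before.
function generateOnce(
  kind: string,
  props: Props,
  generate: (props: Props) => string,
): string {
  const key = cacheKey(kind, props);
  const cached = cache.get(key);
  if (cached !== undefined) return cached;
  generationCalls += 1;
  const asset = generate(props);
  cache.set(key, asset);
  return asset;
}

// how many times a generator actually ran.
function generationCount(): number {
  return generationCalls;
}

// stand-in for a provider call; returns a fake asset name.
const fakeImage = (props: Props): string => `image-for-${props.prompt}.png`;

const firstRender = generateOnce("Image", { prompt: "cyberpunk cityscape" }, fakeImage);
const secondRender = generateOnce("Image", { prompt: "cyberpunk cityscape" }, fakeImage);
// secondRender is served from the cache, so fakeImage ran only once.
```

changing any prop produces a different key, so only the edited element would be regenerated under this scheme.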

terminology

| term | meaning |
| --- | --- |
| Render | root container - sets dimensions (1080x1920 for tiktok, 1920x1080 for youtube) |
| Clip | timeline segment with duration, contains visual/audio layers |
| Image | static image - generated from prompt or loaded from file |
| Video | video clip - text-to-video OR image-to-video animation |
| Music | background audio - generated from prompt or loaded from file |
| Speech | text-to-speech with voice selection |
| Title/Subtitle | text overlays with positioning |
| Captions | auto-generated captions from Speech element |
| Grid | layout helper for multi-image/video grids |
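the two sizes listed for Render reduce to the familiar 9:16 and 16:9 ratios. a minimal sketch of that arithmetic; `PRESETS`, `gcd`, and `aspectRatio` are illustrative names, not part of the sdk:

```typescript
// illustrative only: these names are not exported by the varg sdk.
const PRESETS = {
  tiktok: { width: 1080, height: 1920 },
  youtube: { width: 1920, height: 1080 },
} as const;

// greatest common divisor via euclid's algorithm.
function gcd(a: number, b: number): number {
  return b === 0 ? a : gcd(b, a % b);
}

// reduce width x height to its simplest "w:h" form.
function aspectRatio(width: number, height: number): string {
  const divisor = gcd(width, height);
  return `${width / divisor}:${height / divisor}`;
}

const tiktokRatio = aspectRatio(PRESETS.tiktok.width, PRESETS.tiktok.height);    // "9:16"
const youtubeRatio = aspectRatio(PRESETS.youtube.width, PRESETS.youtube.height); // "16:9"
```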

core concepts

tsx
<Render width={1080} height={1920}>  {/* tiktok dimensions */}
  <Music prompt="upbeat electronic" duration={10} />
  
  <Clip duration={3}>
    <Image prompt="cyberpunk cityscape" />
    <Title position="bottom">Welcome</Title>
  </Clip>
  
  <Clip duration={5}>
    <Video prompt="camera flies through neon streets" />
  </Clip>
</Render>
no imports needed - the render runtime auto-imports all components (Render, Clip, Video, Image, Music, Speech, Title, Grid, etc.) and providers (fal, elevenlabs). just write jsx and export default.

Video component

Video handles both text-to-video and image-to-video:
tsx
// text-to-video - generate from scratch
<Video prompt="cat playing piano in a jazz club" />

// image-to-video - animate an image with motion description
<Video 
  prompt={{
    text: "eyes widen, head tilts forward, subtle smile forming",
    images: [<Image prompt="portrait of woman" />]
  }}
/>

character consistency

generate an image first, pass it to Video/Animate for consistent characters across scenes:
tsx
const hero = <Image prompt="warrior princess, crimson hair, emerald eyes" />;

<Clip><Video image={hero} prompt="she draws her sword" /></Clip>
<Clip><Video image={hero} prompt="she leaps into battle" /></Clip>
for reference-based generation (keeping a real person's likeness):
tsx
<Image 
  prompt={{ 
    text: "same person, different scene", 
    images: ["https://reference-photo.jpg"] 
  }}
  model={fal.imageModel("nano-banana-pro/edit")}
/>

prompting guide (CRITICAL)

detailed prompts = better results. the wan 2.5 video model responds much better to rich, structured prompts, while vague prompts tend to produce generic, low-quality output.

the 4-dimensional formula

every video prompt should include these dimensions:
[Subject Description] + [Scene Description] + [Motion/Action] + [Cinematic Controls]
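one way to keep every prompt on this formula is to assemble it from four named parts. a minimal sketch, assuming a hypothetical helper `buildVideoPrompt` and a hypothetical `PromptParts` type (neither is part of the sdk); the result is a plain string that can be passed as a prompt:

```typescript
// hypothetical helper: not part of the varg sdk.
interface PromptParts {
  subject: string;   // who or what is on screen
  scene: string;     // environment, lighting, time of day
  motion: string;    // what moves, and how
  cinematic: string; // shot type, camera movement, style keywords
}

// join the four dimensions in formula order, skipping empty parts.
function buildVideoPrompt(parts: PromptParts): string {
  const joined = [parts.subject, parts.scene, parts.motion, parts.cinematic]
    .map((part) => part.trim().replace(/\.+$/, "")) // drop stray trailing periods
    .filter((part) => part.length > 0)
    .join(". ");
  return joined.length > 0 ? `${joined}.` : "";
}

const samuraiPrompt = buildVideoPrompt({
  subject: "a battle-hardened samurai with a weathered face and grey-streaked topknot",
  scene: "golden hour, dust particles floating in amber side light",
  motion: "he raises his katana in a fluid motion, blade catching the dying sunlight",
  cinematic: "medium close-up, slow camera push-in, cinematic, film grain",
});
```

keeping the parts separate makes it easy to spot a prompt that is missing a dimension before spending a generation on it.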

subject description

describe the main focus with rich detail:
  • appearance, clothing, features, posture
  • materials, textures, colors
  • emotional state, expression
bad:
"a woman"
good:
"a young woman with shoulder-length auburn hair, wearing a worn leather jacket over a white t-shirt, her green eyes reflecting determination, jaw set with quiet resolve"

scene description

layer the environment details:
  • location, background, foreground
  • lighting type (soft light, hard light, edge light, top light)
  • time of day (golden hour, daytime, night, moonlight)
  • atmosphere (warm tones, low saturation, high contrast)
bad:
"in a city"
good:
"on a rain-slicked tokyo street at night, neon signs reflecting pink and blue on wet pavement, steam rising from a nearby food stall, warm yellow light spilling from a ramen shop window"

motion/action description

specify movement with intensity and manner:
  • amplitude (small movements, large movements)
  • rate (slowly, quickly, explosively)
  • effects (breaking glass, hair flowing, dust particles)
bad:
"walking"
good:
"walking slowly through the crowd, shoulders hunched against the rain, one hand raised to shield her eyes, water droplets catching the neon light as they fall from her fingertips"

cinematic controls

use film language for professional results:
camera movements:
  • camera push-in / camera pull-out
  • tracking shot / dolly shot
  • pan left/right / tilt up/down
  • static camera / fixed camera
  • handheld / steadicam
shot types:
  • extreme close-up / close-up / medium close-up
  • medium shot / medium wide shot
  • wide shot / extreme wide shot
  • over-the-shoulder shot
composition:
  • center composition / rule of thirds
  • left-side composition / right-side composition
  • symmetrical composition
  • low angle / high angle / eye-level
lighting keywords:
  • soft light / hard light / edge light
  • rim light / backlight / side light
  • golden hour / blue hour / moonlight
  • practical light / mixed light

style keywords

add visual style for artistic direction:
  • cinematic / film grain / anamorphic
  • cyberpunk / noir / post-apocalyptic
  • ghibli style / anime / photorealistic
  • vintage / retro / futuristic
  • tilt-shift photography / time-lapse

audio/dialogue (for videos with speech)

wan 2.5 supports native audio generation:
  • format dialogue: Character says: "Your line here."
  • keep lines short (under 10 words per 5-second clip) for best lip-sync
  • specify delivery tone: "speaking quietly", "calling out over wind"
  • for silence: include "no dialogue" or "actors not speaking"

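the dialogue guidelines for wan 2.5 can be checked before a prompt is sent. a rough sketch, assuming hypothetical helpers `formatDialogue` and `fitsLipSync` (not part of the sdk). two assumptions are made here: the 10-words-per-5-seconds guideline is scaled linearly to other clip lengths, and the delivery tone is placed between the character name and "says":

```typescript
// hypothetical helpers: not part of the varg sdk.
const MAX_WORDS_PER_5_SECONDS = 10; // guideline: stay under this many words

function countWords(line: string): number {
  return line.trim().split(/\s+/).filter((word) => word.length > 0).length;
}

// true when the line stays under 10 words per 5 seconds of clip time.
// scaling the limit linearly with clip length is an assumption.
function fitsLipSync(line: string, clipSeconds: number): boolean {
  return countWords(line) < (clipSeconds / 5) * MAX_WORDS_PER_5_SECONDS;
}

// build the 'Character says: "..."' fragment, with an optional delivery tone.
// where the tone goes in the sentence is an assumption.
function formatDialogue(character: string, line: string, tone?: string): string {
  const delivery = tone ? `, ${tone},` : "";
  return `${character}${delivery} says: "${line}"`;
}

const quietLine = formatDialogue("The captain", "Hold the line.", "speaking quietly");
const fitsFiveSeconds = fitsLipSync("Hold the line.", 5);
```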
image-to-video prompts

when using reference images, focus on motion + camera:
  • the image already defines subject/scene/style
  • describe dynamics: "running forward, hair flowing behind"
  • add camera movement: "slow push-in on face"
  • control intensity: "subtle movement" vs "dramatic action"

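for image-to-video, the prompt text only needs dynamics, camera, and intensity. a small sketch that builds the `{ text, images }` prompt object used elsewhere in this document; `motionPrompt`, `Intensity`, and `ImageToVideoPrompt` are hypothetical names, not part of the sdk:

```typescript
// hypothetical helper: not part of the varg sdk.
// the { text, images } shape mirrors the prompt objects used in this document.
type Intensity = "subtle movement" | "dramatic action";

interface ImageToVideoPrompt<TImage> {
  text: string;
  images: TImage[];
}

// subject, scene, and style are left out on purpose: the image carries them.
function motionPrompt<TImage>(
  image: TImage,
  dynamics: string,
  camera: string,
  intensity: Intensity,
): ImageToVideoPrompt<TImage> {
  return { text: [dynamics, camera, intensity].join(", "), images: [image] };
}

const animated = motionPrompt(
  "https://reference-photo.jpg", // placeholder url, as in the examples above
  "running forward, hair flowing behind",
  "slow push-in on face",
  "subtle movement",
);
```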
example: detailed prompt

bad prompt (vague):
tsx
<Video prompt="a warrior fighting" />
good prompt (detailed):
tsx
<Video prompt="medium close-up, side light, golden hour. a battle-hardened samurai with a weathered face and grey-streaked topknot, wearing damaged traditional armor with visible scratches and dents. he raises his katana in a fluid motion, blade catching the dying sunlight, muscles tensing as he shifts into fighting stance. dust particles float in the amber light, his breath visible in the cold air. slow camera push-in on his focused expression, eyes narrowed with lethal intent. cinematic, film grain, anamorphic lens flare." />

example 1: cinematic character video

consistent character across multiple dramatic scenes with epic music.
warrior-princess.tsx (complete file - no imports needed):
tsx
// Warrior Princess - Cinematic Character Video
// run: bunx vargai@latest render warrior-princess.tsx --verbose

const hero = (
  <Image prompt="portrait, soft light, center composition. warrior princess with flowing crimson hair reaching past her shoulders, piercing emerald eyes reflecting inner fire, wearing battle-worn silver armor with intricate celtic knot engravings, a thin scar across her left cheek from an old battle, expression of quiet determination mixed with weariness. ghibli style, painterly brushstrokes, warm color palette with golden undertones" />
);

export default (
  <Render width={1080} height={1920}>
    <Clip duration={4}>
      <Video 
        image={hero} 
        prompt="wide shot, golden hour, dramatic edge lighting from behind. she stands at the edge of a windswept cliff overlooking a vast misty valley with ancient ruins below, her crimson hair and tattered cape billowing dramatically in the wind, one hand resting on sword hilt at her side. slow camera pull-out revealing the epic landscape, dust particles catching the amber sunlight. cinematic, ghibli style, film grain, anamorphic lens" 
      />
    </Clip>
    <Clip duration={3}>
      <Video 
        image={hero} 
        prompt="medium close-up, dramatic side lighting with hard shadows. she draws her sword in one fluid motion, blade singing as it leaves the scabbard, polished steel catching the dying sunlight with a bright gleam. her eyes narrow with fierce focus, jaw set with lethal intent. camera tracks the sword movement upward. cinematic, ghibli style, shallow depth of field" 
      />
    </Clip>
    <Clip duration={3}>
      <Video 
        image={hero} 
        prompt="extreme close-up on her face, rim light from behind creating a golden halo effect. a single tear rolls down her scarred cheek as she whispers a prayer, eyes closed in concentration, lips barely moving. subtle camera push-in, dust motes floating in the light. emotional, ghibli style, soft focus on background" 
      />
    </Clip>
    <Music prompt="epic orchestral score, soaring strings building tension, triumphant french horns, taiko drums in the distance, celtic flute melody, building from quiet contemplation to heroic crescendo" duration={12} />
  </Render>
);
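the file above uses three clips of 4, 3, and 3 seconds with a 12-second music track. a quick sketch of that arithmetic, assuming clips play back to back with no transition overlap (the source does not spell out the timeline rules); `totalClipSeconds` is a hypothetical helper, not part of the sdk:

```typescript
// hypothetical helper: not part of the varg sdk.
// assumes clips play back to back with no transition overlap.
function totalClipSeconds(clipDurations: number[]): number {
  return clipDurations.reduce((sum, seconds) => sum + seconds, 0);
}

const clipDurations = [4, 3, 3]; // the three <Clip duration={...}> values above
const musicSeconds = 12;         // the <Music duration={12} /> value above

const timelineSeconds = totalClipSeconds(clipDurations); // 10
const musicOverhang = musicSeconds - timelineSeconds;    // 2
```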

example 2: tiktok product video with talking head

animated influencer character with speech, captions, and background music.
skincare-promo.tsx (complete file - no imports needed):
tsx
// Skincare Product TikTok - Talking Head with Captions
// run: bunx vargai@latest render skincare-promo.tsx --verbose

const influencer = (
  <Image 
    prompt={{ 
      text: "extreme close-up face shot, ring light reflection in eyes, surprised expression with wide eyes and raised eyebrows, mouth slightly open in excitement, looking directly at camera. young woman with short platinum blonde hair, minimal makeup, silver hoop earrings, wearing oversized vintage band t-shirt. clean white background, professional studio lighting, tiktok creator aesthetic",
      images: ["https://reference-photo-url.jpg"]  // optional: reference for likeness
    }}
    model={fal.imageModel("nano-banana-pro/edit")}
    aspectRatio="9:16"
  />
);

const speech = (
  <Speech voice="bella" model={elevenlabs.speechModel("turbo")}>
    Oh my god you guys, I literally cannot believe this actually works! I've been using this for like two weeks and my skin has never looked better. Link in bio, seriously go get it!
  </Speech>
);

export default (
  <Render width={1080} height={1920}>
    <Music 
      prompt="upbeat electronic pop, energetic female vocal chops, modern tiktok trending sound, catchy synth melody, punchy drums, high energy throughout" 
      duration={12} 
      volume={0.4}
    />
    
    <Clip duration={3}>
      <Video 
        prompt={{
          text: "eyes widen dramatically in genuine surprise, eyebrows shoot up, mouth opens into excited smile, subtle forward lean toward camera as if sharing a secret. natural blinking, authentic micro-expressions",
          images: [influencer]
        }}
        model={fal.videoModel("wan-2.5")}
      />
      {speech}
    </Clip>
    
    <Clip duration={4}>
      <Video 
        prompt="product shot: sleek skincare bottle rotating slowly on marble surface, soft key light from above, pink and gold gradient background, light rays catching the glass, water droplets on bottle suggesting freshness. smooth 360 rotation, professional product photography, beauty commercial aesthetic"
        model={fal.videoModel("kling-v2.5")}
      />
      <Title position="bottom" color="#ffffff">LINK IN BIO</Title>
    </Clip>
    
    <Clip duration={3}>
      <Video 
        prompt={{
          text: "enthusiastic nodding while speaking, hands come into frame gesturing excitedly, genuine smile reaching her eyes, occasional hair flip, pointing at camera on 'go get it'",
          images: [influencer]
        }}
        model={fal.videoModel("wan-2.5")}
      />
    </Clip>
    
    <Captions src={speech} style="tiktok" activeColor="#ff00ff" />
  </Render>
);

example 3: multi-scene video grid with elements

4-panel nature video grid showcasing different elements.
four-elements.tsx (complete file - no imports needed):
tsx
// Four Elements - 2x2 Video Grid
// run: bunx vargai@latest render four-elements.tsx --verbose

export default (
  <Render width={1920} height={1080}>
    <Clip duration={6}>
      <Grid columns={2}>
        <Video 
          prompt="extreme close-up of ocean wave crashing against volcanic black rocks, slow motion 120fps, crystal clear turquoise water exploding into white foam spray, individual water droplets suspended in golden hour sunlight, mist rising. cinematic, national geographic style, shallow depth of field on water texture"
          model={fal.videoModel("wan-2.5")}
        />
        <Video 
          prompt="intimate close-up of flames dancing in stone fireplace, warm orange and yellow tongues of fire licking weathered logs, glowing embers pulsing beneath, sparks occasionally rising. cozy cabin atmosphere, soft crackling implied, shallow focus on flame tips, hygge aesthetic"
          model={fal.videoModel("wan-2.5")}
        />
        <Video 
          prompt="macro shot of rain droplets hitting window glass, each drop creating expanding ripples and trails, city lights blurred into colorful bokeh beyond the glass, melancholic blue hour lighting. asmr visual aesthetic, slow motion, focus pulling between drops"
          model={fal.videoModel("wan-2.5")}
        />
        <Video 
          prompt="timelapse of cumulus clouds drifting across deep blue sky, dramatic god rays breaking through cloud gaps, cloud shadows moving across distant mountain range. epic landscape, wide angle, peaceful meditative mood, nature documentary cinematography"
          model={fal.videoModel("wan-2.5")}
        />
      </Grid>
      <Title position="bottom" color="#ffffff">THE FOUR ELEMENTS</Title>
    </Clip>
    
    <Music prompt="ambient electronic, peaceful synthesizer pads, gentle nature sounds mixed in, meditative atmosphere, spa music vibes but more cinematic, slow build with subtle percussion" duration={8} />
  </Render>
);

render

preview first (recommended workflow)

use --preview to generate only images/thumbnails without rendering full videos. this is faster and cheaper for iteration:
bash
bunx vargai@latest render video.tsx --preview --verbose
once you're happy with the preview frames, render the full video:
bash
bunx vargai@latest render video.tsx --verbose

open the result

after rendering, open the video with the system default player (open is the macos command; on linux, xdg-open does the same job):
bash
open output/video.mp4

setup

requires api keys in .env:

available models

tsx
// image generation
fal.imageModel("flux-schnell")        // fast, good quality
fal.imageModel("flux-pro")            // highest quality
fal.imageModel("nano-banana-pro/edit") // reference-based (character consistency)

// video generation
fal.videoModel("wan-2.5")             // balanced speed/quality
fal.videoModel("kling-v2.5")          // highest quality, 10s max
fal.videoModel("sync-v2")             // lipsync

// audio
elevenlabs.speechModel("turbo")       // fast TTS
elevenlabs.musicModel()               // music generation

key props reference

| component | key props |
| --- | --- |
| <Render> | width, height, fps |
| <Clip> | duration, transition={{ name: "fade", duration: 0.5 }} |
| <Image> | prompt, src, model, aspectRatio, resize, zoom |
| <Video> | prompt, src, model, duration, cutFrom, cutTo |
| <Music> | prompt, src, model, duration, volume, loop |
| <Speech> | voice, model, volume, children=text |
| <Title> | position, color, children=text |
| <Captions> | src (Speech element), style ("tiktok"/"karaoke") |
| <Grid> | columns, rows, children=Image/Video array |

about varg.ai

varg is a generative ai video platform. the sdk uses declarative jsx to compose video scenes with ai-generated images, animations, and audio. key features:
  • automatic caching (regenerating uses cached assets)
  • parallel generation (clips render concurrently)
  • character consistency via image references
  • provider-agnostic (fal, replicate, elevenlabs under the hood)
  • ffmpeg-based composition for professional output