dream-to-video


Triggered when users provide dream text materials, diary fragments, or oral dream descriptions and wish to generate videos. Trigger phrases include: "dreamt of", "had a dream", "dream material", "help me generate a video", "convert to video", "dream to video". It also applies to scenarios where users directly paste a dream description and expect to receive a video file. This skill converts text into video prompts, automatically submits them to the Jiemeng Platform for generation, and downloads the video files.

6 installs

NPX Install

```bash
npx skill4agent add mediastormdev/dream-to-video-skill dream-to-video
```

SKILL.md Content (translated from Chinese)

Dream-to-Video: Automated Video Generation from Dream Materials

You are responsible for converting the dream text materials provided by users into video prompts, submitting them to the Jiemeng Platform via an automated toolchain for video generation, and downloading the videos locally. Users only need to provide the text; you return the video files.

0. Initial Environment Configuration

When a user uses this Skill for the first time, automatically configure the environment according to the following process. Check the result after each step, and inform the user that they can start using it only after all steps are completed.

Path Conventions

In this document, `{W}` represents the clone path of this repository (i.e., the parent directory of the `dream_to_video/` directory).
  • Project directory: `{W}/dream_to_video/`
  • Skill resource directory: `{W}/skills/dream-to-video/`
Replace `{W}` with the actual path when executing commands.

Automatic Execution (No User Action Required)

Step 0-0: Clone the Project Repository
Ask the user where they want to place the project, then execute:

```bash
cd "<directory specified by user>" && git clone https://github.com/mediastormDev/dream-to-video-skill.git && cd dream-to-video-skill
```

The `dream-to-video-skill` directory after cloning is `{W}`. If the user says "in the current directory", clone there instead.
Step 0-1: Check Python

```bash
python --version
```

Requires Python ≥ 3.10. If it is not installed or the version is too low, stop and prompt the user:
Please install Python 3.10 or higher first: https://www.python.org/downloads/ (check "Add Python to PATH" during installation).
Step 0-2: Install Python Dependencies

```bash
cd "{W}/dream_to_video" && pip install -r requirements.txt
```
Step 0-3: Install the Playwright Browser

```bash
playwright install chromium
```

This downloads the Chromium browser engine (about 150 MB), which is used to drive the Jiemeng Platform automatically.
Step 0-4: Create the Necessary Directories

```bash
cd "{W}/dream_to_video" && mkdir -p output data auth/browser_profile reference_images/indoor reference_images/outdoor
```
Step 0-5: Deploy Reference Image Materials
The built-in reference images in the repository are located at `skills/dream-to-video/reference_images/`; copy them to the project runtime directory automatically:

```bash
cp -n "{W}/skills/dream-to-video/reference_images/indoor/"*.jpg "{W}/dream_to_video/reference_images/indoor/" 2>/dev/null; cp -n "{W}/skills/dream-to-video/reference_images/outdoor/"*.jpg "{W}/dream_to_video/reference_images/outdoor/" 2>/dev/null; echo "reference images deployed"
```

The `-n` flag skips files that already exist at the destination. If the user has their own company environment photos, they can place them in the corresponding directory as well.
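On platforms where `cp -n` is unavailable, the same no-overwrite copy can be sketched in Python (a minimal sketch; the function name and return value are illustrative, not part of the toolchain):

```python
import shutil
from pathlib import Path

def deploy_reference_images(src_dir: str, dst_dir: str) -> list[str]:
    """Copy *.jpg files from src_dir to dst_dir, skipping files that
    already exist at the destination (the `cp -n` behavior), and
    return the names of the files actually copied."""
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    copied = []
    for img in sorted(src.glob("*.jpg")):
        target = dst / img.name
        if not target.exists():  # -n: never overwrite the user's own photos
            shutil.copy2(img, target)
            copied.append(img.name)
    return copied
```

Running it twice is safe: the second call copies nothing and leaves any user-edited files untouched.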

User Action Required

Step 0-6: Log in to the Jiemeng Platform

```bash
cd "{W}/dream_to_video" && python main.py login
```

After execution, the browser opens the Jiemeng website and displays a login QR code. Prompt the user:
Please scan the QR code in the browser with your TikTok/Jiemeng App to complete login. After a successful login, the program automatically saves the credentials, and no repeated login is required afterwards.

Self-Test Checklist

After completing all steps, run the following checks in sequence. Configuration is successful only if all are ✅:
```bash
# Check 1: Python version
python --version
# Expected: Python 3.10+

# Check 2: Core dependencies
python -c "import playwright; import cv2; import numpy; print('deps OK')"
# Expected: output "deps OK"

# Check 3: Playwright browser
python -c "from playwright.sync_api import sync_playwright; b=sync_playwright().start(); br=b.chromium.launch(headless=True); br.close(); b.stop(); print('browser OK')"
# Expected: output "browser OK"

# Check 4: Directory structure
python -c "from pathlib import Path; dirs=['output','data','auth/browser_profile']; ok=all((Path('{W}/dream_to_video')/d).is_dir() for d in dirs); print('dirs OK' if ok else 'dirs MISSING')"
# Expected: output "dirs OK"

# Check 5: Login status
cd "{W}/dream_to_video" && python main.py verify
# Expected: displays that the login is valid
```
After all checks pass, inform the user:
Environment configuration completed! You can now provide dream materials directly, and I will automatically generate the video.
If any item fails, tell the user specifically how to fix it, and do not proceed with subsequent steps.
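Checks 1-4 can also be bundled into a single Python helper (an illustrative sketch, not part of `main.py`; check 5 still requires `python main.py verify`, and the browser check here is a weaker PATH lookup rather than an actual Chromium launch):

```python
import shutil
import sys
from pathlib import Path

def run_self_test(project: Path) -> dict[str, bool]:
    """Run checks 1-4 of the checklist and return a name -> pass mapping."""
    results = {"python": sys.version_info >= (3, 10)}  # Check 1
    try:  # Check 2: core dependencies import cleanly
        import playwright, cv2, numpy  # noqa: F401
        results["deps"] = True
    except ImportError:
        results["deps"] = False
    # Check 3 proxy: the playwright CLI is on PATH (weaker than actually
    # launching Chromium, which the bash checklist does).
    results["browser"] = shutil.which("playwright") is not None
    # Check 4: required directories exist under the project root.
    results["dirs"] = all((project / d).is_dir()
                          for d in ("output", "data", "auth/browser_profile"))
    return results
```

Call it as `run_self_test(Path("{W}/dream_to_video"))` and report any `False` entries to the user.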

1. Complete Workflow

User provides dream materials → You convert to Prompt according to rules → Submit to queue → Worker automatically generates + downloads → Notify user

After receiving materials each time, you must follow these steps:

Step 1: Convert to Prompt
Convert the user's dream materials into a video prompt according to "2. Video Prompt Generation Rules" below. Output the complete Prompt text directly.
  • Check Rule 9 (Character Appearance Tagging): are there visible characters other than the protagonist? If yes, tag them.
  • Check Rule 10 (Reference Image Prefix): is there a description of the physical environment of a company/workplace? If yes, add the prefix `Reference image environment;`.
Step 2: Submit to Queue

```bash
cd "{W}/dream_to_video" && python -u main.py add "the complete Prompt you generated"
```

This command completes instantly and returns a task_id.
Step 3: Ensure the Worker is Running
Check whether a worker background task is already running. If not, start one:

```bash
cd "{W}/dream_to_video" && python -u main.py worker
```

Run it in background mode (`run_in_background: true`). The Worker will automatically:
  1. Detect whether the Prompt has the `Reference image environment;` prefix
  2. If yes → automatically select reference images, switch to "All-round Reference" mode, upload the images, and reference them in the ProseMirror editor via `@Image 1` when entering the prompt
  3. If no → enter the prompt directly in the textarea
  4. Configure generation settings (Seedance 2.0 / 16:9 / 15s) → click generate → monitor progress → download → apply post-processing effects
Step 4: Inform the User
Tell the user that the task has been submitted and that the video will be downloaded automatically to `{W}/dream_to_video/output/`, with a prompt tone played once generation is complete.

Check Status

If the user asks about progress, run:

```bash
cd "{W}/dream_to_video" && python -u main.py status
```

Or read the status file directly: `{W}/dream_to_video/output/batch_state.json`
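If you prefer to summarize the status file directly, a sketch like the following works, assuming `batch_state.json` holds a top-level `tasks` list whose entries carry a `status` field (the schema is an assumption; adjust it to the real file):

```python
import json
from collections import Counter
from pathlib import Path

def summarize_queue(state_file: str) -> dict[str, int]:
    """Count tasks per status in batch_state.json, e.g. {"done": 2, "pending": 1}.
    Entries without a "status" field are counted as "unknown"."""
    data = json.loads(Path(state_file).read_text(encoding="utf-8"))
    return dict(Counter(task.get("status", "unknown")
                        for task in data.get("tasks", [])))
```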

2. Video Prompt Generation Rules

Top-Level Iron Rules

1. Hardcore Realism

Strictly prohibit any anime or 2D-style vocabulary. Forbid empty AI buzzwords such as "golden, aesthetic, epic, neon, cyberpunk". Use professional photography terms such as "natural light, side backlight, 35mm lens, ISO noise, depth of field control".

2. Uncanny Dream-Logic

Strictly prohibit turning people into monsters. Convey the dreamlike quality through environmental atmosphere, logical jumps, and tiny but unreasonable details (such as moving mountains, pastries that appear out of nowhere, repeated mechanical actions).

3. Strict Adherence to Original

Must include the core visual elements in the materials (such as specific items, specific scenes, specific character actions). Do not fabricate non-existent elements.

4. Cinematic Camera

No longer limited to the first-person perspective. Panoramic, close-up, handheld follow, or fixed camera angles may all be used. Emphasize the physical dynamics of the lens (lens shake, focus switching, slow push-pull) to maintain a sense of presence in the frame. Prioritize a fisheye lens (ultra-wide 12mm) to strengthen the dream's spatial distortion and oppressiveness through barrel distortion; this is especially suitable for character close-ups, corridors, and indoor scenes.

5. Silent Visuals

Strictly prohibit any dialogue. All communication happens through eye contact, gestures, nodding, physical pointing, or item display.

6. No Names for Protagonist

Always refer to the protagonist as "Protagonist". Except for global celebrities, replace personal names with roles such as "companion", "driver", or "believer".

7. Pure Frame

Strictly prohibit any text, subtitles, logos, or watermarks in the frame. All information must be conveyed through pure visual elements; never rely on superimposed text.

8. Space & Time (Logic & Timing)

Scene transitions require a physical connection (such as walking into shadow or opening a door) or a hard cut. Total duration stays within 15 s, using 1-6 shots.

9. Character Appearance Tagging

Tag the appearance features of visible characters other than the protagonist. Do not tag the protagonist (dreams are first-person; the protagonist usually appears as POV, hands, or a back view, with the face not visible).
Region Detection: scan the user's materials for region/country keywords:
| User Material Keywords | Tag Added to Other Characters |
|---|---|
| USA, New York, Los Angeles, etc. | American |
| Japan, Tokyo, Osaka, etc. | Japanese |
| South Korea, Seoul, Busan, etc. | Korean |
| India, Mumbai, Delhi, etc. | Indian |
| UK, London, etc. | British |
| Thailand, Bangkok, etc. | Thai |
| Russia, Moscow, etc. | Russian |
| Other recognizable countries/regions | Corresponding nationality |
  • Default Value: When there are no region/country words in the materials, add "East Asian appearance" to other visible characters by default
  • Writing Method: Naturally integrate into the first appearance description of other characters, such as "a companion with East Asian appearance standing at the end of the corridor", "dozens of colleagues with East Asian appearance scattered in groups of three or five around the venue"
  • When There Are No Other Characters: If the whole dream only has the protagonist (such as working overtime alone, staying at home alone), do not add any appearance tags
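Rule 9's region mapping can be expressed as a small lookup (the keyword lists below are distilled from the table above and are illustrative, not exhaustive; the first matching region wins):

```python
import re

# Illustrative keyword table distilled from the Rule 9 mapping; extend as needed.
REGION_TAGS = {
    "American": ("usa", "new york", "los angeles"),
    "Japanese": ("japan", "tokyo", "osaka"),
    "Korean": ("south korea", "seoul", "busan"),
    "Indian": ("india", "mumbai", "delhi"),
    "British": ("uk", "london"),
    "Thai": ("thailand", "bangkok"),
    "Russian": ("russia", "moscow"),
}

def appearance_tag(material: str) -> str:
    """Return the appearance tag for non-protagonist characters,
    falling back to the documented default when no region matches."""
    text = material.lower()
    for tag, keywords in REGION_TAGS.items():
        # Word boundaries avoid false hits like "uk" inside "stuck".
        if any(re.search(rf"\b{re.escape(kw)}\b", text) for kw in keywords):
            return tag
    return "East Asian appearance"
```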

10. Company Environment Reference Image Prefix

When the user's materials describe the physical environment of a specific company/workplace, add the prefix "Reference image environment;" at the very beginning of the Prompt. The Worker will automatically select the corresponding indoor/outdoor reference image from `{W}/dream_to_video/reference_images/` to upload.
Need to Add (semantics refer to place/environment):
| User Material Example | Judgment | Reason |
|---|---|---|
| "In the corridor of the company" | ✅ Add | "Company" refers to a physical place |
| "A car is parked at the entrance of the company" | ✅ Add | Describes the company's physical entrance |
| "Arrived at the elevator hall of Building X" | ✅ Add | Describes a specific building |
| "All lights in Studio X are on" | ✅ Add | Describes a specific studio/photography studio |
| "The lights in the corridor of the office building are green" | ✅ Add | Describes the office building environment |
| "The lobby of the office building is very empty" | ✅ Add | Describes the office building environment |
| "The factory workshop is full of dust" | ✅ Add | Describes the factory environment |
| "The company annual meeting is in a certain venue, and the ceiling is leaking" | ✅ Add | Venue + ceiling = physical building environment |
| "Working overtime in the company, the lights in the office suddenly went out" | ✅ Add | Office = physical space |
No Need to Add (semantics refer to people/social relationships/non-company scenes):
| User Material Example | Judgment | Reason |
|---|---|---|
| "A colleague from the company asked me to borrow money" | ❌ Do not add | "Company's" modifies a person, not a place |
| "Eating with people from the company" | ❌ Do not add | "People from the company" refers to social relationships |
| "The company boss suddenly appeared" | ❌ Do not add | Refers to a character's identity |
| "Met a friend from the company in the mall" | ❌ Do not add | The scene is a mall, not the company |
| "Participated in the company red envelope war" | ❌ Do not add | Social activity, no physical space description |
Core Judgment: check whether "company/building/studio" acts as an adverbial of place (where) or as an attribute modifying people (whose). Add the prefix in the former case; do not add it in the latter. When the same material contains both a company environment description and company-related people, judge by whether the company's interior is concretely described: as long as there is a description of actual space (corridor, office, elevator hall, workshop, or a venue with ceiling/wall details), add the prefix.
Writing Format:
Reference image environment; This is a [realistic + emotional word] dream...
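The Rule 10 decision can be approximated with a keyword heuristic over the translated materials (a rough sketch: the cue list is illustrative, and the real judgment is semantic rather than string matching):

```python
# Rough place-cue list distilled from the tables above; illustrative only.
PLACE_CUES = ("corridor", "entrance", "elevator hall", "lobby", "workshop",
              "office", "venue", "studio", "ceiling", "building")

def needs_reference_prefix(material: str) -> bool:
    """True when the material describes the company's physical space
    (a place adverbial) rather than merely company-affiliated people."""
    text = material.lower()
    return any(cue in text for cue in PLACE_CUES)

def build_prompt(material: str, body: str) -> str:
    """Prepend the Rule 10 prefix when the material warrants it."""
    prefix = "Reference image environment; " if needs_reference_prefix(material) else ""
    return prefix + body
```

Mentions of colleagues or bosses with no spatial cue produce no prefix, matching the "do not add" examples.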

Output Format

The Prompt must be a single continuous text (no line breaks, no Markdown formatting, no paragraph breaks). It should include the following parts:

Part 1: Style Opening Sentence

This is a [realistic + emotional word] dream, shot with [lens method].

Part 2: Visual Narrative (Shot 1-6)

| Shot | Function | Description Points |
|---|---|---|
| Shot 1 | Opening & Tone Setting | Environment reveal + natural light and shadow + physical relationship between protagonist and environment |
| Shot 2 | Main Plot & Details | Core event + surreal details + key action interaction |
| Shot 3 | Turning Point or Jump Cut | Hard cut or physical connection + camera angle change + eerie feedback |
| Shots 4-6 | Climax & Exit | Visual impact + frame-edge distortion + physical dissipation or hard stop |

Part 3: Environmental Sound Effects

Ambient background sound + key physical impact sound + distorted mechanical sound / physical echo of ambient sound

Part 4: Technical Style Base (Mandatory)

Shot on Arri Alexa, Fisheye Lens (Fisheye 12mm), obvious barrel distortion in the frame. Letterbox (2.39:1), mandatory wide-screen movie aspect ratio. Heavy Vignette, the four corners of the frame are darkened and converge towards the center. [Fill in light and shadow description according to the scene], low-saturation cool tone. Photo-level realism. Faint digital noise and VHS-like distortion flicker at the frame edges, the image feels fragile as if it will collapse at any moment. Dream core. Liminal space.

Output Example

This is a realistic, absurd, and terrifying dream, shot with alternating fisheye wide-angle and handheld follow. Shot 1: Fisheye lens, open coastal promenade in a resort area, abundant sunlight. On the rocky coast beside the promenade, hundreds of sea lions are densely covered, stacked lazily together, their wet skin reflecting light. Tourists walk leisurely on the promenade, separated from the sea lion group only by a low railing. The frame is unnaturally calm. Shot 2: Hard cut. Fixed camera angle, ultra-wide angle. The sea lion group suddenly becomes restless, the front row of sea lions lift their upper bodies and open their mouths to show their teeth. The next second, the entire group surges over the railing like a tide, hundreds of slippery bodies wriggling forward on the concrete promenade. Tourists start running, and barrel distortion stretches and deforms the fleeing crowd. Shot 3: Handheld follow, severe shaking. The protagonist runs barefoot on the rough cement road, the lens follows the feet from a low angle. A blue hole shoe is kicked off by the crowd, rolling into the area occupied by sea lions. The protagonist looks back, and the dense sea lions have covered the entire promenade, their wet skin reflecting a greasy sheen in the sun. Shot 4: Fisheye lens. The protagonist runs barefoot with the crowd towards the iron gate at the scenic area exit, and in the distance behind, the black silhouettes of the sea lion group advance slowly and neatly, occupying the entire resort area. The exit iron gate makes a metal deformation sound under the crowd's squeeze. The vignetting at the four corners of the frame intensifies, gradually shrinking to full black. Coastal wind mixed with the low roar of the sea lion group into a continuous roar, the friction sound of hundreds of slippery bodies dragging on the concrete, the chaotic footsteps and gasps of the crowd, the metal twisting sound of the iron gate being squeezed. 
Shot on Arri Alexa, Fisheye Lens (Fisheye 12mm), obvious barrel distortion in the frame. Letterbox (2.39:1), mandatory wide-screen movie aspect ratio. Heavy Vignette, the four corners of the frame are darkened and converge towards the center. High contrast under strong coastal sunlight, low-saturation cool tone. Photo-level realism. Faint digital noise and VHS-like distortion flicker at the frame edges, the image feels fragile as if it will collapse at any moment. Dream core. Liminal space.

3. Login Instructions

When the Worker starts, it will automatically detect the login status of the Jiemeng Platform:
  • First use / Login expired: The browser will automatically open the Jiemeng website and display a login QR code, and play a prompt tone (three long beeps + two short beeps) to remind the user. After the user scans the code with their phone to log in, the program will automatically detect and continue working.
  • Already logged in: Start working directly without any operation.
  • Timeout: Wait for 10 minutes by default, and the program will exit after timeout.
You can also log in manually in advance:
```bash
cd "{W}/dream_to_video" && python main.py login
```

4. Post-Processing Effects (Elliptic Shatter)

After downloading each video, the Worker will automatically execute the Elliptic Shatter Edge Effect post-processing, and finally output two files:
| File | Naming Format | Description |
|---|---|---|
| Original | `task_XXX_YYYYMMDD_HHMMSS.mp4` | Original video generated by Jiemeng |
| Effect Version | `task_XXX_YYYYMMDD_HHMMSS_elliptic-shatter.mp4` | Overlaid with the elliptic shatter edge effect |
Effect Description:
  • The center of the video remains the clear original frame
  • The edges show a broken-glass texture (scattered debris, slight rotation, chromatic aberration refraction)
  • Outer dark border + faint debris texture
  • Overall effect: black textured border + elliptic viewing window + rotating shattered particles
The effect script is located at `effects/elliptic_shatter.py`; it is implemented with OpenCV + NumPy and runs in a subprocess without blocking the main process.
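The clear-center, broken-edge look can be approximated with an elliptic blend mask in NumPy (a simplified sketch only; the real `effects/elliptic_shatter.py` additionally renders debris scattering, rotation, and chromatic aberration):

```python
import numpy as np

def elliptic_window_mask(h: int, w: int, margin: float = 0.08) -> np.ndarray:
    """Float mask in [0, 1]: 1.0 inside the elliptic viewing window,
    falling smoothly to 0.0 toward the frame corners."""
    ys, xs = np.mgrid[0:h, 0:w]
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    # Normalized elliptic distance from the frame center (1.0 on the ellipse).
    dist = np.sqrt(((ys - cy) / (h / 2.0)) ** 2 + ((xs - cx) / (w / 2.0)) ** 2)
    # Soft falloff: fully clear well inside the ellipse, fully dark outside it.
    return np.clip((1.0 - dist) / margin, 0.0, 1.0)
```

Blending a frame with a dark shatter layer then reduces to `out = frame * mask[..., None] + shatter_layer * (1 - mask[..., None])`.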

5. Key Notes

  1. The Prompt is a single piece of plain text with no line breaks, Markdown formatting, or paragraph breaks. Pass it directly as a string to `python main.py add`.
  2. The Worker only needs to be started once, it will keep running until all tasks are completed. Multiple materials can be added continuously, and the Worker will process them in queue automatically.
  3. Each task outputs two videos to `{W}/dream_to_video/output/`: the original version and the elliptic-shatter effect version.
  4. A prompt tone (three short beeps) plays after the download and effect processing are complete; the user can collect the video when they hear it.
  5. If the user provides multiple materials at once, generate independent Prompts for each segment and add them separately.
  6. The [Light and Shadow] in the technical base needs to be replaced according to the specific scene, such as "mixed cold light of late-night artificial light and street lamps", "high contrast under strong coastal sunlight", etc.