Slideshow Producer
Follow shared public skill rules in:
Pipeline Position
script-generator (slideshow mode)
→ slideshow-producer [THIS SKILL]
→ image-batch-runner
- script-generator owns: hook logic, script flows, angle strategy, persona checks
- This skill owns: vibe-to-prompt translation, slide manifest management, localhost review, batch orchestration, text compositing
- image-batch-runner owns: actual image generation calls, downloaded assets, local manifests
What This Skill Is For
- turning a user's vibe description into a concrete slide-by-slide plan
- writing concise, effective image prompts (core scene + vibe, not over-constrained)
- producing a local slide manifest JSON for review and iteration
- optionally launching a localhost GUI for drag-and-drop slide editing
- orchestrating batch image generation through image-batch-runner
- compositing TikTok-style overlay text (white with black stroke) onto final slides
What This Skill Is Not For
- writing hook variants or script flows from scratch (that is script-generator's job)
- calling image generation APIs directly (that is image-batch-runner's job)
- creating persona packs or brand voice documents
- replacing creative-qa for final human quality review
Model Defaults
| Param | Default | Notes |
|---|---|---|
| text model | | Use only when the slide has no reference image |
| edit model | | Required when the slide has any reference image |
| quality | | Faster and cheaper; sufficient for UGC slideshows |
| resolution | | Good for 1080×1920 output; use 2k for fine detail |
| aspectRatio | (TT) or (IG) | Based on platform |
| outputFormat | | GPT Image 2 always outputs PNG |
Local Dependencies
- Python with Pillow () is required for scripts/composite-text.mjs.
Reference Image Routing Rule
The slide manifest is the routing source of truth.
- means use as the final slide image. Do not call image generation.
- with no or means generationMode: "text-to-image" and must call with .
- with any or means and must call with .
- Local reference files must be uploaded through image-batch-runner's first. Pass only the returned URLs into as .
Never send a slide with reference images through text-to-image. If the manifest has references but is missing or set to , fix the manifest before generation.
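The routing rule above can be sketched as a small function. This is only an illustration under stated assumptions, not the skill's actual implementation: the function name `route_slide`, the keys `referenceImages` and `referenceUrls`, and the `"generate"` value for `imageSource` are hypothetical. `imageSource`, `generationMode`, and the two model names appear elsewhere in this document.

```python
TEXT_MODEL = "image-gpt-image-2-text"
EDIT_MODEL = "image-gpt-image-2-edit"


def route_slide(slide: dict) -> dict:
    """Decide how one manifest slide gets its image.

    Key names other than imageSource and generationMode are assumptions.
    """
    # Local slides are used as the final image; no generation call.
    if slide.get("imageSource") == "local":
        return {"action": "copy", "mode": None, "model": None}

    has_refs = bool(slide.get("referenceImages") or slide.get("referenceUrls"))
    mode = slide.get("generationMode")

    if has_refs:
        # References must never go through text-to-image.
        if mode != "edit":
            raise ValueError(
                "slide has reference images but generationMode is not 'edit'; "
                "fix the manifest before generation"
            )
        return {"action": "generate", "mode": "edit", "model": EDIT_MODEL}

    # Assumption: edit mode without any reference is treated as a manifest error.
    if mode == "edit":
        raise ValueError("generationMode is 'edit' but the slide has no references")
    return {"action": "generate", "mode": "text-to-image", "model": TEXT_MODEL}
```

Raising on a mismatch, instead of silently picking a model, keeps the manifest as the single routing source of truth.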
Prompt Writing Principles
GPT Image 2 needs clear scene and vibe description, not a wall of constraints.
Do: core scene + vibe
Describe what the image should show and how it should feel. 2-3 sentences max.
Good:
A person sitting at a cluttered home desk, frustrated expression, looking at a laptop screen.
Natural window light from the side, casual iphone photo feel, slightly warm color cast.
Don't: over-constrain
Do not pile on "no studio lighting, no professional photography, no cinematic quality, no soft glow, no perfect skin, no..." — GPT Image 2 doesn't need negative constraints. They clog the prompt and can confuse the model.
If the vibe is "casual iphone photo", just say that. The model understands.
When to add specifics
- Product visible: describe it clearly (color, shape, placement)
- Environment matters: room type, lighting source, time of day
- Human expression: the specific emotion or action
- Props: what else is in frame
When in doubt, be more specific about what IS there, not what ISN'T.
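A rough way to catch over-constrained prompts before review is a small lint pass. The sketch below is hypothetical: the function name `lint_prompt`, the negation word list, and both thresholds are assumptions chosen to mirror the "2-3 sentences max" and "no negative constraints" guidance above.

```python
import re

# Words that usually signal a negative constraint (assumed list).
NEGATION = re.compile(r"\b(?:no|not|without|never)\b", re.IGNORECASE)


def lint_prompt(prompt: str, max_sentences: int = 3, max_negations: int = 1) -> list[str]:
    """Return human-readable warnings; an empty list means the prompt looks fine."""
    warnings = []

    # Naive sentence split on ., ! or ? followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", prompt.strip()) if s]
    if len(sentences) > max_sentences:
        warnings.append(f"{len(sentences)} sentences; aim for {max_sentences} or fewer")

    negations = len(NEGATION.findall(prompt))
    if negations > max_negations:
        warnings.append(f"{negations} negative constraints; describe what IS in frame instead")

    return warnings
```

For example, the "Good" prompt above produces no warnings, while a prompt such as "A desk photo, no studio lighting, no professional photography, no cinematic quality." produces one.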
Platform Modes
TikTok Slideshow (default)
- Aspect ratio: 9:16 (1080×1920)
- Safe zones: avoid top ~60px and bottom ~80px for UI overlays
- Overlay text: upper-center, bold white with black stroke
- Default slide count: 5-7
Instagram Carousel
- Aspect ratio: 4:5 (1080×1350) preferred; 1:1 (1080×1080) alternative
- Most text in the caption field; minimal image overlay
- Default slide count: 4-6
Default to TikTok when the user hasn't specified.
Read detailed specs in references/platform-specs.md.
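The platform defaults above can be held in a small lookup table. This is a sketch under assumptions: the names `PlatformSpec`, `PLATFORMS`, `resolve_platform`, and `safe_area` are hypothetical, and the Instagram safe-zone values are placeholders because this document does not state them.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PlatformSpec:
    aspect_ratio: str
    width: int
    height: int
    safe_top: int      # pixels to keep clear at the top
    safe_bottom: int   # pixels to keep clear at the bottom
    default_slides: tuple  # (min, max) default slide count


PLATFORMS = {
    "tiktok": PlatformSpec("9:16", 1080, 1920, 60, 80, (5, 7)),
    # Safe zones for Instagram are not given in this document; 0 is a placeholder.
    "instagram": PlatformSpec("4:5", 1080, 1350, 0, 0, (4, 6)),
}


def resolve_platform(name: Optional[str] = None) -> PlatformSpec:
    """Default to TikTok when the user has not specified a platform."""
    key = (name or "tiktok").strip().lower()
    return PLATFORMS[key]  # raises KeyError for unknown platforms


def safe_area(spec: PlatformSpec) -> tuple:
    """Return the (top, bottom) y-coordinates between which overlays are safe."""
    return (spec.safe_top, spec.height - spec.safe_bottom)
```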
Default Workflow
Phase 1: Requirements
Ask the user:
- How many slideshows? → determines sub-agent count
- TikTok or Instagram? → default TT
- Describe the vibe — what should this feel like? Keep it loose.
- Any local images to use for specific slides? → identify whether each path is a final slide image () or a reference for generation ().
Do not jump to scripts until you understand the feeling they want.
Phase 2: Script Creation
For each slideshow, produce a slide manifest JSON.
Each slide has:
- (1-indexed)
- (hook, problem, insight, proof, product, cta, etc.)
- (concise, following prompt writing principles above)
- (3-8 words, lowercase, conversational — or null if no text)
- ( or )
- (only if imageSource is "local")
- (, , or null)
- (local reference files for generated edit-mode slides)
- (uploaded reference URLs for generated edit-mode slides)
Save as a local manifest file.
Read the full schema in references/slide-manifest-schema.md.
After producing the manifest, ask: "Need a localhost preview to review and edit? (y/n)"
If yes → start the GUI server, user reviews, drags to reorder, edits prompts and overlay text.
If no → show the JSON summary, user confirms by text.
Before asking for generation approval, run:
```bash
node ${CLAUDE_SKILL_DIR}/scripts/validate-manifest.mjs --manifest /path/to/slideshow-manifest.json
```
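To make the kind of pre-generation check concrete, here is a minimal Python sketch of what validating a manifest might involve. It does not describe what validate-manifest.mjs actually checks. The key names `slides`, `index`, `overlayText`, and `localPath` are assumptions (the real names are in the schema reference); the rules themselves come from this document: at least 2 slides, 1-indexed slides, 3-8 lowercase words of overlay text or null, and local images found before generation.

```python
import os


def validate_manifest(manifest: dict) -> list[str]:
    """Return a list of problems; an empty list means the manifest passed."""
    errors = []
    slides = manifest.get("slides", [])

    if len(slides) < 2:
        errors.append("a slideshow needs at least 2 slides")

    for position, slide in enumerate(slides, start=1):
        label = f"slide {position}"

        # Slides are 1-indexed and should match their position.
        if slide.get("index") != position:
            errors.append(f"{label}: index should be {position}")

        text = slide.get("overlayText")
        if text is not None:
            word_count = len(text.split())
            if not 3 <= word_count <= 8:
                errors.append(f"{label}: overlay text has {word_count} words, expected 3-8")
            if text != text.lower():
                errors.append(f"{label}: overlay text should be lowercase")

        # Flag missing local images before generation, not during.
        if slide.get("imageSource") == "local":
            path = slide.get("localPath")
            if not path or not os.path.isfile(path):
                errors.append(f"{label}: local image not found: {path!r}")

    return errors
```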
Phase 3: Generation Trigger
After user approves the manifest, explicitly ask:
"Ready to generate N slides? GPT Image 2, quality: medium, resolution: 1k. X text-to-image, Y reference edits."
Do not auto-generate. User must confirm.
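The counts in the confirmation message can be derived from the manifest. A minimal sketch, assuming that N counts only slides that need generation and that slides carry the `imageSource` and `generationMode` fields; the function name `generation_prompt` is hypothetical.

```python
def generation_prompt(slides: list, quality: str = "medium", resolution: str = "1k") -> str:
    """Build the explicit confirmation question shown before generation."""
    # Local slides are copied as-is, so they are excluded from the counts (assumption).
    generated = [s for s in slides if s.get("imageSource") != "local"]
    edits = sum(1 for s in generated if s.get("generationMode") == "edit")
    text_to_image = len(generated) - edits
    return (
        f"Ready to generate {len(generated)} slides? GPT Image 2, "
        f"quality: {quality}, resolution: {resolution}. "
        f"{text_to_image} text-to-image, {edits} reference edits."
    )
```

The function only builds the question. Waiting for the user's answer stays a separate, explicit step.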
Phase 4: Batch Generation
Local slides: copy the image into the output directory as-is.
Generated slides without references: call image-batch-runner's per slide.
Generated slides with references: upload each local reference path with , then call with the uploaded URLs. Existing can be used directly if they are already HTTP(S) URLs.
Save the normalized request JSON per slide so the run is reproducible.
Default text-to-image request shape per slide:
```json
{
  "assetId": "{slideshowId}-slide-{N}",
  "runId": "{slideshowId}-run-{timestamp}",
  "provider": "hosted-media",
  "model": "image-gpt-image-2-text",
  "mode": "text-to-image",
  "prompt": "{the slide prompt}",
  "aspectRatio": "9:16",
  "quality": "medium",
  "resolution": "1k",
  "outputFormat": "png",
  "localAssetDir": "{output directory}"
}
```
Default reference edit request shape per slide:
```json
{
  "assetId": "{slideshowId}-slide-{N}",
  "runId": "{slideshowId}-run-{timestamp}",
  "provider": "hosted-media",
  "model": "image-gpt-image-2-edit",
  "mode": "edit",
  "prompt": "{the slide prompt, explicitly preserving the reference image facts}",
  "inputUrls": ["{uploaded reference image URL}"],
  "aspectRatio": "9:16",
  "quality": "medium",
  "resolution": "1k",
  "outputFormat": "png",
  "localAssetDir": "{output directory}"
}
```
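Both request shapes differ only in `model`, `mode`, and `inputUrls`, so one helper can produce either. The sketch below is illustrative: the function name `build_request` and its parameters are hypothetical, and the Unix-seconds timestamp is an assumption because this document does not define the `{timestamp}` format.

```python
import time
from typing import Optional, Sequence


def build_request(slideshow_id: str, slide_number: int, prompt: str, output_dir: str,
                  input_urls: Optional[Sequence[str]] = None,
                  aspect_ratio: str = "9:16",
                  timestamp: Optional[int] = None) -> dict:
    """Build a normalized request dict for one slide."""
    # Assumption: the timestamp is Unix seconds.
    ts = timestamp if timestamp is not None else int(time.time())
    request = {
        "assetId": f"{slideshow_id}-slide-{slide_number}",
        "runId": f"{slideshow_id}-run-{ts}",
        "provider": "hosted-media",
        "model": "image-gpt-image-2-text",
        "mode": "text-to-image",
        "prompt": prompt,
        "aspectRatio": aspect_ratio,
        "quality": "medium",
        "resolution": "1k",
        "outputFormat": "png",
        "localAssetDir": output_dir,
    }
    if input_urls:
        # Only already-uploaded HTTP(S) URLs may be passed as references.
        for url in input_urls:
            if not url.startswith(("http://", "https://")):
                raise ValueError(f"reference must be an uploaded URL, got: {url}")
        request.update(model="image-gpt-image-2-edit", mode="edit",
                       inputUrls=list(input_urls))
    return request
```

Writing the returned dict to disk with `json.dump` for each slide is one way to keep the run reproducible.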
Phase 5: Text Overlay + Final Review
After all images are generated, run scripts/composite-text.mjs to add overlay text to each slide.
Overlay style (TikTok default):
- White text (#FFFFFF), black stroke (#000000, 3px)
- Font: bold sans-serif (Helvetica Bold or system equivalent)
- Position: upper-center
- Size: ~56px relative to 1080px canvas width
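The overlay style above maps closely onto Pillow's text drawing options. The following is a sketch, not the contents of the compositing script: the function names, the 15% vertical placement for "upper-center", and the `font_path` parameter are assumptions. It relies on `ImageDraw.text` with `stroke_width`, `stroke_fill`, and `anchor`, which requires a reasonably recent Pillow and a TrueType font file.

```python
def overlay_layout(width: int, height: int) -> dict:
    """Scale the TikTok overlay style to the canvas (56px text, 3px stroke at 1080px wide)."""
    scale = width / 1080
    return {
        "font_size": round(56 * scale),
        "stroke_width": max(1, round(3 * scale)),
        # Assumption: "upper-center" means horizontally centered, 15% down from the top.
        "anchor_xy": (width // 2, round(height * 0.15)),
    }


def composite_text(src_path: str, dst_path: str, text: str, font_path: str) -> None:
    """Draw white text with a black stroke onto a slide and save the result."""
    from PIL import Image, ImageDraw, ImageFont  # requires Pillow

    with Image.open(src_path) as opened:
        image = opened.convert("RGB")

    layout = overlay_layout(*image.size)
    font = ImageFont.truetype(font_path, layout["font_size"])
    draw = ImageDraw.Draw(image)
    draw.text(
        layout["anchor_xy"],
        text,
        font=font,
        fill="#FFFFFF",
        stroke_width=layout["stroke_width"],
        stroke_fill="#000000",
        anchor="ma",  # x is the horizontal middle, y is the top of the text
    )
    image.save(dst_path)
```

Scaling the size and stroke with the canvas width keeps the overlay proportions the same if a slide is rendered at a resolution other than 1080 pixels wide.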
Show the final results to the user.
Sub-Agent Orchestration
When N slideshows are requested:
- N ≤ 2: the main agent handles them sequentially
- N ≥ 3: spawn N sub-agents in parallel for Phase 2 script creation
Each sub-agent produces one slide manifest JSON. The main agent collects and presents them together.
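The threshold rule can be expressed as a small dispatcher. In this sketch, threads stand in for sub-agents purely for illustration; the function name `create_manifests` and the `make_manifest` callback are hypothetical, and how sub-agents are actually spawned is outside what this document specifies.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def create_manifests(briefs: list, make_manifest: Callable) -> list:
    """Produce one manifest per slideshow brief, in the original order."""
    # N <= 2: the main agent handles them sequentially.
    if len(briefs) <= 2:
        return [make_manifest(brief) for brief in briefs]

    # N >= 3: one parallel worker per slideshow (threads stand in for sub-agents).
    with ThreadPoolExecutor(max_workers=len(briefs)) as pool:
        # pool.map preserves input order, so results can be presented together.
        return list(pool.map(make_manifest, briefs))
```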
GUI (Optional)
A minimal localhost review tool at .
Start with (default port 3099).
The GUI lets the user:
- See all slides in order
- Drag to reorder
- Edit prompt and overlay text inline
- Toggle image source (generate vs local)
- Save changes back to the manifest JSON
The GUI is a preview tool, not a render trigger. Generation is separate (Phase 3).
Failure Modes
- No vibe description → ask again with more specific prompts
- Single slide → minimum is 2 for a slideshow
- Local image not found → flag before generation, not during
- User wants to skip review → allow but warn
- image-batch-runner unavailable → stop, report the gap
- Python Pillow unavailable → stop before compositing and install Pillow in the active environment
Shared Source Context
When campaign-level source files exist, use them as context:
- for tone and forbidden phrases
- for visual character constraints (only when slides feature the persona)
- for accurate product appearance
This skill may read these files. It does not create or manage them.
Handoff
After Phase 5, suggest next steps:
- for human quality review
- for publishing