World-Class Instagram Carousel Generator
Generate Instagram carousels that are genuinely world-class: content people save, share, and come back to. Not engagement bait. Not AI slop. Actual value, delivered through precise visual design and narrative structure.
This skill is fully generalized. It contains FORM (structure, principles, patterns), not MATTER (specific topics). The user provides the matter (topic); the skill provides the form (archetypes, design system, music matrix, quality gates). Together they produce the carousel. Nothing is hardcoded.
BEFORE YOU START: Read
Before generating ANY carousel, read /home/node/.claude/skills/world-class-carousel/KNOWN_ISSUES.md. It contains compressed rules from all previous sessions -- data format gotchas, sizing rules, visual strategy decisions, and quality gates. Ignoring it means repeating solved mistakes.
EXECUTION PIPELINE
When the user requests a carousel, execute these 6 phases in order (Phase 6 runs post-delivery):
PHASE 1: RESEARCH & STRUCTURING
- Analyze the topic -- What is the core insight? What specific value can this deliver?
- Identify the audience -- What does the target audience NOT already know? What's their current understanding?
- Auto-detect content vertical and theme -- Use the Content Vertical Detection table below
- Select the archetype -- Which of the 7 carousel archetypes (see below) fits best? Use the Archetype Selection Guide below. Auto-select unless the user specifies.
- Design the narrative arc -- Map each archetype role to a renderer slide type using the Role-to-SlideType Mapping below. Ensure each slide creates a curiosity gap that the next slide resolves.
- Run the Bullshit Test on the outline -- Does every slide pass? (See QUALITY GATE below)
Content Vertical Detection (Topic -> Theme)
Analyze the topic and auto-select the renderer theme:
| Content Vertical | Keywords/Signals | Renderer Theme | Background Style |
|---|---|---|---|
| Tech / AI / Coding | AI, code, developer, API, tools, stack, programming, SaaS, data | | (default) |
| Business / Strategy | growth, revenue, startup, founder, marketing, sales, strategy, scale | | |
| Education / How-To | learn, tutorial, guide, roadmap, beginner, master, course, how to | | |
| Creative / Design | design, UX, brand, visual, aesthetic, portfolio, creative | | |
| Mindset / Philosophy | mindset, habits, productivity, stoic, growth, mental, philosophy | | |
If the user specifies a brand config with a theme, always use that instead.
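The keyword matching in the table above can be sketched as follows. This is a minimal illustration, not the skill's actual detector: the function name `detect_vertical`, the whole-word scoring, and the fallback to the Tech vertical are all assumptions. It returns the vertical's name only; map that name to a renderer theme using the table.

```python
import re

# Keywords copied from the Content Vertical Detection table (lowercased).
VERTICAL_KEYWORDS = {
    "Tech / AI / Coding": ["ai", "code", "developer", "api", "tools", "stack",
                           "programming", "saas", "data"],
    "Business / Strategy": ["growth", "revenue", "startup", "founder", "marketing",
                            "sales", "strategy", "scale"],
    "Education / How-To": ["learn", "tutorial", "guide", "roadmap", "beginner",
                           "master", "course", "how to"],
    "Creative / Design": ["design", "ux", "brand", "visual", "aesthetic",
                          "portfolio", "creative"],
    "Mindset / Philosophy": ["mindset", "habits", "productivity", "stoic", "growth",
                             "mental", "philosophy"],
}


def detect_vertical(topic, default="Tech / AI / Coding"):
    """Pick the vertical whose keywords match the topic most often.

    The fallback vertical is an assumption; ties go to the earlier table row.
    """
    text = topic.lower()
    scores = {
        vertical: sum(
            1 for kw in keywords if re.search(rf"\b{re.escape(kw)}\b", text)
        )
        for vertical, keywords in VERTICAL_KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else default
```

Whole-word matching keeps short keywords such as "AI" from firing inside unrelated words. Note that "growth" appears in two rows, so a topic containing only that word resolves to whichever row comes first.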
Content Category Selection (10 Categories, Aristotelian)
Each category has unique visual DNA derived from psychology axioms (Cialdini, cognitive load theory, dual coding, serial position effect). Select based on topic:
| If the topic is about... | Category | Arc Shape | Hook Style | Primary Cialdini |
|---|---|---|---|---|
| Explaining a research paper | | Revelatory | Face + paper panel | Authority |
| Comparing AI tools/models | | Divergent | Multi-screenshot face-off | Social Proof |
| Today's AI development | | Convergent | News-editorial face | Scarcity |
| Step-by-step AI tool how-to | | Linear | Phone-in-hand / device mockup | Reciprocity |
| Controversial opinion | | Confrontational | Bold abstract typography | Authority |
| Copy-paste prompts/templates | | Divergent | Phone screenshot mockup | Reciprocity |
| Complete sector overview | | Divergent | Multi-person face-off | Authority |
| Build [X] with AI project | | Linear+Reveal | Multi-device result showcase | Social Proof |
| Funding/business news | | Convergent | Founder portrait + data | Scarcity |
| Future predictions/timeline | | Revelatory | Abstract cinematic AI imagery | Scarcity |
Universal Psychology Rules (apply to ALL categories):
- Max 4 information chunks per slide (Cognitive Load Theory, Sweller)
- Pattern interrupt every 2-3 slides (diagram, comparison, color shift, or layout change)
- Density wave: H-M-H-M-H-H-M (never 3 high-density slides consecutively)
- Synthesis slide = THE save trigger (Serial Position Effect: last items remembered best)
- Dual-code the hardest concept (Paivio: visual + text = 6.5x retention)
- CTA matches save trigger: utility categories → "Save this", social categories → "Share/Comment"
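Two of these rules are mechanical enough to check automatically. The sketch below is illustrative only; the function names and the "H-M-H" string encoding of the density wave are assumptions, not part of the renderer.

```python
MAX_CHUNKS_PER_SLIDE = 4  # from the cognitive-load rule above


def check_density_wave(wave):
    """Return problems for a density string such as 'H-M-H-M-H-H-M'."""
    problems = []
    run = 0  # length of the current streak of high-density slides
    for index, level in enumerate(wave.split("-"), start=1):
        run = run + 1 if level == "H" else 0
        if run == 3:
            problems.append(
                f"slides {index - 2}-{index}: three high-density slides in a row"
            )
    return problems


def check_chunks(chunk_counts):
    """Return 1-based slide numbers that exceed the chunk limit."""
    return [
        number
        for number, count in enumerate(chunk_counts, start=1)
        if count > MAX_CHUNKS_PER_SLIDE
    ]
```

Running both checks on the outline before writing content catches density problems while they are still cheap to fix.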
Category-to-Slide-Sequence Quick Reference:
- (9 slides): hook → body → diagram → body → body → diagram → body → synthesis → cta
- (8 slides): hook → body → comparison → body → body → comparison → synthesis → cta
- (8 slides): hook → body → body → body → diagram → body → synthesis → cta
- (8 slides): hook → body → tool → tool → tool → body → synthesis → cta
- (7 slides): hook → body → body → body → body → synthesis → cta (text-driven, no diagrams)
- (9 slides): hook → body → body → body → comparison → body → body → synthesis → cta
- (9 slides): hook → diagram → body → body → body → comparison → diagram → synthesis → cta
- (8 slides): hook → body → tool → tool → tool → body → synthesis → cta
- (7 slides): hook → body → body → body → diagram → synthesis → cta
- (8 slides): hook → body → body → diagram → body → body → synthesis → cta
Role-to-SlideType Mapping
Map each archetype role to a renderer slide type when building the carousel spec:
| Archetype Role | Renderer Slide Type | Notes |
|---|---|---|
| | Use + for split title effect |
| , , , | | Use for the key phrase |
| , , , , , | | Use for key points |
| | Use for the item name, for details |
| , | | Use with or layout |
| , | | Use with opposing views |
| , , , | | Use title highlight to emphasize the key outcome |
| | Use for numbered key takeaways |
| | Use , , optional |
| , , | | Use bullets for listed points |
PHASE 1.5: VISUAL STRATEGY DECISION (Before Writing Content)
Before writing any content, decide the visual strategy for this carousel. You have access to multiple tools -- choose the right ones for the topic.
Available Visual Tools Inventory
| Tool | What It Does | When to Use | How to Invoke |
|---|---|---|---|
| AI Cinematic Images | HD photorealistic/artistic images (Gemini 3 Pro) | Hook/CTA backgrounds, emotional priming, conceptual anchoring | skill with hyper-detailed prompt (50+ words) |
| AI Flowcharts/Diagrams | Production-quality flowcharts with text labels, arrows, boxes | Process flows, pipelines, decision trees -- REPLACES TikZ for better visuals | skill with structural prompt describing boxes + connections |
| AI Architecture Diagrams | Blueprint-style system diagrams with components and connections | Microservices, tech stacks, system design | skill with component/connection prompt |
| AI Infographics/Charts | Bar charts, data visualizations with accurate labels and proportions | Market data, statistics, comparisons | skill with data + style description |
| AI Abstract Backgrounds | Neural networks, geometric patterns, cosmic visuals | Slide backgrounds via | skill with atmosphere/material prompt |
| TikZ Diagrams | Vector flowcharts in LaTeX (basic but reliable) | Simple 3-5 node flows where AI image gen is overkill | Use slide type with |
| Gradient Backgrounds | TikZ-rendered gradient fills with geometric accents | Default for all text-only slides | Set in slide data |
CRITICAL: Slide-Type Visual Rule (Experimentally Verified)
This rule was established through controlled A/B experiments (7 strategies, same content, scored 1-10). It overrides gut instinct:
| Slide Type | Visual Strategy | WHY (Experimental Evidence) |
|---|---|---|
| Hook | full-bleed + 0.60-0.68 overlay | Scroll-stopping power. First slide = 80% of engagement. Score: 8.0/10 |
| Body | TEXT-ONLY. No images. | Images on body slides destroy 40% of content space. Text-only scored 8.3/10 vs 5.7/10 with images |
| Diagram | AI-generated diagram as (preferred) OR TikZ fallback | Gemini 3 Pro generates production-quality flowcharts with readable labels, arrows, and boxes. Far more visually striking than basic TikZ. Use + 0.55-0.65 overlay so text remains readable over the diagram. |
| Synthesis | Text-only | Save-worthy reference material. Images would reduce information density. |
| CTA | full-bleed + 0.65-0.70 overlay | Emotional close with visual punch. |
DO NOT put AI images on body slides. This was the single biggest quality mistake found in testing.
DO NOT use browser screenshots on any slides. They always look terrible embedded in carousel slides.
Visual Strategy Decision Matrix (Topic-Level)
For each topic, determine the primary visual mode, background style, and which slide-level visuals to use:
| Topic Type | Background Style | Hook Visual | Body Visuals | Diagram Strategy | Example |
|---|---|---|---|---|---|
| Philosophy / Mindset | | AI image: symbolic figure | None (text carries weight) | AI-generated concept map | Stoic principles: marble bust + storm |
| Tool Review / SaaS | or | AI image: abstract tech glow | None (text-only bullets describe tools) | AI-generated comparison chart | "6 AI Tools": text descriptions + AI chart |
| News / Current Events | | AI image: dramatic scene | None (text with citations) | AI-generated timeline or power map | "AI War 2025": cinematic + AI power map |
| Technical Tutorial | (clean) | AI image: conceptual diagram | None (step-by-step text) | AI-generated architecture/flowchart | "Deploy with Docker": AI architecture diagram |
| Business / Strategy | | AI image: bold abstract | None (text with real data citations) | AI-generated bar chart or funnel | "Growth Hacking": AI infographic |
| Comparison / Versus | | AI image: abstract contrast | slide type columns | AI-generated side-by-side chart | "React vs Vue": comparison columns + AI chart |
| Creative / Design | (dark) | AI image: artistic/gallery quality | None (text-only) | AI-generated process flow | "UX Trends 2025": artistic + AI flow |
| Framework / Mental Model | | AI image: system metaphor | None (text explains components) | AI-generated flowchart (preferred over TikZ) | "OODA Loop": AI flowchart as |
| Data / Research | | AI image: data visualization concept | None (text with specific numbers) | AI-generated bar chart / infographic | "AI Market 2025": AI bar chart |
AI Image Generation Best Practices
Model & Routing:
- Use skill (uses ). Nano-banana-pro requires (often unset) but uses the same underlying model.
- Model: google/gemini-3-pro-image-preview (primary). Fallback: google/gemini-3.1-flash-image-preview.
- Output: ~1408x768 landscape. Overlay compensates for portrait stretch on slides.
Gemini 3 Pro Proven Capabilities (Experimentally Verified):
| Capability | Quality | Best Use in Carousels | Prompt Strategy |
|---|---|---|---|
| Cinematic portraits | Excellent | Hook/CTA backgrounds | 50+ words: materials, lighting, composition, colors, atmosphere |
| Multi-image composition | Excellent (avg 9.6/10) | Hook slides with real faces + screenshots | Aristotelian axioms below. Send base64 to /api/v1/images/generations |
| Screenshot → device mockup | Excellent | Tool showcase, product launch slides | "floating laptop/phone mockup, dark studio, reflective surface" |
| Person + screenshot editorial | Excellent | News hooks with evidence | "person as SUBJECT, screenshot as floating holographic EVIDENCE panel" |
| Multi-screenshot dashboard | Excellent | Comparison/versus slides | "floating panels at varied depths, color-coded edge glows, grid floor" |
| Flowcharts | Excellent | Diagram slides as | Describe boxes, arrows, labels, and connections structurally |
| Abstract backgrounds | Excellent | Any slide background | Materials, colors, atmosphere, "no text no words" |
The 7 Aristotelian Axioms for Multi-Image Composition (Experimentally Proven)
These irreducible premises govern ALL multi-image prompts. Every prompt must satisfy all 7:
A1: VISUAL HIERARCHY -- Eye processes: faces > contrast edges > text > color fields. Composition must respect this order.
A2: INPUT TYPE DETERMINES ROLE -- Each input has exactly one role:
- Photo of person → SUBJECT (preserve face, never modify)
- Screenshot/UI → EVIDENCE (float as holographic panel, stylize frame, preserve content)
- Logo/brand → ANCHOR (small, consistent corner placement)
- Abstract/texture → ATMOSPHERE (background only)
A3: UNIFIED LIGHT SOURCE -- All elements share one dominant light direction. Mixed lighting = instant "fake" detection.
A4: DEPTH CREATES DRAMA -- Foreground sharp (subject), midground recessed (screenshots), background soft (atmosphere). 3 layers minimum.
A5: NEGATIVE SPACE IS FUNCTIONAL -- Bottom 30-35% dark for text overlay. Not waste -- it's where the headline goes.
A6: COLOR TEMPERATURE = STORY -- Cool blue/teal = innovation. Warm red = urgency. Split red/blue = competition. Mono + accent = editorial.
A7: NO-TEXT SEAL -- Always end with "absolutely no text, no words, no letters, no watermarks" (outside screenshots).
Proven Scenario Prompt Templates (avg 9.6/10 across 10 tests)
Person + News Screenshot (9.5/10): "Image 1 is [person] -- preserve face, place in left 60%, dramatic side lighting. Image 2 is screenshot -- float as glowing translucent panel, tilted 8 degrees, recessed behind subject, cyan edge glow. Dark moody background, cinematic depth of field. Bottom 30% dark. No text outside screenshot."
Tool Screenshot Showcase (9/10): "Place screenshot on sleek floating laptop mockup angled 15 degrees. Dark gradient background, ambient teal glow from screen. Glossy reflective surface below. Premium Apple product launch aesthetic. No text outside screenshot."
Multi-Screenshot Dashboard (9.5/10): "Arrange as glowing panels floating in dark space, varied depths and angles (5-15 degrees). Largest centered. Color-coded edge glows. Grid floor, particle effects. Digital command center aesthetic. No text outside screenshots."
Person + Screenshots + Logo (10/10): "Person as dominant subject center-left, face preserved. Screenshots as holographic panels around them. Logo small in upper corner with glow. Volumetric light rays, 3-layer depth. No text outside screenshots/logo."
Face-Off + Data (10/10): "Person A on LEFT in profile facing right, red lighting. Person B on RIGHT facing left, blue lighting. Dashboard between them as floating holographic display. Smoke and sparks in the gap. Competitive energy. No text outside screenshot."
Phone in Hand (10/10): "Screenshot on smartphone held in hand from lower-right. Dark background, soft bokeh lights. Screen bright and crisp. Lifestyle photography style. No text outside screenshot."
5-Image Mega (10/10): "2 people (main foreground, secondary recessed) + 2 screenshots (holographic panels, color-coded glows) + logo (corner). Volumetric light, split lighting, multiple depth layers. No text outside screenshots/logo."
Prompt Rules:
- HYPER-DETAILED (50+ words): materials, lighting, composition, colors, atmosphere. Generic = bad.
- Always end with "absolutely no text, no words, no letters, no watermarks" -- AI models add unwanted text otherwise.
- Declare each input's role explicitly (per A2): "Image 1 is a portrait... Image 2 is a screenshot..."
- Specify depth map (per A4): "subject sharp foreground, screenshots floating midground, atmospheric background"
- Lock light direction (per A3): "single dominant light from upper-left, rim light on subject"
- For editorial portraits: add "ABSOLUTELY NO TEXT NO LOGOS NO MAGAZINE ELEMENTS" or Gemini creates TIME covers.
- Overlay opacity sweet spot: 0.60-0.68 for hooks, 0.55-0.65 for diagrams, 0.65-0.70 for CTA.
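The prompt rules above can be enforced with a small builder. This is a sketch under assumptions: `build_image_prompt` is a hypothetical helper, the default depth and light phrases are the example wordings from the rules, and the 50-word check counts the prompt before the no-text seal is appended.

```python
NO_TEXT_SEAL = "absolutely no text, no words, no letters, no watermarks"
MIN_WORDS = 50  # "HYPER-DETAILED (50+ words)"


def build_image_prompt(
    description,
    depth="subject sharp foreground, screenshots floating midground, "
          "atmospheric background",
    light="single dominant light from upper-left, rim light on subject",
):
    """Join description, depth map (A4) and light direction (A3), then seal (A7)."""
    parts = [p.strip().rstrip(".,") for p in (description, depth, light) if p.strip()]
    body = ", ".join(parts)
    word_count = len(body.split())
    if word_count < MIN_WORDS:
        raise ValueError(
            f"prompt has {word_count} words; add materials, lighting, "
            f"composition, colors, atmosphere to reach {MIN_WORDS}"
        )
    return f"{body}, {NO_TEXT_SEAL}"
```

Failing loudly on short prompts is a design choice: a generic prompt costs an image generation call and produces a generic image, so rejecting it early is cheaper.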
Background Style Selection
Set the background style in the carousel spec or per slide to control the look:
| Value | Visual Result | Best For |
|---|---|---|
| (default) | Top-to-bottom gradient with subtle accent glow | All themes. Clean, modern, professional |
| AI-generated paper/fabric texture | AVOID -- produces grey rock look |
| Multi-stop gradient with geometric accent shapes | Creative, premium, high-contrast |
| Flat theme background color | Clean/education themes, data-heavy content |
| (AI background) | Full-bleed AI image with overlay | Dramatic hooks, artistic carousels |
Set it at the spec level for all slides in the spec JSON, or per-slide for variation with "data": {"bg_style": "gradient_mesh", ...} on specific slides.
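How the two levels combine can be sketched as a simple lookup. The helper name `effective_bg_style` is hypothetical, and treating "gradient" as the fallback is an assumption based on the table's default row; the renderer's real resolution logic may differ.

```python
DEFAULT_BG_STYLE = "gradient"  # assumption: the table's "(default)" row


def effective_bg_style(spec, slide):
    """Per-slide value wins, then the spec-level value, then the default."""
    slide_style = slide.get("data", {}).get("bg_style")
    return slide_style or spec.get("bg_style") or DEFAULT_BG_STYLE


spec = {"bg_style": "gradient"}
plain_slide = {"type": "body", "data": {"title": "..."}}
varied_slide = {"type": "body", "data": {"title": "...", "bg_style": "gradient_mesh"}}
```

With these inputs, `plain_slide` inherits the spec-level style and `varied_slide` keeps its own.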
Screenshot Capture Protocol -- DEPRECATED
DO NOT use browser screenshots on carousel slides. They consistently look terrible -- low resolution, poorly framed, and badly integrated with the slide design. This was tested extensively and abandoned.
Instead: Use AI-generated images via Gemini 3 Pro for any visual needs:
- Tool/product visuals: Generate an AI illustration or abstract representation
- Data/charts: Generate AI bar charts or infographics (Gemini 3 Pro handles these well)
- Architecture/flows: Generate AI flowcharts or architecture diagrams
- People: Use text descriptions instead of photos
PHASE 2: CONTENT CREATION
- Write the hook (Slide 1) -- Apply the Hook Taxonomy. This slide determines everything.
- Write each slide -- One idea per slide. No exceptions. Apply the Bullshit Test to each.
- Map to renderer data format -- For each slide, create the JSON data object matching the slide type's required fields (see Data fields by slide type in RENDERING SCRIPTS).
- Execute the visual strategy decided in Phase 1.5:
- Generate AI images per the 2-3 Rule (hook + CTA, optionally the diagram slide). State each image's telos.
- Do not capture browser screenshots (deprecated, see Screenshot Capture Protocol). Generate AI visuals for any real tools/products/news referenced.
- Set per the Background Style Selection table.
- Use slide type for any process/flow that benefits from a visual.
- Select Instagram music -- Apply the Music Decision Matrix (see MUSIC SELECTION).
- Write the caption -- Front-load value in first 2 lines. Include CTA and hashtags.
- Build the carousel spec JSON -- Assemble all slides into a single spec file for the orchestrator.
PHASE 3: VISUAL PRODUCTION (LaTeX Pipeline)
Use the LaTeX-based rendering pipeline for publication-grade output. This produces slides that match or exceed the quality of accounts with 1M+ followers (Chase AI, Analytics Vidhya, etc.).
The pipeline: LaTeX (TikZ) -> PDF (pdflatex) -> PNG (pdftoppm at 300 DPI) -> resize to 1080x1350
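The stages of that pipeline can be sketched in Python as below. This is not the skill's renderer: the function names are hypothetical, the `-interaction=nonstopmode` flag is an added convenience, and using Pillow for the final resize is an assumption (the document only says "resize").

```python
import subprocess
from pathlib import Path


def pipeline_commands(tex_path, out_dir):
    """Build the pdflatex and pdftoppm commands for one slide."""
    tex, out = Path(tex_path), Path(out_dir)
    pdf = out / f"{tex.stem}.pdf"
    png_prefix = out / tex.stem  # pdftoppm appends ".png" to this prefix
    return [
        ["pdflatex", "-interaction=nonstopmode", f"-output-directory={out}", str(tex)],
        ["pdftoppm", "-png", "-r", "300", "-singlefile", str(pdf), str(png_prefix)],
    ]


def resize_to_slide(png_path, size=(1080, 1350)):
    """Resize the 300 DPI render to the final slide size (assumes Pillow)."""
    from PIL import Image  # imported lazily so command building needs no Pillow

    with Image.open(png_path) as image:
        resized = image.resize(size, Image.LANCZOS)
    resized.save(png_path)


def run_pipeline(tex_path, out_dir):
    """Run LaTeX -> PDF -> PNG -> resize and return the PNG path."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    for command in pipeline_commands(tex_path, out_dir):
        subprocess.run(command, check=True)
    png = Path(out_dir) / f"{Path(tex_path).stem}.png"
    resize_to_slide(png)
    return png
```

Rendering at 300 DPI and then downsampling gives the resize step more pixels to work with than rendering directly at the target size.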
Why LaTeX (not Pillow/HTML)
- Knuth-Plass optimal line breaking -- no ugly word wraps
- Professional font kerning and ligatures -- Palatino with microtype
- Native vector diagrams -- TikZ flow charts (fallback for simple diagrams)
- AI image integration -- full-bleed Gemini 3 Pro images for hook/CTA/diagram backgrounds
- Gradient backgrounds -- clean TikZ-rendered gradients for text-only slides
- Publication-grade output -- the same engine that typesets academic papers and books
Step 3a: Generate AI Images (Hook, CTA, Diagrams)
Generate AI images for hook background, CTA background, and optionally diagram backgrounds:
bash
# Hook background (cinematic, hyper-detailed 50+ word prompt)
python3 ~/.claude/skills/generate-image/scripts/generate_image.py \
"Dramatic cinematic split-screen composition: left side dark blue crystalline monolith with electric energy, right side warm golden organic neural network, clash of opposing forces, volumetric lighting, no text no words no letters" \
--model "google/gemini-3-pro-image-preview" --output tmp/carousel/hook_bg.png
# CTA background (emotional close)
python3 ~/.claude/skills/generate-image/scripts/generate_image.py \
"Mesmerizing cosmic portal with swirling deep indigo and purple energy, golden light rays, ethereal atmosphere, no text no words" \
--model "google/gemini-3-pro-image-preview" --output tmp/carousel/cta_bg.png
# Diagram as AI image (optional -- replaces TikZ for better visuals)
python3 ~/.claude/skills/generate-image/scripts/generate_image.py \
"Professional flowchart: Data Collection box connects to Processing box connects to Output box, clean white background, blue and grey, sharp vector style, readable labels" \
--model "google/gemini-3-pro-image-preview" --output tmp/carousel/diagram_bg.png
Step 3b: Render Slides with 7 Slide Types
The LaTeX renderer supports 7 slide types:
| Type | Description | Best For |
|---|---|---|
| Large title + highlighted phrase + subtitle | Cover / first slide |
| Title + highlighted text + body + bullets | Content-heavy slides, curated list items |
| Multi-column comparison table | Side-by-side analysis |
| Title + TikZ flow diagram (vertical/horizontal) | Architecture, workflows |
| Styled numbered points with badges | Save-worthy summary |
| Centered title + text + handle button | Call to action |
4 Color Themes: (parchment/terracotta), (white/blue), (indigo/purple), (sage/gold)
Slide 1 (Hook) -- Title with AI background:
bash
python3 ~/.claude/skills/world-class-carousel/scripts/render_latex_slide.py \
--type hook \
--data tmp/carousel/hook_data.json \
--output tmp/carousel/slide_01.png \
--theme dark --brand tmp/carousel/brand.json
Where the --data file contains:
{"title": "6 AI Tools That Will", "title_highlight": "Replace Your Stack", "subtitle": "The tools 10x engineers are switching to.", "callout": "Save this!", "slide_num": 1, "total_slides": 8, "ai_bg": "tmp/carousel/hook_bg.png", "overlay_opacity": 0.63}
Body slides -- Content-heavy with bullets (gradient bg, NO texture):
bash
python3 ~/.claude/skills/world-class-carousel/scripts/render_latex_slide.py \
--type body \
--data tmp/carousel/body_data.json \
--output tmp/carousel/slide_02.png \
--theme dark --brand tmp/carousel/brand.json
Where the --data file contains:
{"title": "Why Most Developers", "title_highlight": "Get This Wrong", "body": "The biggest mistake is...", "bullets": ["Point 1", "Point 2"], "slide_num": 2, "total_slides": 8, "bg_style": "gradient"}
NOTE: Always pass data as a JSON file path, never inline JSON. Always include
for text-only slides. Always pass
.
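When driving the renderer from Python, the file-path rule can be followed by writing the data to disk first and then building the command. The sketch below uses only the flags shown in the commands above; the helper names and the `Path.home()` expansion of `~` are assumptions.

```python
import json
from pathlib import Path

RENDER_SCRIPT = (
    Path.home() / ".claude/skills/world-class-carousel/scripts/render_latex_slide.py"
)


def write_slide_data(data, path):
    """Write slide data to a JSON file so it is passed by path, never inline."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(data, indent=2), encoding="utf-8")
    return path


def render_command(slide_type, data_path, output_path, theme, brand_path):
    """Build the argument list for one render call (run it with subprocess.run)."""
    return [
        "python3", str(RENDER_SCRIPT),
        "--type", slide_type,
        "--data", str(data_path),
        "--output", str(output_path),
        "--theme", theme,
        "--brand", str(brand_path),
    ]
```

Passing an argument list instead of a shell string also sidesteps quoting problems in titles that contain quotes or dollar signs.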
Comparison slide -- Multi-column:
bash
python3 ~/.claude/skills/world-class-carousel/scripts/render_latex_slide.py \
--type comparison \
--data tmp/carousel/comparison_data.json \
--output tmp/carousel/slide_04.png \
--theme warm --brand tmp/carousel/brand.json
Where the --data file contains:
{"title": "Claude vs GPT", "subtitle": "How they compare", "columns": [{"name": "Claude", "items": [{"label": "Best for", "value": "Complex refactors"}]}, {"name": "GPT-4", "items": [{"label": "Best for", "value": "Quick prototyping"}]}], "slide_num": 4, "total_slides": 9, "bg_style": "gradient"}
Diagram slide -- AI-generated diagram background (preferred) or TikZ fallback:
bash
python3 ~/.claude/skills/world-class-carousel/scripts/render_latex_slide.py \
--type diagram \
--data tmp/carousel/diagram_data.json \
--output tmp/carousel/slide_07.png \
--theme dark --brand tmp/carousel/brand.json
Where the --data file contains:
{"title": "The Architecture", "description": "How the tools connect.", "diagram_nodes": [{"label": "Code", "desc": "Write"}, {"label": "Deploy", "desc": "Ship"}, {"label": "Monitor", "desc": "Track"}], "diagram_type": "vertical", "slide_num": 7, "total_slides": 9, "ai_bg": "tmp/carousel/diagram_bg.png", "overlay_opacity": 0.60, "bg_style": "gradient"}
Synthesis slide -- Save-worthy numbered summary:
bash
python3 ~/.claude/skills/world-class-carousel/scripts/render_latex_slide.py \
--type synthesis \
--data tmp/carousel/synthesis_data.json \
--output tmp/carousel/slide_08.png \
--theme dark --brand tmp/carousel/brand.json
Where the --data file contains:
{"title": "Your Stack", "points": ["Tool 1 for X", "Tool 2 for Y", "Tool 3 for Z"], "slide_num": 8, "total_slides": 9, "bg_style": "gradient"}
CTA slide -- with AI background for emotional close:
bash
python3 ~/.claude/skills/world-class-carousel/scripts/render_latex_slide.py \
--type cta \
--data tmp/carousel/cta_data.json \
--output tmp/carousel/slide_09.png \
--theme dark --brand tmp/carousel/brand.json
Where the --data file contains:
{"title": "Want the full breakdown?", "cta_text": "Follow for daily tips.", "handle": "@yourbrand", "slide_num": 9, "total_slides": 9, "show_nav": false, "ai_bg": "tmp/carousel/cta_bg.png", "overlay_opacity": 0.67}
Step 3d: Full Carousel Generation (Orchestrator)
Generate a complete carousel from a single JSON spec:
bash
python3 ~/.claude/skills/world-class-carousel/scripts/generate_carousel.py \
--spec carousel_spec.json \
--output-dir outputs/carousel/ \
--brand tmp/carousel/brand.json
The spec JSON format:
json
{
"topic": "6 AI Tools That Will Replace Your Stack",
"brand": "AI Builder",
"theme": "dark",
"bg_style": "gradient",
"slides": [
{"type": "hook", "data": {"title": "...", "title_highlight": "...", "ai_bg": "tmp/hook_bg.png", "overlay_opacity": 0.63}},
{"type": "body", "data": {"title": "...", "bullets": ["..."], "bg_style": "gradient"}},
{"type": "diagram", "data": {"title": "...", "diagram_nodes": [{"label": "...", "desc": "..."}], "diagram_type": "vertical", "ai_bg": "tmp/diagram_bg.png", "overlay_opacity": 0.60}},
{"type": "synthesis", "data": {"title": "...", "points": ["..."], "bg_style": "gradient"}},
{"type": "cta", "data": {"title": "...", "handle": "@brand", "ai_bg": "tmp/cta_bg.png", "overlay_opacity": 0.67}}
]
}
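The spec-level background style (the bg_style key shown above) applies to all slides, and a per-slide value overrides it. The available options, the default, and the style to avoid are listed in the Background Style Selection table.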
The orchestrator auto-injects brand name, slide numbering, renders all slides, and creates a preview grid.
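Assembling the spec programmatically might look like the sketch below. `build_spec` is a hypothetical helper, not part of the orchestrator; the hook-first, CTA-last check mirrors the slide sequences listed earlier, and the "gradient" default for bg_style is an assumption taken from the example spec.

```python
import json


def build_spec(topic, brand, theme, slides, bg_style="gradient"):
    """Assemble a carousel spec dict in the format shown above."""
    if not slides or slides[0]["type"] != "hook" or slides[-1]["type"] != "cta":
        raise ValueError("a carousel should open with a hook and close with a cta")
    return {
        "topic": topic,
        "brand": brand,
        "theme": theme,
        "bg_style": bg_style,
        "slides": slides,
    }


example_spec = build_spec(
    topic="6 AI Tools That Will Replace Your Stack",
    brand="AI Builder",
    theme="dark",
    slides=[
        {"type": "hook", "data": {"title": "...", "ai_bg": "tmp/hook_bg.png",
                                  "overlay_opacity": 0.63}},
        {"type": "body", "data": {"title": "...", "bullets": ["..."]}},
        {"type": "cta", "data": {"title": "...", "handle": "@brand",
                                 "ai_bg": "tmp/cta_bg.png", "overlay_opacity": 0.67}},
    ],
)
spec_json = json.dumps(example_spec, indent=2)  # write this to carousel_spec.json
```

Slide numbering is left out on purpose, since the orchestrator injects it.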
Brand Configuration System
The design system is fully generalized through brand configs -- JSON files that define visual identity per channel or brand. Pass the brand config with --brand to any render command.
Brand config JSON format:
json
{
"name": "TechStack AI", // Brand name shown in header
"logo": "path/to/logo.png", // Optional: logo image replaces text in header
"theme": "dark", // Base theme: warm, clean, dark, earth
"accent_override": "6366F1", // Optional: override accent hex (no #)
"font_serif": "newpxtext", // LaTeX serif font package (default: Palatino)
"header_style": "bold", // Header text: italic, bold, or plain
"nav_style": "circle", // Navigation arrow: circle, arrow, none
"divider_style": "line", // Dividers: line, ornament (diamond), dots, none
"corner_radius": "6pt" // Rounded corner radius for labels/badges
}
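A brand config can be sanity-checked before rendering. The sketch below is an assumption-laden illustration: `validate_brand` is a hypothetical helper, and the allowed values are taken only from the inline annotations in the format above. Note that Python's `json` module rejects `//` comments, so the sketch assumes the file on disk contains plain JSON without them.

```python
import re

# Allowed values as listed in the annotations of the brand config format.
ALLOWED_VALUES = {
    "theme": {"warm", "clean", "dark", "earth"},
    "header_style": {"italic", "bold", "plain"},
    "nav_style": {"circle", "arrow", "none"},
    "divider_style": {"line", "ornament", "dots", "none"},
}


def validate_brand(config):
    """Return a list of human-readable problems; empty means the config looks fine."""
    errors = []
    for key, allowed in ALLOWED_VALUES.items():
        if key in config and config[key] not in allowed:
            errors.append(f"{key}={config[key]!r} is not one of {sorted(allowed)}")
    accent = config.get("accent_override")
    if accent is not None and not re.fullmatch(r"[0-9A-Fa-f]{6}", accent):
        errors.append("accent_override must be a 6-digit hex value without '#'")
    return errors
```

Returning a list of problems instead of raising on the first one lets a caller report every issue in a config at once.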
3 sample brand configs (in
):
| Brand | Theme | Accent | Header | Divider | Character |
|---|---|---|---|---|---|
| TechStack AI | dark | Indigo | Bold | Line | Modern dev/AI content |
| Growth Academy | earth | Amber | Italic | Ornament | Business coaching |
| Code Academy | clean | Blue (default) | Bold | Dots | Educational tutorials |
Usage with brand config:
bash
python3 ~/.claude/skills/world-class-carousel/scripts/render_latex_slide.py \
--type hook \
--data hook_data.json \
--output slide.png \
--theme dark \
--brand brands/techstartup.json
AI Image Integration (Aristotelian Framework): Slides support two AI image zones:
- : Accent illustration placed in a card (hook bottom, body bottom)
- : Full-bleed background with semi-transparent overlay for text readability
- When no AI image is provided, decorative geometric accents fill empty space automatically
AI Image Integration Principles (First-Principles Framework)
Images in carousels must serve a purpose (telos). Before generating any AI image, name its function in one sentence. If you cannot, do not generate it.
The Three Teloi (Purposes) of Carousel Images:
| Telos | When to Use | Image Form | Example |
|---|---|---|---|
| Emotional Priming | Create a feeling before text is read | Atmospheric, evocative, human/natural | Marble bust for philosophy, neon cityscape for tech |
| Conceptual Anchoring | Give abstract ideas a visual handle | Symbolic, metaphorical, illustrative | Storm figure for "amor fati", network diagram for systems |
| Authority Signaling | Establish credibility through proof | Documentary, screenshots, concrete | Product screenshot, data chart, real photo |
The 2-3 Rule (Golden Mean): In an 8-10 slide carousel, use AI images on exactly 2-3 slides. Always the hook (slide 1) and CTA (last slide). Optionally the diagram slide, with an AI-generated diagram as its full-bleed background. Never on body slides -- visual fatigue destroys reading rhythm and costs 40% content space.
Image Placement Decision Matrix:
| Slide Type | AI Image? | Zone | Reasoning |
|---|---|---|---|
| Always | (full-bleed + 0.60-0.68 overlay) | Scroll-stop power: atmospheric image + typography > typography alone (Axiom 1, 3) |
| Never | -- | Text carries the weight; images destroy 40% content space for minimal gain |
| Preferred | (full-bleed + 0.55-0.65 overlay) | Gemini 3 Pro generates production-quality flowcharts with readable labels, arrows, and boxes -- far more visually striking than basic TikZ. TikZ remains as fallback for simple flows. |
| Never | -- | Numbered points ARE the content; keep text-only with gradient bg |
| Always | (full-bleed + 0.65-0.70 overlay) | Emotional close: atmospheric image creates a feeling of resolution |
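The placement matrix can be checked against a spec's slide list. The sketch below assumes the `ai_bg` and `overlay_opacity` keys from the data examples earlier in this document; the function name and the message wording are hypothetical.

```python
# Overlay ranges per slide type, from the matrix above.
OVERLAY_RANGES = {"hook": (0.60, 0.68), "diagram": (0.55, 0.65), "cta": (0.65, 0.70)}
REQUIRES_AI_BG = ("hook", "cta")  # "Always" rows; diagram is only "Preferred"


def check_image_placement(slides):
    """Return problems where a slide breaks the AI image placement rules."""
    problems = []
    for number, slide in enumerate(slides, start=1):
        kind = slide["type"]
        data = slide.get("data", {})
        if "ai_bg" in data:
            if kind not in OVERLAY_RANGES:
                problems.append(f"slide {number} ({kind}): AI image not allowed")
                continue
            low, high = OVERLAY_RANGES[kind]
            opacity = data.get("overlay_opacity")
            if opacity is None or not low <= opacity <= high:
                problems.append(
                    f"slide {number} ({kind}): overlay_opacity should be "
                    f"{low:.2f}-{high:.2f}"
                )
        elif kind in REQUIRES_AI_BG:
            problems.append(f"slide {number} ({kind}): AI background expected")
    return problems
```

A diagram slide without an AI background passes, since TikZ remains a valid fallback.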
Prompt Engineering for Consistency: All AI images in a single carousel MUST share a consistent style prefix. Build the prefix from the content vertical:
| Content Vertical | Style Prefix for AI Image Prompts |
|---|---|
| Mindset/Philosophy | "warm earthy tones, parchment cream, watercolor or classical art style, muted terracotta accents, editorial quality" |
| Tech/AI | "dark indigo and purple tones, subtle geometric patterns, clean digital art, neon accents, futuristic" |
| Business/Strategy | "warm amber and gold tones, bold professional graphics, rich depth, confident and energetic" |
| Education | "clean white and blue tones, flat illustration style, precise and clear, minimal and modern" |
| Creative/Design | "dark charcoal with bold accent colors, artistic and expressive, gallery quality, intentional composition" |
Text Readability is Inviolable: If using
(full-bleed), overlay opacity must ensure WCAG AA contrast (4.5:1). Minimum
. Proven ranges: hook 0.60-0.68, diagram 0.55-0.65, CTA 0.65-0.70.
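The 4.5:1 requirement can be estimated with the WCAG relative-luminance formula. The sketch below rests on stated assumptions that may not match the renderer: white text, a black overlay, and a pure white pixel as the worst-case spot in the AI image. Function names are hypothetical.

```python
def _linear(channel):
    """Convert an 8-bit sRGB channel to linear light (WCAG 2.x definition)."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb):
    r, g, b = (_linear(value) for value in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(color_a, color_b):
    """WCAG contrast ratio between two colors, always >= 1."""
    lighter, darker = sorted(
        (relative_luminance(color_a), relative_luminance(color_b)), reverse=True
    )
    return (lighter + 0.05) / (darker + 0.05)


def blend(pixel, overlay, opacity):
    """Alpha-composite the overlay color over an image pixel."""
    return tuple(p * (1 - opacity) + o * opacity for p, o in zip(pixel, overlay))


def overlay_passes_aa(opacity, text=(255, 255, 255), overlay=(0, 0, 0),
                      worst_pixel=(255, 255, 255)):
    """True if text still reaches 4.5:1 over the brightest possible pixel."""
    return contrast_ratio(text, blend(worst_pixel, overlay, opacity)) >= 4.5
```

Under these worst-case assumptions an opacity of 0.55 clears 4.5:1 and 0.50 does not, which is consistent with the proven ranges above. A lighter overlay color or darker text would need its own check.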
What NOT to generate: Generic stock-photo-style images (people in offices, handshakes, generic landscapes). If the image could illustrate any topic, it fails the Telos Test.
AI Visual Generation (via generate-image skill)
Generate AI images for hook backgrounds, CTA backgrounds, and diagram visuals using the
skill (requires
):
bash
# Hook background -- cinematic, atmospheric, scroll-stopping
python3 ~/.claude/skills/generate-image/scripts/generate_image.py \
"Dramatic cinematic split-screen composition, glowing neon circuits on dark background, \
volumetric lighting, deep indigo and electric purple tones, no text, no words, no letters" \
--model "google/gemini-3-pro-image-preview" --output tmp/carousel/hook_bg.png
# CTA background -- emotional close
python3 ~/.claude/skills/generate-image/scripts/generate_image.py \
"Abstract convergence of light streams on dark background, warm golden highlights, \
sense of resolution and completeness, cinematic atmosphere, no text, no words" \
--model "google/gemini-3-pro-image-preview" --output tmp/carousel/cta_bg.png
# Diagram as AI image (preferred over TikZ for complex flows)
python3 ~/.claude/skills/generate-image/scripts/generate_image.py \
"Professional flowchart: Data Collection box connects to Processing box connects to Output box, \
clean white boxes on dark blue background, arrows between nodes, minimal corporate design" \
--model "google/gemini-3-pro-image-preview" --output tmp/carousel/diagram_bg.png
Key rules: Always add "no text, no words, no letters" unless the image IS a diagram with labels. Use hyper-detailed prompts (50+ words) for best results.
Viral Hook Compositing Pipeline (PIL)
For viral-style hook slides matching accounts like @evolving.ai and @therundownai, use a two-step pipeline:
Step 1: Generate cinematic base image with Gemini 3 Pro (topic-specific, dramatic composition):
bash
# Multi-person composition (best for news/war/rivalry topics)
python3 ~/.claude/skills/generate-image/scripts/generate_image.py \
"Cinematic photomontage: three powerful figures in dramatic formation, \
center figure is a humanoid AI robot with glowing eyes, flanking figures \
are business leaders in dark suits, red and blue dramatic lighting, \
dark moody background, editorial magazine composition, hyper-detailed" \
--model "google/gemini-3-pro-image-preview" --output tmp/carousel/hook_base.png
# Single portrait (best for profile/biography/interview topics)
python3 ~/.claude/skills/generate-image/scripts/generate_image.py \
"Editorial portrait: distinguished elder with glasses, warm ambient lighting, \
slightly blurred conference background, shallow depth of field, \
photojournalistic style, natural expression, cinematic color grading" \
--model "google/gemini-3-pro-image-preview" --output tmp/carousel/hook_base.png
# Face-off composition (best for comparison/versus topics)
python3 ~/.claude/skills/generate-image/scripts/generate_image.py \
"Dramatic face-off: two opposing figures in profile facing each other, \
one in cool blue lighting one in warm orange, city skyline between them, \
energy effects and particles, dark cinematic atmosphere, epic confrontation" \
--model "google/gemini-3-pro-image-preview" --output tmp/carousel/hook_base.png
Step 2a: News-editorial style (matches @therundownai -- single person, big headline):
bash
python3 scripts/compose_news_hook.py \
--base tmp/carousel/hook_base.png \
--output tmp/carousel/slide_01_hook.png \
--headline 'OpenAI just hit $13B ARR making it the fastest-growing software company in history' \
--category "AI NEWS" \
--brand "@DailyAINews"
The compose_news_hook.py script (editorial style) applies:
- Subtle bottom gradient (ease-in, configurable start/strength)
- Small category label above headline
- MASSIVE bold headline (Inter Black, auto-sized 42-72px to fill bottom 35%)
- Optional brand mark top-left
- Clean, minimal -- no slide counter, no CTA, no subhead
- Best for: single-person portrait + news headline
Step 2b: Multi-person viral style (compose_hook.py -- multi-person, full overlay):
bash
python3 scripts/compose_hook.py \
--base tmp/carousel/hook_base.png \
--output tmp/carousel/slide_01_hook.png \
--headline "THE AI WAR JUST ESCALATED" \
--subhead "3 moves that changed everything this week" \
--brand "YOUR BRAND" \
--category "AI NEWS"
The compose_hook.py script (viral style) applies:
- Bottom gradient overlay (0-220 alpha, ease-in curve) for text readability
- Light top gradient for brand area
- Category label (upper-left, e.g., "AI NEWS")
- Brand watermark (centered)
- Word-wrapped bold headline (bottom area, all-caps)
- Optional subheadline
- "SWIPE FOR MORE" CTA with decorative line
- Slide counter (top-right, "1/8")
- Best for: multi-person compositions, face-off style
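The bottom gradient overlay listed above (0-220 alpha, ease-in curve) could be sketched roughly as follows. This is not the code from compose_hook.py: the function names, the quadratic curve, and the `start_frac` default are assumptions made for illustration. Only the alpha range and the ease-in shape come from the description.

```python
def ease_in_alphas(height, start_frac=0.45, max_alpha=220):
    """Per-row alpha (0..max_alpha) for a bottom gradient with a quadratic ease-in.

    Rows above start_frac * height stay fully transparent (start_frac is an assumed default).
    """
    start = int(height * start_frac)
    span = max(height - 1 - start, 1)
    alphas = []
    for y in range(height):
        if y < start:
            alphas.append(0)
        else:
            t = (y - start) / span  # 0.0 at gradient start, 1.0 at the bottom row
            alphas.append(round(max_alpha * t * t))  # quadratic ease-in (assumed curve)
    return alphas


def apply_bottom_gradient(base_path, out_path, **kwargs):
    """Composite a black ease-in gradient over the image at base_path (needs Pillow)."""
    from PIL import Image, ImageDraw  # lazy import: the alpha math above has no dependencies

    base = Image.open(base_path).convert("RGBA")
    overlay = Image.new("RGBA", base.size, (0, 0, 0, 0))
    draw = ImageDraw.Draw(overlay)
    for y, alpha in enumerate(ease_in_alphas(base.height, **kwargs)):
        if alpha:
            draw.line([(0, y), (base.width - 1, y)], fill=(0, 0, 0, alpha))
    Image.alpha_composite(base, overlay).save(out_path)
```

An ease-in curve keeps the upper part of the photo almost untouched and concentrates the darkening where the headline sits, which is why it tends to read better than a linear ramp.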
Prompt Strategy by Topic Type:
| Topic Type | Base Image Style | Score |
|---|---|---|
| News/current events | Multi-person photomontage + robot | 8.5/10 |
| Comparison/versus | Face-off composition with opposing energy | 8.5/10 |
| Profile/biography | Single editorial portrait | 8/10 |
| Tools/abstract | Silhouette with holographic/tech backdrop | 7.5/10 |
For educational/tutorial/framework topics, AI-generated compositions work excellently (8-8.5/10).
Real-Face Hook Pipeline (for news/current events topics)
When the topic involves specific real people (Sam Altman, Elon Musk, Jensen Huang, etc.), use web-sourced Creative Commons photos instead of AI generation:
BEST Approach: Base64 multi-image via AI Gateway (10/10)
Send local photos as base64 data URIs to /api/v1/images/generations. This bypasses URL accessibility issues (Wikimedia blocked, etc.) and supports ALL local images, including 3+ people.
python
import base64, json, os
from pathlib import Path
from urllib import request

API_KEY = os.environ["AI_GATEWAY_API_KEY"]
BASE = "https://ai-gateway.happycapy.ai/api/v1"  # NOT /openai/v1 !

# Load photos as base64 data URIs
images_b64 = []
for photo in ["elon_musk.jpg", "jensen_huang.jpg", "sam_altman.jpg"]:
    data = base64.b64encode(Path(photo).read_bytes()).decode()
    images_b64.append(f"data:image/jpeg;base64,{data}")

# Request body: prompt plus the reference photos
payload = {
    "model": "google/gemini-3-pro-image-preview",
    "prompt": "Create a dramatic face-off style composition with these three tech leaders. "
              "Confrontational layout, intense red vs blue split lighting, dark background "
              "with smoke/particle effects. Faces must remain photorealistic and recognizable.",
    "images": images_b64,
    "response_format": "url",
    "n": 1
}

req = request.Request(
    f"{BASE}/images/generations",
    data=json.dumps(payload).encode(),
    headers={
        "Content-Type": "application/json",
        "Authorization": f"Bearer {API_KEY}",
        "Origin": "https://trickle.so"
    },
    method="POST"
)

# Image generation is slow, so allow a long timeout
with request.urlopen(req, timeout=180) as resp:
    result = json.loads(resp.read())
img_url = result["data"][0]["url"]
# Download and save...
CRITICAL: Use /api/v1/images/generations (NOT /api/v1/openai/v1/images/generations). The OpenAI-prefixed endpoint rejects the images parameter.
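The snippet above labels every photo as image/jpeg. If the local files are a mix of JPEG and PNG, a small helper along these lines could pick the MIME type from the file extension instead. `to_data_uri` is a hypothetical name, not part of the skill's scripts, and whether the gateway accepts non-JPEG data URIs is an assumption that has not been checked here.

```python
import base64
import mimetypes
from pathlib import Path


def to_data_uri(path):
    """Encode a local image file as a data URI, guessing the MIME type from the extension."""
    mime, _ = mimetypes.guess_type(str(path))
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"not a recognized image file: {path}")
    encoded = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return f"data:{mime};base64,{encoded}"
```

With it, the list for the `images` field becomes `[to_data_uri(p) for p in photos]`.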
Alternative: transform_image.py with Flickr URLs (9.5/10)
When photos are available at Flickr URLs (directly accessible by Vertex AI):
bash
python3 ~/.claude/skills/generate-image/scripts/transform_image.py \
"Create a dramatic cinematic photomontage combining these tech leaders. \
Dark dramatic background with blue and red lighting. Keep faces EXACTLY as they appear." \
"https://live.staticflickr.com/7832/33377877458_d1a3774615_b.jpg" \
"https://live.staticflickr.com/5767/30796823531_85932ecaa0_b.jpg" \
--model "google/gemini-3-pro-image-preview" \
--output tmp/carousel/hook_base.png
Photo sourcing rules:
- Use Creative Commons (CC BY 2.0+) photos from Flickr, Wikimedia Commons
- Flickr URLs are accessible by Vertex AI; Wikimedia URLs are often blocked
- Send a browser User-Agent header when downloading Wikimedia photos to local files
- For local-only files (Wikimedia downloads), use the base64 approach above
- Include CC attribution in carousel caption
Fallback: PIL rembg composite (7/10)
bash
pip install rembg # One-time setup
# Remove backgrounds, composite onto AI background, apply compose_hook.py overlay
nano-banana-pro status: The native google-genai SDK requires GEMINI_API_KEY (not set). The AI Gateway has no Gemini-native endpoint, so routing the SDK through the gateway fails (404). The base64 approach above achieves the same multi-image composition capability via the AI Gateway's image generation endpoint.
PHASE 4: MUSIC SELECTION
Select from Instagram's available music library. Do NOT generate music. Apply the Music Decision Matrix to recommend 2-3 specific tracks the user can search for on Instagram.
PHASE 5: QUALITY REVIEW & EXPORT
Run the final checklist (see FINAL CHECKLIST) against every slide. Re-render any slide that fails. Output:
- 7-10 slide PNG images at 1080x1350
- Caption text with hashtags
- Music recommendation (Instagram track names + artists)
- Posting notes (best time, engagement strategy)
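A quick pre-export check that every slide is on the 1080x1350 canvas can be done with the standard library alone, by reading the width and height from the PNG header. The sketch below is an illustration, not the logic of assemble_carousel.py; `png_size` and `wrong_size_slides` are hypothetical names.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_size(data):
    """Return (width, height) from the IHDR chunk at the start of PNG bytes."""
    if len(data) < 24 or data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("not a PNG file")
    return struct.unpack(">II", data[16:24])


def wrong_size_slides(paths, expected=(1080, 1350)):
    """Return the paths whose PNG dimensions differ from the expected canvas."""
    bad = []
    for path in paths:
        with open(path, "rb") as fh:
            header = fh.read(24)  # signature + IHDR length/type + width/height
        if png_size(header) != expected:
            bad.append(path)
    return bad
```

An empty list from `wrong_size_slides(...)` means every slide matched; anything returned should be re-rendered before delivery.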
THE 6 FOUNDATIONAL AXIOMS
Every decision in this skill traces back to these irreducible premises:
AXIOM 1: Attention is Finite and Contested
A human scrolling Instagram makes a stay-or-leave decision in ~1.3 seconds. The first slide is a survival test. Visual pattern interrupts trigger involuntary attention. Cognitive curiosity gaps (Zeigarnik effect) create forward momentum. The cost of starting to swipe is high; the cost of continuing is near-zero.
AXIOM 2: Value is the Only Sustainable Currency
Content that does not leave the viewer materially better off is noise. Save rate is the purest signal of value. Share rate = social currency. "Useful" is domain-specific.
AXIOM 3: Visual Cognition Precedes Textual Cognition
The brain registers images far faster than it decodes text (the popular "60,000x" figure is unsourced, so treat it as directional, not measured). Color communicates emotion before words. Spatial hierarchy dictates reading order. Consistency creates cognitive fluency. One dominant visual per slide.
AXIOM 4: Narrative Arc is Hardwired
Content structured as narrative is retained 22x better than lists. Each slide must resolve the previous curiosity gap AND create the next one. The arc must reach genuine resolution.
AXIOM 5: The Medium Constrains and Enables
1080x1350 canvas on a 6-inch screen in half-attention. Minimum readable font = 24px. Bottom ~15% occluded by UI. Portrait (4:5) occupies maximum screen real estate.
AXIOM 6: Audio Creates Emotional Context
Music activates the limbic system independently. Instagram's algorithm rewards music usage with 15-30% more reach. Genre signals tribal identity. Trending audio boosts discovery if it genuinely fits.
THE 7 CAROUSEL ARCHETYPES
Auto-select the best archetype based on the topic. Each archetype has a specific slide structure, value test, and music profile.
1. TUTORIAL (How-To)
Slide 1: Problem statement (hook)
Slide 2: Tool/method introduction
Slide 3: Step 1 (with visual)
Slide 4: Step 2
Slide 5: Step 3
Slide 6: Step 4 (if needed)
Slide 7: Result / proof it works
Slide 8: Common mistakes to avoid
Slide 9: Quick-reference summary (save-worthy)
Slide 10: CTA
Value Test: Can the reader DO the thing after reading?
Music Profile: Lo-fi/chillhop, 70-85 BPM, instrumental
2. FRAMEWORK (Mental Model)
Slide 1: Common problem everyone faces (hook)
Slide 2: Why existing approaches fail
Slide 3: The framework name + overview
Slide 4: Component 1 explained
Slide 5: Component 2 explained
Slide 6: Component 3 explained
Slide 7: How the components connect (diagram)
Slide 8: Practical application example
Slide 9: The complete framework visual (save-worthy)
Slide 10: CTA
Value Test: Does the reader now have a reusable thinking tool?
Music Profile: Minimal electronic, 90-110 BPM, instrumental
3. MYTH-BUSTER (Contrarian Insight)
Slide 1: "Everyone thinks X" (hook)
Slide 2: "Here's what's actually happening"
Slide 3: Evidence 1
Slide 4: Evidence 2
Slide 5: Evidence 3
Slide 6: The real framework / truth
Slide 7: Implications
Slide 8: What to do instead
Slide 9: The mental model shift (save-worthy)
Slide 10: CTA
Value Test: Has the reader's mental model shifted?
Music Profile: Trip-hop/downtempo, 85-100 BPM, instrumental
4. CASE STUDY (Proof-Based)
Slide 1: The result / shocking metric (hook)
Slide 2: The context / starting point
Slide 3: What was done (overview)
Slide 4: Step 1 of the process
Slide 5: Step 2
Slide 6: Step 3
Slide 7: The data / proof
Slide 8: Key insight
Slide 9: How you can replicate it (save-worthy)
Slide 10: CTA
Value Test: Is the specific mechanism replicable?
Music Profile: Upbeat electronic, 110-120 BPM, light vocals OK
5. CURATED LIST (Resource Compilation)
Slide 1: "X Tools/Resources for Y" (hook)
Slide 2: Item 1 + why it's valuable
Slide 3: Item 2 + why
Slide 4: Item 3 + why
Slide 5: Item 4 + why
Slide 6: Item 5 + why
Slide 7: Item 6 + why (if needed)
Slide 8: Item 7 + why (if needed)
Slide 9: Comparison / selection guide (save-worthy)
Slide 10: CTA
Value Test: Can the reader immediately use at least 3 of these?
Music Profile: Chill beats/lo-fi, 75-90 BPM, instrumental
6. DEEP DIVE (Technical Explanation)
Slide 1: The concept + why it matters (hook)
Slide 2: What most people get wrong
Slide 3: How it actually works (simplified)
Slide 4: Visual diagram / mechanism
Slide 5: Practical example 1
Slide 6: Practical example 2
Slide 7: Common mistakes
Slide 8: Pro tips
Slide 9: The complete mental model (save-worthy)
Slide 10: CTA
Value Test: Does the reader understand the mechanism, not just the surface?
Music Profile: Ambient/atmospheric, 60-80 BPM, instrumental only
7. TRANSFORMATION (Before/After)
Slide 1: The "after" result (hook)
Slide 2: The "before" state / the pain
Slide 3: The discovery / turning point
Slide 4: The change in approach
Slide 5: Step 1 of the new way
Slide 6: Step 2
Slide 7: Step 3
Slide 8: The complete "after" state with proof
Slide 9: How to start your transformation (save-worthy)
Slide 10: CTA
Value Test: Can the reader see themselves in the transformation?
Music Profile: Progressive/building, 80-120 BPM arc, light vocals OK
THE 6 HOOK PATTERNS
The first slide determines everything. Select the best hook pattern for the topic:
1. The Curiosity Gap
"Claude Code has a memory problem. Here's how to fix it for free."
States a problem the audience recognizes + promises a solution. Optionally removes an objection ("for free", "in 5 minutes").
2. The Contrarian Statement
"Stop using RAG. There's a better way."
Contradicts a common belief. Creates cognitive dissonance that demands resolution.
3. The Specific Result
"This setup saved me 4 hours per week of prompt debugging."
Concrete numbers bypass the vague-promise filter. Specificity = credibility.
4. The Analogy Bridge
"Your AI agent's memory works like a messy desk. Here's how to organize it."
Maps unfamiliar onto familiar. Creates instant comprehension.
5. The "You're Doing It Wrong"
"90% of developers use Claude Code wrong. Are you one of them?"
Identity-based challenge. Use sparingly -- dangerous if overused.
6. The Stack / Combination
"Obsidian + Claude Code = unlimited AI memory"
Two known things combined unexpectedly. The "+" implies synergy.
THE BULLSHIT TEST (Mandatory Quality Gate)
Every single slide must pass ALL 3 conditions before rendering. No exceptions.
Condition 1: SPECIFICITY
Does this contain a concrete, actionable insight that could NOT be guessed by someone with zero domain knowledge?
- FAIL: "Use the right tools for the job"
- PASS: "Obsidian's graph view lets Claude Code traverse 10x more documents by following wiki-links between markdown files"
Condition 2: NOVELTY
Does this present a connection, framework, or technique the viewer has likely NOT encountered before?
- FAIL: "AI is changing the world"
- PASS: "By creating bidirectional links between your docs, you turn Claude Code's context window into a navigation system instead of a storage container"
Condition 3: DENSITY
Could the same information be compressed further without loss of meaning? If yes, it is padded and needs to be tightened.
- FAIL: "There are many benefits to using this approach, including several key advantages that make it worthwhile"
- PASS: "3 benefits: 10x doc navigation, auto-linked memory, zero-config setup"
If a slide fails any condition, rewrite it before rendering.
VISUAL DESIGN SYSTEM
Typography Hierarchy
| Element | Size | Weight | Font Type |
|---|---|---|---|
| Slide Title | 64-80px | Bold/Black (700-900) | Strong serif OR geometric sans |
| Subtitle / Hook | 32-40px | SemiBold (600) | Same family as title |
| Body Text | 24-28px | Regular (400) | Clean sans-serif |
| Bullet Points | 22-26px | Regular (400) | Same as body |
| Labels / Citations | 16-20px | Light (300) | Same as body |
| Slide Indicator | 14-16px | Light (300) | Sans-serif |
Rules:
- Maximum 2 fonts per carousel
- Title font and body font must pair well
- NEVER go below 24px for any text the reader must understand
- Consistent across ALL slides
Color Palettes by Content Vertical
Tech / AI / Coding:
- Background: (deep dark) or (midnight blue)
- Primary text: (near-white) or
- Accent: (electric purple) or (bright blue)
- Secondary: (muted gray)
Business / Strategy:
- Background: Linear gradient to (warm amber) or (cream)
- Primary text: (near-black)
- Accent: (confident red) or (gold)
- Secondary: (warm gray)
Education / How-To:
- Background: (clean white) or (cool off-white)
- Primary text: (dark slate)
- Accent: (trust blue) or (sky blue)
- Secondary: (slate gray)
Design / Creative:
- Background: (charcoal) or (near-white)
- Primary text: Inverse of background
- Accent: ONE bold color (magenta, emerald, or amber)
- Secondary: (zinc)
Mindset / Growth:
- Background: (warm neutral) or (forest dark)
- Primary text: (earth brown) or (warm light)
- Accent: (forest green) or (amber earth)
- Secondary: (warm mid-tone)
Layout Rules
- Canvas: 1080 x 1350 px (4:5 portrait) -- ALWAYS
- Margins: 60px minimum on all sides
- Safe Zone: Center 80% (top/bottom 10% may be occluded by Instagram UI)
- One idea per slide: If a slide has two ideas, split it into two slides
- Visual anchor: Every slide needs ONE dominant visual element
- Breathing room: Content should never feel cramped -- generous whitespace signals quality
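The margin and safe-zone rules above reduce to simple arithmetic. The sketch below computes the rectangle that text and key visuals should stay inside; `content_box` is a hypothetical helper, and the choice to take the larger of the margin and the occluded band on the vertical axis is an assumption about how the two rules combine.

```python
def content_box(width=1080, height=1350, margin=60, occluded_frac=0.10):
    """Return (left, top, right, bottom) of the area that respects margins and the safe zone."""
    # Vertically, use whichever is larger: the margin or the band Instagram UI may cover.
    inset_y = max(margin, int(height * occluded_frac))
    return (margin, inset_y, width - margin, height - inset_y)


print(content_box())  # (60, 135, 1020, 1215)
```

With the defaults this leaves a 960 x 1080 px working area. AXIOM 5 estimates the bottom band at ~15%, so passing `occluded_frac=0.15` gives a more conservative box.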
MUSIC SELECTION (Instagram Library)
Do NOT generate music. Recommend specific tracks available on Instagram's music library.
Music Decision Matrix
| Content Type | Search Keywords on Instagram | BPM Range | Vocals | Example Tracks to Search |
|---|---|---|---|---|
| Tech / AI | "lo-fi", "chill beats", "trip-hop" | 70-90 | No | DJ Shadow - Six Days, Nujabes - Aruarian Dance, Tycho - A Walk, Bonobo - Kerala |
| Business | "indie electronic", "future bass" | 100-120 | Minimal | ODESZA - A Moment Apart, Rufus Du Sol - Innerbloom, Bicep - Glue |
| Tutorial | "study beats", "chillhop", "acoustic" | 75-95 | No | Idealism - Lovely Day, Jinsang - Solitude, Tomppabeats - Monday Loop |
| Motivational | "epic", "cinematic", "uplifting" | 110-130 | Optional | M83 - Midnight City, Hans Zimmer - Time, Illenium - Good Things Fall Apart |
| Creative | "minimal techno", "ambient", "art" | 90-115 | No | Four Tet - Two Thousand and Seventeen, Jon Hopkins - Emerald Rush, Kiasmos - Blurred |
| Myth-Buster | "dark ambient", "post-rock", "mysterious" | 80-100 | No | Massive Attack - Teardrop, Radiohead - Everything In Its Right Place, Portishead - Wandering Star |
| Case Study | "upbeat", "indie pop", "electronic" | 110-125 | Light | Washed Out - Feel It All Around, Toro y Moi - So Many Details, M83 - Wait |
Music Selection Rules
- Text-heavy carousels: ALWAYS instrumental only (vocals compete with reading)
- Visual-heavy carousels: Vocals acceptable (separate processing channels)
- Trending audio: Use ONLY if it genuinely fits the content type. Mismatched trending sounds damage authenticity
- Trending audio lifecycle: Discovery (Day 0-3, max boost) -> Growth (Day 3-14, good) -> Peak (Day 14-30, OK) -> Saturation (Day 30+, skip)
- Output format: Provide 2-3 track recommendations with artist name, track name, and why it fits
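The trending audio lifecycle above can be expressed as a small lookup. In this sketch the boundary days (3, 14, 30) are assigned to the later stage, which is an assumption since the listed ranges overlap at their endpoints; `trending_stage` is a hypothetical name.

```python
def trending_stage(days_since_trend_start):
    """Map a trending sound's age in days to (stage, guidance) per the lifecycle above."""
    if days_since_trend_start < 0:
        raise ValueError("age cannot be negative")
    if days_since_trend_start < 3:
        return ("Discovery", "max boost")
    if days_since_trend_start < 14:
        return ("Growth", "good")
    if days_since_trend_start < 30:
        return ("Peak", "OK")
    return ("Saturation", "skip")
```

The lookup only covers timing. The rule that a trending sound must genuinely fit the content type still has to be judged separately.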
CAPTION TEMPLATE
[Hook line -- front-load value, must be compelling in first 2 lines before "...more"]
[2-3 sentences expanding the core value proposition]
[Key points:]
- Point 1 (specific, not vague)
- Point 2
- Point 3
[Specific CTA -- NOT "What do you think?" but rather a specific question or action]
[5-15 hashtags with distribution:]
[2-3 broad (100K-1M posts)] [3-5 niche (10K-100K)] [2-3 community (1K-10K)] [1-2 branded]
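To check a draft hashtag set against the distribution in the template, the tiers can be classified by post count. The sketch assumes the caller already has a post count for each hashtag (how those counts are obtained is out of scope), and branded tags are left out because they are identified by ownership, not volume. Function names are hypothetical.

```python
def hashtag_tier(post_count):
    """Classify a hashtag by how many posts use it, following the caption template tiers."""
    if 100_000 <= post_count <= 1_000_000:
        return "broad"
    if 10_000 <= post_count < 100_000:
        return "niche"
    if 1_000 <= post_count < 10_000:
        return "community"
    return "out_of_range"


def tier_counts(post_counts):
    """Count hashtags per tier; post_counts maps hashtag -> post count (supplied by the caller)."""
    counts = {"broad": 0, "niche": 0, "community": 0, "out_of_range": 0}
    for count in post_counts.values():
        counts[hashtag_tier(count)] += 1
    return counts
```

Comparing the result with the target mix (2-3 broad, 3-5 niche, 2-3 community) shows at a glance which tier is over- or under-represented.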
INSTAGRAM ALGORITHM OPTIMIZATION
- Save Rate is the #1 signal. Design every carousel to be save-worthy. Include a synthesis/mental-model slide.
- 10-slide carousels outperform shorter ones by ~30% in save rate
- Dwell time: More slides = more time on post = algorithm reward
- Music adds ~15-30% reach boost
- Re-engagement: Instagram re-shows carousels to users who did not swipe all the way through
- First hour: Posts saved within the first hour get exponential distribution
- Hashtags: Put in caption (not first comment). 5-15 total.
RENDERING SCRIPTS
render_latex_slide.py (PRIMARY RENDERER)
Publication-grade LaTeX slide renderer. Produces 1080x1350 PNG slides using pdflatex + pdftoppm.
6 slide types: hook, body, comparison, diagram, synthesis, cta
4 themes:
,
,
,
bash
python3 ~/.claude/skills/world-class-carousel/scripts/render_latex_slide.py \
--type body \
--data body_data.json \
--output slide.png \
--theme dark \
--brand brand_config.json
Data fields by slide type:
- hook: , , , , , ,
- body: , , , ,
- comparison: , ,
columns[{name, items[{label, value}]}]
,
- diagram: , ,
diagram_nodes[{label, desc}]
, (vertical/horizontal), ,
- synthesis: , ,
- cta: , , , , ,
- All types: , , , (full-bleed background), ,
generate_carousel.py (ORCHESTRATOR)
End-to-end carousel generation from a JSON spec. Handles slide numbering, rendering, and preview grid assembly.
bash
python3 ~/.claude/skills/world-class-carousel/scripts/generate_carousel.py \
--spec carousel_spec.json \
--output-dir outputs/carousel/ \
--brand brand_config.json
AI Image Generation (via generate-image skill)
Use the generate-image skill for all AI images (hook bg, CTA bg, diagram bg). See "AI Visual Generation" section above for examples.
bash
python3 ~/.claude/skills/generate-image/scripts/generate_image.py \
"Your detailed prompt here, 50+ words, no text no words no letters" \
--model "google/gemini-3-pro-image-preview" --output tmp/carousel/image.png
assemble_carousel.py (ASSEMBLY)
Validates 1080x1350, optimizes PNGs, creates preview grid, generates metadata JSON.
bash
python3 ~/.claude/skills/world-class-carousel/scripts/assemble_carousel.py \
--input-dir tmp/carousel/ --output-dir outputs/carousel/ --optimize
render_slide.py (LEGACY - Pillow-based)
Pillow-based renderer with 6 layout modes. Superseded by render_latex_slide.py for production use. Still available for quick prototyping without LaTeX dependencies.
10 WORLD-CLASS DIFFERENTIATORS
Apply these to elevate from "good" to "world-class":
- Intellectual Density: One INSIGHT per slide, not just one idea (insight = non-obvious connection between two known things)
- Visual Craftsmanship: Every pixel intentional. Margins mathematical. Colors from a system.
- Hook Specificity: "I tested 1,247 prompts across 6 models" not "5 Tips for Better Prompts"
- Narrative Completeness: Each slide creates a question the next answers. Final slide ties back to hook.
- Proof Over Claims: Screenshots, before/after comparisons, specific metrics -- not "this is great"
- Typography as Design: The way words are sized, spaced, and placed tells the story VISUALLY
- Strategic Restraint: Know what to leave OUT. Negative space is a design choice.
- Music-Content Resonance: BPM matches reading pace. Genre signals the tribe.
- Save-Worthy Synthesis: Last content slide is a mental model / framework diagram worth saving
- Authentic Voice: Written as one expert talking to a colleague. Never "content creator voice."
FINAL CHECKLIST
Before delivering any carousel, verify ALL of these:
PHASE 6: LEARNING PROTOCOL (Post-Delivery)
After every carousel delivery, update the skill's knowledge base. This system prevents repeating mistakes while staying compact.
Two-Tier Memory Architecture
Tier 1: KNOWN_ISSUES.md (in this skill directory)
- MAX 60 lines. Contains ONLY compressed, actionable rules.
- Format: one-line rules grouped by category. No narratives, no session history.
- When adding a new rule: check if it supersedes an existing rule. If yes, REPLACE the old rule. Never append duplicates.
- Read this file at the START of every carousel session to avoid known pitfalls.
Tier 2: session-archives/ directory (in this skill directory)
- Verbose session logs go here as timestamped files: session-archives/YYYY-MM-DD-topic.md
- Include: full experiment data, scoring matrices, debug traces, before/after comparisons.
- These files are NEVER loaded into context unless explicitly requested by the user.
- They exist as raw data for future deep-dives, not as operational knowledge.
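The archive naming pattern above (session-archives/YYYY-MM-DD-topic.md) could be generated with a helper like the one below. `archive_path` and its slug rule (lowercase, non-alphanumerics collapsed to hyphens) are assumptions for illustration; the skill does not specify how the topic part is normalized.

```python
import re
from datetime import date
from pathlib import Path


def archive_path(topic, day=None, root="session-archives"):
    """Build session-archives/YYYY-MM-DD-topic.md with the topic reduced to a lowercase slug."""
    day = day or date.today()
    slug = re.sub(r"[^a-z0-9]+", "-", topic.lower()).strip("-")
    return Path(root) / f"{day.isoformat()}-{slug}.md"
```

Because the date leads the filename, a plain alphabetical directory listing is also chronological.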
After Every Session
- Check KNOWN_ISSUES.md -- Does this session reveal a new rule? Add it (max 1 line). Does it supersede an old rule? Replace it.
- Archive verbose data -- If the session involved experiments, debugging, or research, write a session archive file.
- Compress, don't accumulate -- The goal is a fixed-size knowledge base that gets BETTER over time, not BIGGER.
The Compression Principle
Every piece of learning must be compressed to its irreducible form before entering Tier 1:
- BAD: "In session on March 10, we discovered that passing synthesis points as dicts causes an AttributeError because the renderer at line 870 does escape_latex(pt) directly on each point" (38 words)
- GOOD: "Synthesis must be FLAT STRINGS, not dicts. Renderer does escape_latex(pt) directly." (12 words)
If you can't compress it to one line, it belongs in Tier 2 (session archive), not Tier 1.
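The Tier 1 constraints (60-line cap, no duplicate rules) are easy to check mechanically before saving KNOWN_ISSUES.md. The sketch below is a hypothetical helper, not part of the skill. It assumes blank lines do not count toward the cap and treats rules that differ only in letter case as duplicates.

```python
def check_known_issues(text, max_lines=60):
    """Return a list of problems found in KNOWN_ISSUES.md content (empty list means OK)."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    problems = []
    if len(lines) > max_lines:
        problems.append(f"{len(lines)} non-blank lines exceeds the {max_lines}-line cap")
    seen = set()
    for ln in lines:
        key = ln.lower()  # case-insensitive duplicate detection (assumed rule)
        if key in seen:
            problems.append(f"duplicate rule: {ln}")
        seen.add(key)
    return problems
```

It cannot tell whether a new rule supersedes an old one with different wording, so the replace-don't-append step still needs a human or model read-through.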