Present — Narrated Interactive Presentations
Generate a self-contained HTML presentation with dual article/slides mode, ElevenLabs narration, optional GPT Image 2 illustrations, and scroll-reveal animations.
What This Skill Produces
A single HTML file (plus audio and optional image assets) that can be:
- Opened locally in a browser
- Deployed to Vercel, Netlify, or any static host
- Shared as a folder
The output has two modes the viewer can toggle between:
- Article mode — long-form scrollable report with Tufte-inspired typography
- Slides mode — navigable presentation with keyboard/click navigation and narrated audio playback
Quick Start
/present "AI adoption research for Arseny" --slides 12 --voice daniel --images risograph
Or with a file:
/present path/to/research.md --detail detailed --voice alice
Parameters
| Parameter | Values | Default | Description |
|---|---|---|---|
| --slides | 5-20 | 12 | Number of slides |
| --detail | detail level (see Detail Levels) | | Content depth |
| --voice | ElevenLabs voice name | daniel | Narrator voice |
| --images | style name | | Image generation style |
| --image-prompt | custom string | auto | Override image prompt prefix |
| | path | | Output directory |
| | vercel project | | Auto-deploy target |
| | string | auto | Presentation title |
| | flag | false | Skip audio generation |
Detail Levels
- 5-7 slides: Key findings only. One stat slide, one recommendation slide, sources. Best for busy stakeholders who need the bottom line.
- 10-14 slides: Full narrative arc. Problem, evidence, analysis, recommendations, sources. The default for most presentations.
- 15-20 slides: Deep dive. Includes methodology, multiple evidence sections, case studies, detailed recommendations with implementation steps.
Voice Options
Uses the ElevenLabs API. The API key must be available in ~/claude-skills/elevenlabs-tts/.env.
Recommended voices for presentations:
- daniel — Steady Broadcaster, British, formal (default)
- alice — Clear Educator, British, professional
- matilda — Knowledgeable, American, upbeat
- brian — Deep Resonant, American, comforting
- george — Warm Storyteller, British, mature
Image Styles
When --images is set, the skill generates illustrations for key slides using GPT Image 2 (~/.claude/skills/gpt-image-2/scripts/gpt_image_2.py). Available styles:
- Gerd Arntz isotype style, muted colors, sand texture
- Magazine photography style, dramatic lighting
- Technical drawing aesthetic, white on blue
- Black ink illustration, hand-drawn feel
- Data visualization aesthetic, dots and lines
- Custom: pass --image-prompt "your style description" to override
Images are generated in draft mode first (--draft, ~$0.006/image). The skill decides which slides benefit from illustration (typically 3-5 out of 12).
Workflow
Step 1: Content Analysis
Read the input content (a topic description, a markdown file, vault notes, meeting transcript, or research). Identify:
- The core argument or narrative
- Key data points and statistics
- Natural section breaks
- Quotable findings with sources
Step 2: Slide Planning
Based on the requested slide count and detail level, create a slide plan. Each slide needs:
Slide N: [Type] — [Title]
Content: [what appears on screen]
Narration: [what the voice says — always more than what's on screen]
Read time: [seconds for an average reader to absorb the visual content]
Image: [yes/no, with prompt if yes]
Slide types are specified in references/slide-types.md.
The narration script should be conversational and add context beyond what's displayed. It should NOT just read the slide text aloud — it should explain, connect, and elaborate. Target 15-30 seconds of narration per slide.
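The plan fields and the 15-30 second narration target can be checked before any audio is generated. The sketch below is illustrative only: the SlidePlan class, its field names, and the 150 words-per-minute speaking rate are assumptions, not part of the skill.

```python
from dataclasses import dataclass
from typing import Optional

# Assumption: roughly 150 spoken words per minute. Real pace varies by voice.
WORDS_PER_MINUTE = 150


@dataclass
class SlidePlan:
    """One entry of the slide plan (hypothetical structure)."""
    number: int
    slide_type: str                     # label from references/slide-types.md
    title: str
    content: str                        # what appears on screen
    narration: str                      # what the voice says
    read_time: float                    # seconds to absorb the visual content
    image_prompt: Optional[str] = None  # None means no illustration


def estimated_narration_seconds(plan: SlidePlan) -> float:
    """Estimate the spoken length of the narration from its word count."""
    words = len(plan.narration.split())
    return words * 60 / WORDS_PER_MINUTE


def narration_in_target(plan: SlidePlan, low: float = 15.0, high: float = 30.0) -> bool:
    """Return True when the estimate falls inside the 15-30 second target."""
    return low <= estimated_narration_seconds(plan) <= high
```

The estimate is only a planning aid; actual durations come from the generated audio files.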
Step 3: Generate Audio
For each slide, generate narration using ElevenLabs:
```bash
python3 ~/.claude/skills/elevenlabs-tts/scripts/elevenlabs_tts.py \
  --voice <voice_name> \
  --text "<narration>" \
  --output <output_dir>/audio/slide-<N>.mp3
```
Or use the direct API via the script at scripts/generate_audio.py in this skill.
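When looping over many slides, it can help to build each call as an argument list so narration text with quotes or apostrophes needs no shell escaping. This is a minimal sketch: the tts_command helper is hypothetical, and it uses only the script path and the --voice, --text, and --output flags shown above.

```python
from pathlib import Path

# Script path as given in the command above.
TTS_SCRIPT = Path("~/.claude/skills/elevenlabs-tts/scripts/elevenlabs_tts.py").expanduser()


def tts_command(voice: str, narration: str, output_dir: str, slide_number: int) -> list[str]:
    """Build the argument list for one slide's narration (hypothetical helper)."""
    out_path = Path(output_dir) / "audio" / f"slide-{slide_number}.mp3"
    return [
        "python3", str(TTS_SCRIPT),
        "--voice", voice,
        "--text", narration,
        "--output", str(out_path),
    ]
```

Each list can be passed to subprocess.run(cmd, check=True) so that a failed generation stops the batch instead of leaving a silent gap.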
Also generate a transition sound (Rhodes chord) for slide-to-slide transitions.
After generation, get durations with ffprobe to calculate slide timing.
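One way to read a duration with ffprobe is to ask it for the container's duration field only. The helper names below are illustrative, and the sketch assumes ffprobe is on the PATH.

```python
import subprocess


def ffprobe_command(path: str) -> list[str]:
    """Build an ffprobe call that prints only the duration in seconds."""
    return [
        "ffprobe", "-v", "error",
        "-show_entries", "format=duration",
        "-of", "default=noprint_wrappers=1:nokey=1",
        path,
    ]


def parse_duration(ffprobe_output: str) -> float:
    """Convert ffprobe's single-line output into seconds."""
    return float(ffprobe_output.strip())


def audio_duration(path: str) -> float:
    """Run ffprobe on one file and return its duration in seconds."""
    result = subprocess.run(
        ffprobe_command(path), capture_output=True, text=True, check=True
    )
    return parse_duration(result.stdout)
```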
Step 4: Generate Images (if enabled)
For slides that benefit from illustration, generate images using GPT Image 2:
```bash
python3 ~/.claude/skills/gpt-image-2/scripts/gpt_image_2.py --draft --size 1536x1024 \
  "<style prefix> <slide-specific prompt>" \
  <output_dir>/images/<name>.png
```
Typically generate 3-5 images for a 12-slide deck. Choose slides where a visual metaphor strengthens the point — stat slides, concept slides, and the title slide are good candidates. Don't illustrate every slide.
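The selection is a judgment call, but a simple first pass might filter by slide type and cap the count. The sketch below is one possible heuristic, not the skill's actual logic; the type labels "title", "stat", and "concept" are assumed spellings of the candidates named above.

```python
# Assumed type labels, based on the candidate slide kinds named above.
PREFERRED_TYPES = ("title", "stat", "concept")


def pick_slides_to_illustrate(slide_types: list[str], max_images: int = 5) -> list[int]:
    """Return 1-based slide numbers to illustrate, in deck order, capped at max_images."""
    picks = [
        number
        for number, slide_type in enumerate(slide_types, start=1)
        if slide_type in PREFERRED_TYPES
    ]
    return picks[:max_images]
```

A reviewer can then drop any pick where a visual metaphor would not strengthen the point.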
Step 5: Build HTML
Use the skill's base HTML template as the starting point. The template includes:
- Typography: EB Garamond (body) + DM Sans (labels/numbers)
- Color palette: Configurable via CSS variables
- Article mode: Tufte-inspired layout with executive summary box, stat cards, two-column sections, data tables
- Slides mode: Full-viewport slides with fade transitions, keyboard navigation (arrows, space), dot indicators
- Audio engine: Single reusable audio element, slide-synced playback with progress bar, transition sounds between slides
- Auto-hide controls: Top bar (mode switcher + audio) appears when cursor enters top 20% of viewport. Bottom nav appears in bottom 20%. Shift+. toggles always-show/always-hide/zone mode.
- Scroll-reveal animations: Intersection Observer-based fade-up for sections, staggered stat cards, animated counters, h2 rule-draw effect
- Reduced motion: All animations are disabled when the user prefers reduced motion (the prefers-reduced-motion media query)
Populate the template by replacing placeholder sections with the actual slide and article content.
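Placeholder replacement can be sketched with the standard library. Everything named here is hypothetical: the $title and $slides markers, the slide markup, and the helper functions stand in for whatever the real template uses.

```python
from html import escape
from string import Template

# Hypothetical stand-in for the real template; its markers may differ.
PAGE = Template("<title>$title</title>\n<main>$slides</main>")


def render_slide(number: int, title: str, body: str) -> str:
    """Render one slide section, escaping text so it cannot break the markup."""
    return (
        f'<section class="slide" id="slide-{number}">'
        f"<h2>{escape(title)}</h2><p>{escape(body)}</p></section>"
    )


def render_page(title: str, slides: list[tuple[str, str]]) -> str:
    """Fill the page template with a title and (title, body) slide pairs."""
    sections = "\n".join(
        render_slide(number, slide_title, body)
        for number, (slide_title, body) in enumerate(slides, start=1)
    )
    return PAGE.substitute(title=escape(title), slides=sections)
```

Escaping matters because research content often contains characters such as & and < that would otherwise be parsed as markup.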
Step 6: Test
Open the generated presentation in a browser and verify that both modes behave as described in Step 5.
Step 7: Deploy (if requested)
If a deploy target is set, copy the output to the target project's public folder and deploy:
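```bash
# Create the destination first; cp fails if the slug folder does not exist yet
mkdir -p <project_path>/public/<slug>
cp -r <output_dir>/* <project_path>/public/<slug>/
cd <project_path> && vercel deploy --prod --yes
```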
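The copy step can also be done from Python, which creates the destination folder if it is missing. This is a sketch under assumptions: stage_for_deploy is a hypothetical helper, and it covers only the copy, not the vercel call.

```python
import shutil
from pathlib import Path


def stage_for_deploy(output_dir: str, project_path: str, slug: str) -> Path:
    """Copy the generated presentation into <project_path>/public/<slug>/."""
    dest = Path(project_path) / "public" / slug
    # dirs_exist_ok lets a re-run overwrite a previous deployment (Python 3.8+).
    shutil.copytree(output_dir, dest, dirs_exist_ok=True)
    return dest
```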
HTML Architecture
Audio Sync Model
Each slide carries timing properties, including:
- An audio reference that maps to the slide's audio file
- A read time, in seconds, for reading the visual content
The audio engine calculates:
slide_duration = max(audio_duration, read_time) + 2s
After narration ends, it waits for any remaining read time plus a 2-second buffer, plays a transition sound (1.8s), then advances to the next slide.
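The timing rule can be worked through in a few lines. The function names are illustrative, and the total-runtime helper assumes the transition sound plays only between consecutive slides, which the text does not state explicitly.

```python
BUFFER_SECONDS = 2.0      # pause after narration and reading are both done
TRANSITION_SECONDS = 1.8  # length of the transition sound


def slide_duration(audio_duration: float, read_time: float) -> float:
    """Time a slide stays on screen before the transition sound starts."""
    return max(audio_duration, read_time) + BUFFER_SECONDS


def wait_after_narration(audio_duration: float, read_time: float) -> float:
    """Silence after the narration ends: leftover read time plus the buffer."""
    return max(0.0, read_time - audio_duration) + BUFFER_SECONDS


def deck_runtime(slides: list[tuple[float, float]]) -> float:
    """Total runtime for (audio_duration, read_time) pairs.

    Assumption: one transition sound between each pair of consecutive slides.
    """
    total = sum(slide_duration(audio, read) for audio, read in slides)
    return total + TRANSITION_SECONDS * max(0, len(slides) - 1)
```

For example, a slide with 20 seconds of narration and a 12-second read time stays up for 22 seconds, while one with 10 seconds of narration and a 14-second read time stays up for 16 seconds, 6 of them silent.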
Avoiding AI-Looking Formatting
The following patterns read as AI-generated and should be avoided:
- Colored left-bar + bold heading + description blocks (finding cards)
- Large italic pull quotes with colored left border
- Uniform card grids with icon + heading + description
- Gradient text on metrics
Instead use:
- Natural prose paragraphs with inline emphasis
- Definition lists (<dl>) for structured points
- Tables for comparisons
- Direct statements woven into flowing text
Image Paths
Use absolute paths from the deployment root, not relative paths. Relative paths break when URLs load without trailing slashes.
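The trailing-slash problem follows from standard URL resolution, which Python's urllib.parse.urljoin reproduces. The example.com host, the /deck/ slug, and the image name are placeholders for illustration.

```python
from urllib.parse import urljoin

# Page loaded with a trailing slash: the relative path resolves inside the deck.
with_slash = urljoin("https://example.com/deck/", "images/cover.png")
# https://example.com/deck/images/cover.png

# Same page without the trailing slash: "deck" is treated as a file name and dropped.
without_slash = urljoin("https://example.com/deck", "images/cover.png")
# https://example.com/images/cover.png

# An absolute path resolves the same way in both cases.
absolute = urljoin("https://example.com/deck", "/deck/images/cover.png")
# https://example.com/deck/images/cover.png
```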
Files
- The skill definition (this file)
- scripts/generate_audio.py: ElevenLabs TTS batch generator
- Base HTML template with all CSS/JS
- references/slide-types.md: Detailed slide type specifications and examples