Image to Editable PPT
Overview
This skill reconstructs visual slide inputs into object-level, editable PowerPoint files.
Inputs can be a single image, multiple images, PDF files, or image-based PPT/PPTX files. The output is always . The goal is not to embed full-page screenshots in a PPT. It is to turn readable text into native PowerPoint text boxes wherever possible, basic structures into native shapes, and complex visual elements into independent image assets, while manifests, previews, and validation reports keep the results inspectable and reworkable.
Default trade-off: prioritize object-level editability. Slightly rough visuals are better than a full-page raster image passed off as an editable PPT.
Hard Constraints
- Every source page must be reconstructed by a page subagent, including single-image inputs.
- The main agent only orchestrates; it does not perform page reconstruction.
- Do not let the parent agent reconstruct pages on its own, fall back to sequential processing, or switch to a low-fidelity degraded mode. If no page subagent is available, stop and do not proceed to page reconstruction.
- All image generation, image modification, background repair, transparent bitmap assets, and asset sheets must use the skill.
- The default path for is the built-in . Do not directly call Image APIs in this skill.
- If a page requires but or the built-in is unavailable, stop processing this page and report the blocker; do not fabricate assets.
- Overlaying editable text on the original full-page is a failure mode, not a fallback.
- Only basic primitives and simple structural objects may use native PPT shapes. When the right treatment for a non-text visual object is uncertain, default to using to redraw it as an independent asset.
- Before writing the manifest, page workers must calibrate a checklist covering text font sizes, visual objects, background strategies, and shape corners; they must not rely on aesthetic guesses for default font sizes or default rounded corners.
- Key states can only be advanced by scripts. Agents must not hand-write JSON to mark pages, imagegen jobs, or runs as completed.
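As an illustration of what "advanced only by scripts" can mean, the sketch below guards page-state changes behind an explicit transition table. Only the state names pending, dispatched, repair_needed, and accepted appear in this document; the `PageState` enum, the `ALLOWED` table, and `advance()` are hypothetical and are not the skill's actual scripts (references/state-machine.md holds the real rules).

```python
from enum import Enum


class PageState(Enum):
    # Hypothetical page states; the names mirror those used in the workflow text.
    PENDING = "pending"
    DISPATCHED = "dispatched"
    REPAIR_NEEDED = "repair_needed"
    ACCEPTED = "accepted"


# Assumed transition table: which states a script may move a page into.
ALLOWED = {
    PageState.PENDING: {PageState.DISPATCHED},
    PageState.DISPATCHED: {PageState.ACCEPTED, PageState.REPAIR_NEEDED},
    PageState.REPAIR_NEEDED: {PageState.DISPATCHED},
    PageState.ACCEPTED: set(),
}


class IllegalTransition(Exception):
    """Raised when a requested state change is not in ALLOWED."""


def advance(current: PageState, target: PageState) -> PageState:
    """Return the new state, or raise if the transition is not allowed.

    Meant to be called by orchestration scripts only, never by an agent
    editing JSON by hand.
    """
    if target not in ALLOWED[current]:
        raise IllegalTransition(f"{current.value} -> {target.value}")
    return target
```

Because `advance()` raises on anything outside the table, a page cannot jump from pending straight to accepted without first being dispatched and then checked.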
Visible Progress Plan
During normal operation, the main agent must maintain a user-visible checklist with only one active step at a time:
- Prepare input and task directories.
- Dispatch page reconstruction tasks.
- Reconstruct page objects.
- Inspect and repair pages.
- Assemble and validate PPTX.
Completion criteria:
- Prepare input and task directories: , , pages/page_NNN/source.png, and exist.
- Dispatch page reconstruction tasks: The main agent spawns page subagents in batches according to ; each spawned page is recorded as dispatched by . If subagents persistently cannot be spawned, stop here and report the blocker.
- Reconstruct page objects: For each page, the page worker produces , , , , , .
- Inspect and repair pages: All pages are recorded by , and the repair queue is empty; if repairs are not possible, report the blockers.
- Assemble and validate PPTX: final/<origin>_edited.pptx and exist.
Do not mark a step as completed merely because the chat says so; completion requires real files or script-advanced states.
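A sketch of the "real files, not chat" rule: a step counts as complete only when its required files exist on disk. Most required file names are not spelled out in the criteria, so this example checks only paths that are; `missing_outputs()` and `step_complete()` are hypothetical helpers, and `deck` stands in for `<origin>`.

```python
from pathlib import Path


def missing_outputs(run_dir: Path, required: list[str]) -> list[str]:
    """Return the required relative paths that do not exist as files under run_dir."""
    return [rel for rel in required if not (run_dir / rel).is_file()]


def step_complete(run_dir: Path, required: list[str]) -> bool:
    # Complete only when every required file is present; a claim in chat
    # never substitutes for the file itself.
    return not missing_outputs(run_dir, required)
```

For the first step a caller would pass something like `["pages/page_001/source.png"]`; for the last, `["final/deck_edited.pptx"]` plus the deck-level report.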
Default Workflow
- Run to create the run directory, normalize inputs, and generate deck/page manifests and page requests.
- Run to view pending dispatch pages, active dispatches, and available dispatch slots.
- The main agent spawns regular Codex worker subagents in batches according to ; do not spawn more than the runtime concurrency limit at once.
- Immediately run to record the dispatch after spawning.
- Each page worker only works within its own page directory, completing page-level build, preview, contact sheet, and validation.
- After the page worker returns, the main agent runs to check files, paths, and hashes, and advance the page state.
- Run again; if there are pending/repair_needed pages, continue dispatching the next batch.
- If there are page issues, run to generate repair items, then dispatch repair workers in batches.
- After all pages are accepted, run to assemble the final PPTX, copy notes, and run deck validation and QA summary.
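The batching rule above comes down to filling only the free dispatch slots. This is a minimal sketch assuming the caller already knows which pages are actively dispatched and which are pending or repair_needed; `DispatchPlan`, `next_batch()`, and passing the runtime concurrency limit as a plain integer are illustrative, not the skill's script interface.

```python
from dataclasses import dataclass


@dataclass
class DispatchPlan:
    active: list[str]  # pages currently dispatched and not yet returned
    queue: list[str]   # pages in pending or repair_needed state, in page order


def next_batch(plan: DispatchPlan, concurrency_limit: int) -> list[str]:
    """Pick at most as many queued pages as there are free slots.

    concurrency_limit stands in for the runtime's subagent limit, which comes
    from the environment rather than from this skill.
    """
    free_slots = max(0, concurrency_limit - len(plan.active))
    return plan.queue[:free_slots]
```

Each returned page would then be spawned and immediately recorded as dispatched before the next call, so `active` stays accurate.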
The normal main entry point is . Do not keep old input-normalization scripts as public entry points or compatibility wrappers.
Generation Delegation
Before using , you must read and comply with:
```text
${CODEX_HOME:-$HOME/.codex}/skills/.system/imagegen/SKILL.md
```
This skill only combines and does not redefine the image generation API rules.
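Scripts that need to locate that file can mirror the shell default directly. A minimal Python sketch; the function name is hypothetical and it only resolves the path, it does not read the file.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def imagegen_skill_doc(env: Optional[Mapping[str, str]] = None) -> Path:
    """Mirror ${CODEX_HOME:-$HOME/.codex}/skills/.system/imagegen/SKILL.md.

    As with the shell's ":-" operator, a CODEX_HOME that is unset *or empty*
    falls back to $HOME/.codex.
    """
    env = os.environ if env is None else env
    # If HOME itself is missing or empty, fall back to Python's notion of the home dir.
    home = env.get("HOME") or str(Path.home())
    codex_home = env.get("CODEX_HOME") or str(Path(home) / ".codex")
    return Path(codex_home) / "skills" / ".system" / "imagegen" / "SKILL.md"
```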
Common scenarios requiring within a page:
- Text, icons, or foreground objects on complex backgrounds that require source-preserving foreground removal plus localized background restoration.
- Icons, pictograms, badges, stickers, hand-drawn marks, stylized arrows, decorative symbols, and similar elements that need to be treated as independent assets.
- Generating chroma-key asset sheets, followed by local background removal, segmentation, and transparency processing.
- Targeted repair of a clean base or a foreground asset.
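For the chroma-key case, local background removal can start from a simple hard key. The sketch below assumes a pure-green key color and pixels given as a flat list of RGB tuples; the tolerance value is an arbitrary assumption, and the imagegen skill may use a different keying method (edge despill and alpha feathering are omitted here).

```python
Pixel = tuple[int, int, int]
RGBA = tuple[int, int, int, int]


def chroma_key(pixels: list[Pixel], key: Pixel = (0, 255, 0), tolerance: int = 60) -> list[RGBA]:
    """Make pixels within `tolerance` of the key color fully transparent.

    Distance is Euclidean in RGB space; every other pixel stays fully opaque.
    """
    tol_sq = tolerance * tolerance
    out: list[RGBA] = []
    for r, g, b in pixels:
        d_sq = (r - key[0]) ** 2 + (g - key[1]) ** 2 + (b - key[2]) ** 2
        alpha = 0 if d_sq <= tol_sq else 255
        out.append((r, g, b, alpha))
    return out
```

Sparse sheets matter here: when elements touch, a hard key like this cannot separate them, and segmentation has to guess where one asset ends.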
Generated images that are actually used in the project must be copied into the page directory and recorded via the page-local . The manifest must not reference images that exist only under `$CODEX_HOME/generated_images/...`.
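One way to satisfy this rule is to copy the file and record its hash in the same step, so later validation can confirm that the asset on disk is the one that was recorded. This is a sketch only: the `assets/` subdirectory, the record fields, and `job_id` are hypothetical, since the page-local record is not named here.

```python
import hashlib
import shutil
from pathlib import Path


def import_generated_image(src: Path, page_dir: Path, name: str, job_id: str) -> dict:
    """Copy a generated image into page_dir/assets and return a provenance record."""
    dest = page_dir / "assets" / name
    dest.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(src, dest)
    digest = hashlib.sha256(dest.read_bytes()).hexdigest()
    return {
        # Page-relative path, so the manifest never points back into
        # $CODEX_HOME/generated_images/...
        "path": dest.relative_to(page_dir).as_posix(),
        "sha256": digest,
        "imagegen_job": job_id,
    }
```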
For complex backgrounds, the default is to preserve the source's identity. You may use to generate a clean background, but the source must serve as both the edit target and the primary constraint reference for repair or reconstruction; do not generate a new background that is "similar but different". Prefer local repair when occlusion is minor. When occlusion is severe or a full clean base is needed, retain the original composition, perspective, main object positions, colors, lighting, and background details, and record the fidelity strategy in of the manifest.
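The background decision can be expressed as a small function that always keeps the source as the edit target and records which properties must be preserved. The 5% threshold for "minor" occlusion and every field name here are assumptions made for illustration; the document defines neither a numeric threshold nor the manifest field name.

```python
def choose_background_strategy(occluded_fraction: float,
                               needs_full_clean_base: bool = False,
                               minor_threshold: float = 0.05) -> dict:
    """Pick a fidelity strategy for a complex (photo, texture, illustration) background.

    occluded_fraction is the share of the background covered by foreground
    objects that must be removed, from 0.0 to 1.0. minor_threshold is a
    hypothetical cut-off, not a value taken from this skill.
    """
    if not 0.0 <= occluded_fraction <= 1.0:
        raise ValueError("occluded_fraction must be between 0 and 1")
    minor = occluded_fraction <= minor_threshold and not needs_full_clean_base
    return {
        # The source image is always the edit target, never a fresh look-alike.
        "edit_target": "source",
        "strategy": "local_repair" if minor else "full_clean_base",
        # Properties the repaired background must keep from the source.
        "preserve": ["composition", "perspective", "object_positions",
                     "colors", "lighting", "background_details"],
    }
```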
Subagent Dispatch
Page subagents are the only executors of page reconstruction. The main agent does not reconstruct pages.
Each page worker must receive a self-contained prompt that includes at least:
- The page id, plus absolute paths of the run dir, page dir, and source image.
- Allowed write scope: Only the current page dir can be written to.
- Prohibited write scope: Deck manifest, notes manifest, final deck, other pages.
- Required references: , , , .
- Required reading of .
- Required outputs and return format.
The page worker prompt template is in .
If page subagents cannot be spawned, stop and report the blocker. Do not perform sequential page reconstruction.
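The write-scope restriction in the prompt can also be enforced by the orchestration script when it checks the paths a worker returns. A minimal sketch, assuming paths are compared after `Path.resolve()`; the helper name and the example paths are hypothetical. Checking `parents` instead of a string prefix avoids accepting a sibling such as `page_0020` for `page_002`.

```python
from pathlib import Path


def is_allowed_write(target: Path, page_dir: Path) -> bool:
    """True only if target resolves to page_dir itself or a location inside it.

    resolve() collapses ".." segments (and existing symlinks), so
    "page_002/../page_003/x.png" is rejected for page_002.
    """
    target = Path(target).resolve()
    page_dir = Path(page_dir).resolve()
    return target == page_dir or page_dir in target.parents
```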
Rules
- Text: All readable text should become visible native PPT text boxes. Hidden, transparent, 1 pt, off-canvas, or metadata-only text does not count as editable text.
- Font size: First estimate from the source glyph height, the container height, and the density of peer text, then compare against the preview and rescale; when uncertain, choose the smaller size. The manifest must record quality_checks.font_size_calibrated=true.
- Structure: Only basic primitives can use native PPT shapes, such as lines, rectangles, circles, table lines, axes, simple bar blocks, and basic containers. Rectangular containers default to using ; only use if the source clearly has rounded corners, and record or .
- Foreground: Icons, pictograms, logo-like marks, hand-drawn marks, stickers, badges, complex arrows, and decorative elements are redrawn by default into independent assets using . If a small visual object must stay highly faithful to the source, contains no readable text, and would lose its identity if redrawn, it may be cropped as an independent source-derived raster asset with its source region recorded; this is not a full-page screenshot fallback.
- Background: Solid colors, simple gradients, regular grids, and ordinary cards can be reconstructed natively or via scripts; when complex photos, textures, or illustrations are occluded by foreground objects, use for inpainting/restoration.
- Asset sheet: Default to sparse chroma-key asset sheets to reduce the number of image generation calls. Prioritize spacing between elements: they must not be crowded, touch each other, or have overlapping shadows.
- Provenance: Each final raster asset must have a source record. Do not treat original source crops as default visual assets.
- QA: Deterministic validation is necessary but not sufficient; also check and .
- Repair: Repair the minimum failed scope. Do not reconstruct the entire page for a single text box or icon.
- State: Key states in and must be advanced by scripts.
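The first-pass estimate in the Font size rule is essentially a pixel-to-point conversion. This sketch assumes the measured glyph height is the cap height (height of a capital letter), a 7.5 in (540 pt) slide height, and a cap-height-to-em ratio of about 0.7; all three are assumptions, and the preview comparison the rule requires still has to follow. `estimate_font_pt()` is a hypothetical helper.

```python
import math


def estimate_font_pt(cap_height_px: float, image_height_px: float,
                     slide_height_pt: float = 540.0, cap_height_em: float = 0.7) -> float:
    """Estimate a PowerPoint font size in points from a measured cap height.

    slide_height_pt=540 is a 7.5 in tall slide at 72 pt per inch; cap_height_em
    is font-dependent. The result is rounded *down* to the nearest 0.5 pt, so
    an uncertain estimate errs small.
    """
    if cap_height_px <= 0 or image_height_px <= 0:
        raise ValueError("heights must be positive")
    # Scale the pixel measurement to slide points, then convert cap height to em size.
    cap_height_pt = cap_height_px * slide_height_pt / image_height_px
    size_pt = cap_height_pt / cap_height_em
    return math.floor(size_pt * 2) / 2
```

For example, a 30 px cap height in a 1080 px tall source maps to 15 pt of cap height and an estimate of 21 pt; the same proportions at 720 px give the same result.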
Acceptance Criteria
- The output is a valid file.
- Single image input outputs 1 page; multiple images output 1 page per image; page N of PDF corresponds to page N of output; page N of PPT/PPTX corresponds to page N of output.
- PPT/PPTX speaker notes are copied as-is per page, without translation, summarization, or rewriting by page workers.
- Each page has , , , , , .
- Each page has records of source image size, text inventory, object decisions, asset provenance, and known limits.
- Each page is recorded as dispatched by and the result is recorded by .
- The final deck has final/<origin>_edited.pptx and .
- If a blocker occurs, the final response must explain the blocker stage, the evidence path, and the reason the run is incomplete; such a run must not be reported as a normal completion.
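Two of these criteria, the 1:1 page mapping and verbatim speaker notes, are easy to check mechanically. A sketch, assuming page counts and per-page notes strings have already been extracted from the source and the output deck (extraction is not shown); `check_page_mapping()` is hypothetical. Notes are compared with `!=`, so even a whitespace change counts as a failure, in line with the as-is requirement.

```python
def check_page_mapping(source_pages: int, output_pages: int,
                       source_notes: list[str], output_notes: list[str]) -> list[str]:
    """Return acceptance failures as readable strings; an empty list means pass."""
    failures: list[str] = []
    if output_pages != source_pages:
        failures.append(f"page count {output_pages} != source {source_pages}")
    # Page N of the source must map to page N of the output, notes included.
    for i, (src, out) in enumerate(zip(source_notes, output_notes), start=1):
        if src != out:
            failures.append(f"page {i}: speaker notes differ from source")
    if len(source_notes) != len(output_notes):
        failures.append("notes count mismatch")
    return failures
```

Image and PDF inputs carry no speaker notes, so both notes lists would simply be empty.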
Reference Map
- references/architecture.md: Responsibility boundaries, run/page directory structure, and ownership principles.
- references/state-machine.md: Run/page/imagegen state machine and script advancement rules.
- references/subagent-contract.md: Prompt contracts and return formats for page workers and repair workers.
- references/imagegen-integration.md: How to combine , including clean bases, asset sheets, transparency processing, and recording.
- references/page-decision-tree.md: Page analysis, background strategies, and foreground/structural object boundaries.
- references/manifest-schema.md: First version of the deck/page/imagegen JSON schema.
- : Structural, text, asset, background, and visual QA standards.
- references/repair-policy.md: Repair queue, minimum rework scope, and blocker determination.
- references/script-contracts.md: Script responsibilities, inputs/outputs, and allowed callers.
- : Prompt for regular page reconstruction workers.
- prompts/page-repair-worker.md: Prompt for page repair workers.
- prompts/imagegen-clean-base.md: Prompt for clean base generation/editing.
- prompts/imagegen-asset-sheet.md: Prompt for sparse asset sheet generation.
- prompts/imagegen-repair.md: Prompt for targeted imagegen repair.