# EmojiGen Nano Banana
Use this skill to reproduce the EmojiGen Pro workflow as a reusable agent workflow instead of a browser app.
Read this skill end to end before you start work. Do not jump straight to writing a config, building a prompt, or calling a model until you have read the SOP and decided how you will satisfy every step.
## What to collect before doing work
Do not start generation until you have either explicit answers or safe defaults for:
- Reference image path.
- Output mode: `static` or `animated`.
- Emotion list, or a category prompt that can be expanded into emotions.
- Style target.
- Optional custom text and color.
- Output directory.
- Backend choice:
  - Gemini Developer API via an API key environment variable, or
  - Vertex AI via `GOOGLE_GENAI_USE_VERTEXAI=true` plus the associated project and location variables, or
  - another image tool chosen by the agent when Gemini access is unavailable.
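As a sketch of how that choice can be resolved mechanically (the variable names follow common google-genai SDK conventions and are assumptions; `resolveBackend` is a hypothetical helper, not part of the skill's scripts):

```javascript
// Sketch: resolve the image backend from environment variables.
// Variable names are assumptions based on google-genai SDK conventions;
// check references/model-backends.md for the authoritative list.
function resolveBackend(env) {
  if (env.GOOGLE_GENAI_USE_VERTEXAI === "true") return "vertex";
  if (env.GEMINI_API_KEY || env.GOOGLE_API_KEY) return "gemini-api";
  return "fallback-tool"; // another image-capable tool chosen by the agent
}
```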
Before generation, inspect the current reference image and rewrite the prompt's subject description for this exact subject. Never reuse stale subject descriptions, emotion lists, or style notes from a previous run on a different person.
If the user is adapting the original EmojiGen Pro repository, first reconstruct the workflow from the codebase before you rewrite anything. Preserve the original sequence:
- Collect or generate emotion labels.
- Assemble one long prompt for a strict 4x6 sticker sheet.
- Generate the sheet image from the reference image.
- Slice the sheet into frames or stickers.
- Encode GIFs for animated mode.
## Default decisions
- Only use these image models:
  - `gemini-3-pro-image-preview`
  - `gemini-3.1-flash-image-preview`
- Default to `static` unless the user explicitly asks for `animated`.
- Default style:
- Default :
- Random emotions should be generated by the agent locally by default. Do not depend on a Gemini text model unless the user explicitly wants model-generated wording.
- Keep count constraints hard:
  - static mode always resolves to exactly 24 stickers (the full 4x6 sheet)
  - animated mode only allows a fixed set of GIF counts
- Force image generation settings to:
- Keep the output contract stable even if image generation uses a fallback tool.
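To make the defaults concrete, a minimal config might look like the sketch below. Every field name here is an assumption for illustration; the authoritative template is `assets/example-config.json`.

```javascript
// Illustrative config sketch -- field names are assumptions, not the real
// schema. Copy assets/example-config.json as the actual starting point.
const config = {
  mode: "static",                            // default unless the user asks for animated
  imageModel: "gemini-3-pro-image-preview",  // must be one of the two allowed models
  emotions: [],                              // empty: agent fills in random emotions
  categoryPrompt: "office life",             // hypothetical category field
  outputDir: "path/to/output",
};

// The hard constraints from the rules above, expressed as checks:
const allowedModels = ["gemini-3-pro-image-preview", "gemini-3.1-flash-image-preview"];
if (!["static", "animated"].includes(config.mode)) throw new Error("bad mode");
if (!allowedModels.includes(config.imageModel)) throw new Error("model not allowed");
```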
## Working sequence
### 0. Stage the source image when the path is unstable
If the image came from the clipboard, a pasted chat image, or any source whose original path is unreliable, stage it to a stable path first:

```bash
node skills/emojigen-nano-banana/scripts/emojigen.mjs stage-image \
  --from-clipboard
```
Or copy a known file into the staging location so later steps use a stable path:

```bash
node skills/emojigen-nano-banana/scripts/emojigen.mjs stage-image \
  --input /abs/path/to/source.png
```
Use the staged path for all later steps.
### 1. Prepare config
Start from `assets/example-config.json`. Fill only the fields needed for the current task.
If the user did not give an emotion list, leave the emotion list empty and provide a category prompt instead.
Then:
- infer a category prompt from the request and let the agent produce the random emotions directly, or
- only if the user explicitly wants model-generated wording, run:
```bash
node skills/emojigen-nano-banana/scripts/emojigen.mjs suggest-emotions \
  --category "职场打工人, 加班, 摸鱼, 收到, 崩溃, 阴阳怪气" \
  --count 4
```
### 2. Run preflight before generation
Preflight checks the backend, confirms the staged reference path, and resolves missing random emotions without starting image generation:
```bash
node skills/emojigen-nano-banana/scripts/emojigen.mjs preflight \
  --config path/to/config.json \
  --reference /tmp/emojigen-input-123.png
```
### 3. Build the prompt
Always build the prompt through the script so the wording stays consistent:
```bash
node skills/emojigen-nano-banana/scripts/emojigen.mjs build-prompt \
  --config path/to/config.json \
  --out path/to/output/prompt.txt
```
Do not stop here. `build-prompt` alone is not the delivery workflow.
### 4. Generate the 4x6 grid
If Gemini or Vertex AI is available, prefer the built-in generator:
```bash
node skills/emojigen-nano-banana/scripts/emojigen.mjs generate-grid \
  --config path/to/config.json \
  --reference path/to/reference.png \
  --out path/to/output/grid.png
```
The script rejects image models outside `gemini-3-pro-image-preview` and `gemini-3.1-flash-image-preview`, and always sends the reference image together with the built prompt.
Do not take the built prompt and call a raw image model yourself when the built-in workflow is available. That bypasses the skill's staging, preflight, slicing, background-removal, and quality gates.
If another image tool is a better fit, still use this skill. Build the prompt with this skill, generate the grid elsewhere, then continue with `make-assets`.
### 5. Produce GIFs or static stickers
If you already have a grid image, run:
```bash
node skills/emojigen-nano-banana/scripts/emojigen.mjs make-assets \
  --config path/to/config.json \
  --grid path/to/output/grid.png \
  --out-dir path/to/output
```
This creates square crops, optional background removal, and GIF outputs for animated mode.
Keep background removal disabled by default. Only enable it when the user explicitly wants transparent stickers and the generated sheet clearly uses a flat, separable background.
Read the quality report after `make-assets` or `run`. If the quality check fails, do not deliver the result yet. Rerun with stricter prompt constraints or stronger square-safe composition constraints.
Background removal uses a corner-connected flood-fill strategy. This is safer than making every near-background color transparent, and avoids punching holes in faces or clothing when skin tones are similar to the background.
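A minimal sketch of that strategy, assuming a flat pixel array and a hypothetical `isBackground` predicate (the script's actual implementation may differ):

```javascript
// Sketch of corner-connected flood fill: starting from the four corners,
// mark only background pixels reachable from the edges as transparent, so
// similar colors enclosed inside the subject are left untouched.
// `isBackground` is a hypothetical predicate, e.g. color distance to the
// corner color under some threshold.
function backgroundMask(pixels, width, height, isBackground) {
  const mask = new Uint8Array(width * height); // 1 = make transparent
  const queue = [];
  const seed = (x, y) => {
    const i = y * width + x;
    if (!mask[i] && isBackground(pixels[i])) { mask[i] = 1; queue.push([x, y]); }
  };
  seed(0, 0); seed(width - 1, 0); seed(0, height - 1); seed(width - 1, height - 1);
  while (queue.length) {
    const [x, y] = queue.pop();
    for (const [dx, dy] of [[1, 0], [-1, 0], [0, 1], [0, -1]]) {
      const nx = x + dx, ny = y + dy;
      if (nx >= 0 && ny >= 0 && nx < width && ny < height) seed(nx, ny);
    }
  }
  return mask;
}
```

Note how an enclosed background-colored region never gets seeded, which is exactly why faces and clothing survive.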
Treat square-safe composition as a hard requirement, not a style preference. The final assets are cropped to square cells, so the subject must stay centered and stable across frames or the GIF will jitter after slicing.
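For intuition, slicing a sheet into equal cells is simple arithmetic; the hard part is that the subject must already sit safely inside each cell. A sketch (which axis of the 4x6 sheet is rows vs. columns is an assumption here, and the real script's slicing logic may differ):

```javascript
// Sketch: compute crop boxes for a rows x cols sticker sheet.
// The skill's sheets are 4x6 = 24 cells.
function gridCells(width, height, rows, cols) {
  const cellW = Math.floor(width / cols);
  const cellH = Math.floor(height / rows);
  const cells = [];
  for (let r = 0; r < rows; r++) {
    for (let c = 0; c < cols; c++) {
      cells.push({ x: c * cellW, y: r * cellH, w: cellW, h: cellH });
    }
  }
  return cells;
}
```

Anything that drifts outside its fixed box between frames shows up as jitter once the GIF is assembled, which is why composition stability is a hard requirement.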
### 6. Full end-to-end run
When no step needs manual intervention, use the orchestration command:
```bash
node skills/emojigen-nano-banana/scripts/emojigen.mjs run \
  --config path/to/config.json \
  --reference path/to/reference.png \
  --out-dir /tmp/emojigen-run \
  --deliver-dir path/to/workspace-output \
  --cleanup-temp
```
Use `--deliver-dir` to copy the finished assets into the working directory or a client delivery folder. Use `--cleanup-temp` after delivery when the outputs were generated under `/tmp`. macOS may eventually clear `/tmp`, but not quickly enough for agent workflows.
Treat this as the preferred path. The default expectation is:
- inspect the generated assets
- deliver only if quality is acceptable
Do not skip any of these steps unless the user explicitly narrows the task and you can still preserve output quality.
## Fallback rules
- If no Gemini credentials are present, say that explicitly and either ask for credentials or use another image-capable tool.
- If another tool generated the grid, say that the final GIF packaging still came from this skill.
- If background removal damages line art or text, rerun with background removal disabled and keep the pure solid background from prompt-time constraints.
- Do not proactively enable background removal just because the script supports it.
- If the input image arrived as a pasted or clipboard image, stage it with `stage-image` before any prompt or generation step.
- If the user only asked for random emotions, do not call a text model by default. Generate them directly unless the user explicitly wants a model to brainstorm them.
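Agent-local random emotions need nothing more than a shuffle over a pool derived from the category. A sketch with an illustrative pool (the function name and pool contents are hypothetical):

```javascript
// Sketch: pick `count` random emotions locally -- no text model call.
// The pool would be derived from the user's category prompt.
function pickEmotions(pool, count, rng = Math.random) {
  const shuffled = [...pool]; // Fisher-Yates shuffle on a copy
  for (let i = shuffled.length - 1; i > 0; i--) {
    const j = Math.floor(rng() * (i + 1));
    [shuffled[i], shuffled[j]] = [shuffled[j], shuffled[i]];
  }
  return shuffled.slice(0, count);
}
```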
## References
- Read the skill's reference documentation for CLI usage, environment variable precedence, and output layout.
- Read `references/model-backends.md` when choosing between Gemini API and Vertex AI.