Image Generation and Editing (Official Formal Version of GPT Image 2)
This image generation skill is built on gpt-image-2, the official formal version of the GPT Image 2 model on the Apiyi Platform, and lets users generate and edit images through natural language. It is accessed via Apiyi's domestic proxy service and runs in both Node.js and Python environments. The model supports precise size and quality control (up to 4K) and is billed by token.
Usage Guide
Follow these steps:
Step 1: Analyze Requirements and Extract Parameters
- Clarify Intent: Distinguish whether the user needs [Text-to-Image] (generate new images), [Image-to-Image] (edit/modify existing images), or [Multi-Image Fusion].
- Prompt Analysis:
- Use the user's original complete input: Directly use the user's original full question and requirement description as the main body of the prompt. Avoid rewriting, summarizing or secondary creation on your own to prevent loss of details.
- Confirm first when supplementation is needed: If information is insufficient (e.g., missing style, number of subjects, shot language, scene details, text content, prohibited elements, etc.), ask the user for confirmation first; after the user confirms, append the supplementary content to the end of the original prompt rather than rewriting it.
- Examples:
- User input: "Help me generate a picture of a cat, in a cute style."
- Correct example: Use the user's input directly as the prompt:
-p "Help me generate a picture of a cat, in a cute style."
- Incorrect example: Unauthorized rewriting to "Generate a picture of a cat in cute style" will lose the details and tone of the user's original input.
- If additional details are needed (e.g., color, background, etc.), ask for confirmation first: "What color do you want the cat to be? Any requirements for the background?" After the user replies, append it to the prompt:
-p "Help me generate a picture of a cat, in a cute style. The cat is orange, and the background is grass."
- Key Parameter Organization:
- Prompt (Required): The final prompt after analysis (default = the user's original complete and consistent input; only append supplementary information after user confirmation).
- Filename (Optional): Output image filename/path (must include a random identifier to avoid duplication). If not provided, the script will automatically generate a filename with a timestamp. It is recommended to generate a reasonable filename based on content (e.g., ) instead of using generic names.
- Size (Optional): Output size.
- Preset values: 1024x1024, 1536x1024, 1024x1536, 2048x2048, 2048x1152, 3840x2160, 2160x3840
- Custom sizes are also allowed (requirements: maximum side ≤3840, both sides are multiples of 16, aspect ratio ≤3:1, total pixels 0.65–8.3MP)
- Default is model-adaptive (auto)
- Quality (Optional): Quality level. low (sketch/batch), medium (daily use), high (final draft/fine text), auto (default)
- Output Format (Optional): png (default), jpeg, webp
- Output Compression (Optional): Output compression rate (0-100), only valid for jpeg/webp
- Note: This model uses official formal endpoints, which are different from the reverse version gpt-image-2-all.
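The custom size rules listed above can be checked before a command is built. The following is a minimal sketch, not part of the shipped scripts: the function name `check_custom_size` is hypothetical, and it assumes that 1 MP means 1,000,000 pixels and that the aspect ratio limit applies to the longer side divided by the shorter side.

```python
def check_custom_size(size: str) -> list[str]:
    """Return the rule violations for a 'WIDTHxHEIGHT' string (empty list = valid)."""
    width_text, _, height_text = size.lower().partition("x")
    if not (width_text.isdecimal() and height_text.isdecimal()):
        return ["size must look like WIDTHxHEIGHT, e.g. 2048x1152"]

    width, height = int(width_text), int(height_text)
    if width == 0 or height == 0:
        return ["both sides must be positive"]

    problems = []
    if max(width, height) > 3840:
        problems.append("maximum side must be <= 3840")
    if width % 16 or height % 16:
        problems.append("both sides must be multiples of 16")
    # Assumption: ratio is measured as longer side / shorter side.
    if max(width, height) > 3 * min(width, height):
        problems.append("aspect ratio must be <= 3:1")
    # Assumption: 1 MP = 1,000,000 pixels.
    megapixels = width * height / 1_000_000
    if not 0.65 <= megapixels <= 8.3:
        problems.append("total pixels must be within 0.65-8.3 MP")
    return problems


if __name__ == "__main__":
    for candidate in ("3840x2160", "4000x2000", "1000x1000"):
        print(candidate, check_custom_size(candidate) or "ok")
```

Returning a list of violations rather than a boolean makes it easier to tell the user exactly which rule a requested size breaks.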
Step 2: Environment Check and Command Execution
- Check Environment: Confirm whether the APIYI_API_KEY environment variable is set (usually assumed to be set; prompt the user if execution fails).
- Build and Run Commands:
- Priority Node.js Version: If Node is available in the environment (the command works), prefer using scripts/generate_image.js (zero dependencies, parameters are consistent with Python).
- Use Python Version if Node is Unavailable: Use scripts/generate_image.py.
Text-to-Image Command Template (Priority Node.js):
bash
node scripts/generate_image.js -p "{prompt}" -f "{filename}" [-s {size}] [-q {quality}] [-o {output_format}]
Image-to-Image Command Template (Priority Node.js):
bash
node scripts/generate_image.js -p "{edit_instruction}" -i "{input_path}" -f "{output_filename}" [-s {size}] [-q {quality}]
Multi-Image Fusion Command Template (Priority Node.js):
bash
node scripts/generate_image.js -p "Merge the styles of Image 1 and Image 2" -i ref1.png ref2.png -f "merged.png" [-s {size}] [-q {quality}]
(Optional) Python Version Command Template (When Node is Unavailable):
bash
python scripts/generate_image.py -p "{prompt}" -f "{filename}" [-s {size}] [-q {quality}] [-o {output_format}]
python scripts/generate_image.py -p "{edit_instruction}" -i "{input_path}" -f "{output_filename}" [-s {size}] [-q {quality}]
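The Node-first, Python-fallback rule and the templates above can be sketched as a small command builder. This is an illustration under assumptions, not the skill's own code: `build_command` and its `node_available` parameter are hypothetical, and `shutil.which("node")` stands in for whatever availability check the skill actually performs.

```python
import shutil


def build_command(prompt, filename, input_paths=(), size=None, quality=None,
                  output_format=None, node_available=None):
    """Build the argument list for generate_image.js (preferred) or generate_image.py."""
    if node_available is None:
        # Assumption: Node counts as available when `node` is found on PATH.
        node_available = shutil.which("node") is not None

    if node_available:
        command = ["node", "scripts/generate_image.js"]
    else:
        command = ["python", "scripts/generate_image.py"]

    command += ["-p", prompt]
    if input_paths:
        # Passing -i switches to edit mode; several reference images follow one flag.
        command += ["-i", *input_paths]
    command += ["-f", filename]
    if size:
        command += ["-s", size]
    if quality:
        command += ["-q", quality]
    if output_format:
        command += ["-o", output_format]
    return command


if __name__ == "__main__":
    print(build_command("Help me generate a picture of a cat, in a cute style.",
                        "cute-cat.png", quality="high"))
```

Building an argument list (rather than a single shell string) lets the user's original prompt pass through unchanged, with no quoting or escaping applied to it.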
⏱️ Long-running Task Processing Strategy
1. Pre-task Prompt
Must inform the user before execution:
- "Image generation has started, it is expected to take 120-150 seconds, please wait patiently"
2. 🎨 Best Practice Example
"Image generation in progress, expected to complete in 120-150 seconds...\n⏳ Generating...\n(Complex scenes with high quality + 2K/4K may take longer, please wait patiently)"
Step 3: Result Feedback
- Execution Feedback: Wait for the terminal command to complete execution.
- Success: Inform the user that the image has been generated and indicate the save path.
- Failure:
- If prompted for missing API Key, guide the user to set the environment variable.
- If prompted for network error, suggest the user check the network or try again later.
Command Line Usage Examples
Generate New Images
bash
python scripts/generate_image.py -p "Image description text" -f "output.png" [-s {size}] [-q {quality}] [-o {output_format}]
Example:
bash
# Basic generation
python scripts/generate_image.py -p "A cute orange cat playing on the grass" -f "cat.png"
# Specify size and quality
python scripts/generate_image.py -p "Sunset mountain landscape" -f "sunset.png" -s "2048x1152" -q "high"
# Vertical HD image (suitable for mobile wallpaper)
python scripts/generate_image.py -p "City night view" -f "city.png" -s "2160x3840" -q "high"
# Output as JPEG
python scripts/generate_image.py -p "Landscape photo" -f "landscape.jpg" -s "3840x2160" -q "high" -o "jpeg"
(Optional) Node.js Version Example:
bash
# Basic generation
node scripts/generate_image.js -p "A cute orange cat playing on the grass" -f "cat.png"
# Specify size and quality
node scripts/generate_image.js -p "Sunset mountain landscape" -f "sunset.png" -s "2048x1152" -q "high"
Edit Existing Images
bash
python scripts/generate_image.py -p "Editing instruction" -f "output.png" -i "path/to/input.png" [-s {size}] [-q {quality}]
Example:
bash
# Modify style
python scripts/generate_image.py -p "Convert the image to watercolor style" -f "watercolor.png" -i "original.png"
# Add elements
python scripts/generate_image.py -p "Add a rainbow to the sky" -f "rainbow.png" -i "landscape.png" -q "high"
# Replace background
python scripts/generate_image.py -p "Change the background to a beach" -f "beach-bg.png" -i "portrait.png" -s "2048x2048"
(Optional) Node.js Version Example:
bash
# Modify style
node scripts/generate_image.js -p "Convert the image to watercolor style" -f "watercolor.png" -i "original.png"
# Multi-reference image fusion (up to 5 images)
node scripts/generate_image.js -p "Put the character from Image 1 into the scene of Image 2" -i ref1.png ref2.png -f "merged.png"
Command Line Parameter Description
Parameters are consistent between Python and Node.js versions (short parameters are equivalent to long parameters).
| Parameter | Required | Description |
|---|---|---|
| -p | Yes | Image description (text-to-image) or editing instruction (image-to-image). Retain the user's original complete input. |
| -f | No | Output image path/filename; if not provided, a filename with timestamp will be generated automatically. |
| -s | No | Output size: 1024x1024 / 1536x1024 / 1024x1536 / 2048x2048 / 2048x1152 / 3840x2160 / 2160x3840 or custom size. |
| -q | No | Quality level: low / medium / high / auto (default auto). |
| -o | No | Output format: png (default) / jpeg / webp. |
| / | No | Output compression rate (0-100), only valid for jpeg/webp. |
| -i | No | Input image path for image-to-image; multiple images can be passed (up to 5). Passing this parameter enters edit mode. |
File Resource Description
| Resource | Description |
|---|---|
| scripts/generate_image.js | Node.js version (zero dependencies, preferred) |
| scripts/generate_image.py | Python version (alternative) |
|  | Size and aspect ratio control document, use when needed, load on demand |
| references/batch-template.md | Batch generation configuration template, use when batch generation is needed, load on demand |
Batch Generation
When users need to generate multiple images at once (batch generation):
- Load Configuration Template: references/batch-template.md, which includes the JSON configuration format description and usage examples
- Obtain/Generate JSON File: Users can provide their own JSON file, or describe requirements and let AI generate it based on the requirements
- Execute One by One: Read the prompts array, run the generation command for each entry in turn, and report the result after each image completes
- Summary Feedback: After completion, inform the user of the number of successful images and the list of image paths
Note: Total time for batch tasks = single image time (120-150 seconds) × number of images, please inform the user of the estimated duration in advance.
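The estimate in the note above is simple arithmetic, sketched here so the duration can be reported before a batch starts. The function names are hypothetical; the 120-150 second range is the per-image figure stated in this document.

```python
import math


def estimate_batch_seconds(image_count, per_image_range=(120, 150)):
    """Return (fastest, slowest) total seconds for images generated one by one."""
    fastest, slowest = per_image_range
    return image_count * fastest, image_count * slowest


def format_estimate(image_count):
    """Build the message shown to the user before the batch starts."""
    low, high = estimate_batch_seconds(image_count)
    return (f"Generating {image_count} images one by one; expect about "
            f"{low // 60}-{math.ceil(high / 60)} minutes ({low}-{high} seconds).")


if __name__ == "__main__":
    print(format_estimate(5))
```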
Quality Description
| Quality | Description | Applicable Scenario |
|---|---|---|
| low | Sketch/batch generation | Quick preview, multiple iterations |
| medium | Daily use | General usage |
| high | Final draft/fine text | Final output, images containing text |
| auto | Default | Determined by the model |
Output Format Description
| Format | Description | Applicable Scenario |
|---|---|---|
| png | Lossless compression, transparent background | Need transparent background, retain best quality |
| jpeg | Compressed | Photos, storage space sensitive |
| webp | Modern format | Web usage, balance quality and size |
Note: The b64_json field is pure base64, without a data URL prefix. Clients need to:
- Write to file: base64.b64decode(b64_str) → write to disk
- Render in browser: Prepend the data URL prefix to the base64 string yourself
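Both ways of handling the b64_json value can be sketched in a few lines. The helper names below are hypothetical, and the sketch assumes the MIME subtype matches the requested output format (png, jpeg, or webp).

```python
import base64
from pathlib import Path


def save_b64_image(b64_str, output_path):
    """Decode a pure-base64 string and write the image bytes to disk."""
    path = Path(output_path)
    path.write_bytes(base64.b64decode(b64_str))
    return path


def to_data_url(b64_str, output_format="png"):
    """Prepend a data URL prefix so a browser can render the image directly."""
    # Assumption: output_format is one of png / jpeg / webp.
    return f"data:image/{output_format};base64,{b64_str}"


if __name__ == "__main__":
    sample = base64.b64encode(b"not a real image").decode("ascii")
    print(to_data_url(sample, "webp")[:40])
```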
Notes
- An API Key must be set; it can be provided via environment variable or command line parameter
- Image generation time: Approximately 120-150 seconds, complex scenes with high quality + 2K/4K may take longer
- When editing images, use multipart/form-data to upload reference images
- Ensure the output directory has write permission
- Billed by token (not per image)
API Key Setup and Acquisition
How to Obtain API Key
If you don't have an API Key yet, go to https://api.apiyi.com to register an account and apply for one.
Acquisition steps:
- Visit https://api.apiyi.com
- Register or log in to your account
- Create an API Key in the console
- Copy the key and set the environment variable or use it in the command line
Set API Key
The script obtains the API Key from the APIYI_API_KEY environment variable.
Set Environment Variable:
bash
# Linux/Mac
export APIYI_API_KEY="your-api-key-here"
Set environment variable in advanced settings of your computer or execute the set command:
# Windows CMD
set APIYI_API_KEY=your-api-key-here
# Windows PowerShell
$env:APIYI_API_KEY="your-api-key-here"
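Before running a generation command, the key can be checked up front so a missing value produces clear guidance instead of a failed request. This is a minimal sketch; `require_api_key` is a hypothetical helper and the wording of the error message is illustrative, not the scripts' actual output.

```python
import os


def require_api_key(environ=os.environ):
    """Return the API key from APIYI_API_KEY, or raise with setup guidance."""
    key = environ.get("APIYI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "APIYI_API_KEY is not set. Create a key at https://api.apiyi.com "
            "and set the environment variable before running the script."
        )
    return key


if __name__ == "__main__":
    try:
        require_api_key()
        print("API key found.")
    except RuntimeError as error:
        print(error)
```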
API Endpoint Description
Text-to-Image Endpoint: POST /v1/images/generations
Text-to-image endpoint, uses JSON format for requests.
Image-to-Image Endpoint: POST /v1/images/edits
Image-to-image endpoint, uses multipart/form-data format for requests. Upload reference images (up to 5) + instructions for single image editing and multi-image fusion.
The order of reference images is meaningful, and "Image 1/Image 2/Image 3" can be used in the prompt to refer to them.
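As a rough illustration of the text-to-image endpoint, the sketch below builds a JSON request without sending it. Several details are assumptions not confirmed by this document: that https://api.apiyi.com is the API base URL, that the key is sent as a Bearer token, that the JSON field names are model, prompt, size, quality, and output_format, and that the response carries the image under data[0].b64_json.

```python
import json
import urllib.request

BASE_URL = "https://api.apiyi.com"  # assumption: API base URL matches the console site


def build_generation_request(api_key, prompt, size="auto", quality="auto",
                             output_format="png"):
    """Build (but do not send) a POST request for /v1/images/generations."""
    payload = {
        "model": "gpt-image-2",
        "prompt": prompt,
        "size": size,                    # assumed field name
        "quality": quality,              # assumed field name
        "output_format": output_format,  # assumed field name
    }
    return urllib.request.Request(
        BASE_URL + "/v1/images/generations",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
        },
        method="POST",
    )


def extract_b64(response_body):
    """Pull the base64 image out of a response body (assumed shape: data[0].b64_json)."""
    return json.loads(response_body)["data"][0]["b64_json"]


if __name__ == "__main__":
    request = build_generation_request("your-api-key-here", "A cute orange cat")
    print(request.get_method(), request.full_url)
```

Sending the request would use `urllib.request.urlopen(request, timeout=...)` with a generous timeout, given the 120-150 second generation time. The edits endpoint is not covered here because it requires a multipart/form-data body.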
Model Information
- Model Name: gpt-image-2
- Image Generation Speed: Approximately 120-150 seconds (4K complex scenes may take longer)
- Output Resolution: 1024x1024 / 1536x1024 / 1024x1536 / 2048x2048 / 2048x1152 / 3840x2160 / 2160x3840 or custom
- Default Response Format: b64_json (pure base64, no prefix)
- Quality Levels: low / medium / high / auto
- Output Formats: png / jpeg / webp
- Supported Capabilities: Text-to-image, single image editing, multi-image fusion
- Billing Method: Billed by token
Comparison of gpt-image-2 (Official Formal Version) and gpt-image-2-all (Official Reverse Version)
| Feature | gpt-image-2 | gpt-image-2-all |
|---|---|---|
| Nature | Official formal version | Official reverse version |
| Billing | Billed by token | Fixed $0.03 per image |
| Endpoints | /v1/images/generations, /v1/images/edits | /v1/chat/completions |
| Reference Image Upload | multipart form-data | base64 data URL |
| Image Download | b64_json (pure base64) | url or b64_json (with prefix) |
| Multi-Image Fusion | image[] array (up to 5 images) | chat multiple image_url |
| Size Control | Explicit size parameter | Prompt description |
| Speed | Approximately 120-150 seconds | Approximately 60-300 seconds |
Author Introduction
- LoveOnePiece_Ubiquitous
- My WeChat Official Account: Ubiquitous Technology