nano-banana-image-combine

Nano Banana Image Combination

Purpose

Combine, merge, and compose multiple images with Google's Gemini 2.5 Flash Image model (codename "Nano Banana") via OpenRouter. Typical uses include composite images, background replacement, face swapping, and AI-guided photo manipulation.

When to Use

  • Combining 2+ images into a single composition
  • Face swapping or identity replacement
  • Background replacement
  • Creating thumbnails from multiple sources
  • AI-guided photo collages
  • Portrait + background composition

Architecture Pattern

Project Structure

backend/
├── services/
│   ├── image_combiner_service.py    # Main combination logic
│   └── openrouter_service.py        # OpenRouter client
├── models/
│   └── combine_models.py            # Pydantic models
├── utils/
│   ├── image_encoding.py            # Base64 encoding
│   └── image_download.py            # Fetch from URLs
└── config/
    └── openrouter_config.py         # Configuration

Installation

bash
# base64 is part of the Python standard library and is not installed with pip
pip install httpx python-dotenv pydantic pillow

Environment Setup

bash
# .env
OPENROUTER_API_KEY=sk-or-v1-...
FRONTEND_URL=http://localhost:3000
NANO_BANANA_MODEL=google/gemini-2.5-flash-image-preview
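
The three settings above could be read in one place, for example in `config/openrouter_config.py`. The following is a minimal sketch, not the project's actual module: `OpenRouterConfig` and `load_config` are hypothetical names, and the defaults simply mirror the values shown in the `.env` example.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class OpenRouterConfig:
    """Hypothetical container for the settings listed in .env."""
    api_key: str
    frontend_url: str
    model: str


def load_config(env: Optional[Mapping[str, str]] = None) -> OpenRouterConfig:
    """Read the settings from a mapping (os.environ by default).

    Fails early when the API key is missing, so the error surfaces at
    startup rather than on the first request.
    """
    env = os.environ if env is None else env
    api_key = env.get("OPENROUTER_API_KEY", "")
    if not api_key:
        raise RuntimeError("OPENROUTER_API_KEY is not set")
    return OpenRouterConfig(
        api_key=api_key,
        frontend_url=env.get("FRONTEND_URL", "http://localhost:3000"),
        model=env.get("NANO_BANANA_MODEL", "google/gemini-2.5-flash-image-preview"),
    )
```

If the values live in a `.env` file, calling `load_dotenv()` from python-dotenv before `load_config()` would populate `os.environ` first.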

Quick Start

Basic Image Combination

python
import os
import httpx
import base64
from typing import List

async def combine_images(
    image_urls: List[str],
    prompt: str
) -> str:
    """Combine multiple images using Nano Banana"""

    # Encode images to base64
    encoded_images = []
    async with httpx.AsyncClient() as client:
        for url in image_urls:
            response = await client.get(url)
            b64 = base64.b64encode(response.content).decode('utf-8')
            encoded_images.append({
                "type": "image_url",
                "image_url": {
                    "url": f"data:image/jpeg;base64,{b64}"
                }
            })

        # Call OpenRouter while the client is still open
        response = await client.post(
            "https://openrouter.ai/api/v1/chat/completions",
            headers={
                "Authorization": f"Bearer {os.getenv('OPENROUTER_API_KEY')}",
                "HTTP-Referer": os.getenv('FRONTEND_URL', 'http://localhost:3000')
            },
            json={
                "model": "google/gemini-2.5-flash-image-preview",
                "messages": [
                    {
                        "role": "user",
                        "content": [
                            {"type": "text", "text": prompt},
                            *encoded_images
                        ]
                    }
                ]
            }
        )
        # Surface HTTP errors instead of failing later on a missing "choices" key
        response.raise_for_status()

    result = response.json()
    return result["choices"][0]["message"]["content"]
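
The quick-start code labels every input as `image/jpeg`, even when the downloaded bytes are PNG or WebP. One possible way to build a more accurate data URL is to inspect the leading magic bytes. This is a sketch under that assumption; `sniff_mime` and `to_data_url` are illustrative names, and the signature list covers only a few common formats.

```python
import base64

# Magic-number prefixes for a few common formats (not exhaustive).
_SIGNATURES = [
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
]


def sniff_mime(data: bytes, default: str = "image/jpeg") -> str:
    """Guess the MIME type from the first bytes, falling back to `default`."""
    # WebP files are RIFF containers with "WEBP" at byte offset 8.
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "image/webp"
    for prefix, mime in _SIGNATURES:
        if data.startswith(prefix):
            return mime
    return default


def to_data_url(data: bytes) -> str:
    """Encode raw image bytes as a base64 data URL with a sniffed MIME type."""
    b64 = base64.b64encode(data).decode("ascii")
    return f"data:{sniff_mime(data)};base64,{b64}"
```

In `combine_images`, `to_data_url(response.content)` could stand in for the hardcoded f-string.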

Complete Implementation

Image Encoding Utility

python
import httpx
import base64
from PIL import Image
from io import BytesIO
from typing import Tuple

class ImageEncoder:
    @staticmethod
    async def download_image(url: str) -> bytes:
        """Download image from URL"""
        async with httpx.AsyncClient(timeout=30.0) as client:
            response = await client.get(url)
            response.raise_for_status()
            return response.content

    @staticmethod
    def resize_image(image_bytes: bytes, max_size: Tuple[int, int] = (1024, 1024)) -> bytes:
        """Resize image to reduce API costs"""
        img = Image.open(BytesIO(image_bytes))

        # Calculate new size maintaining aspect ratio
        img.thumbnail(max_size, Image.Resampling.LANCZOS)

        # JPEG cannot store alpha or palette data, so convert any non-RGB mode
        if img.mode != 'RGB':
            img = img.convert('RGB')

        # Save to bytes
        buffer = BytesIO()
        img.save(buffer, format='JPEG', quality=85)
        return buffer.getvalue()

    @staticmethod
    def encode_base64(image_bytes: bytes) -> str:
        """Encode image to base64"""
        return base64.b64encode(image_bytes).decode('utf-8')

    @classmethod
    async def prepare_image(cls, url: str, resize: bool = True) -> str:
        """Download, optionally resize, and encode image"""
        image_bytes = await cls.download_image(url)

        if resize:
            image_bytes = cls.resize_image(image_bytes)

        return cls.encode_base64(image_bytes)
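
To see what the resize step does to an image's dimensions, the aspect-ratio arithmetic can be sketched in plain Python. `fit_within` is a hypothetical helper written for illustration; it approximates the behaviour of shrinking an image to fit a bounding box without ever enlarging it, and Pillow's own rounding may differ by a pixel.

```python
from typing import Tuple


def fit_within(size: Tuple[int, int], max_size: Tuple[int, int]) -> Tuple[int, int]:
    """Scale `size` down to fit inside `max_size`, preserving aspect ratio.

    Images that already fit are returned unchanged (scale is capped at 1.0).
    """
    width, height = size
    max_w, max_h = max_size
    scale = min(max_w / width, max_h / height, 1.0)
    # Clamp to at least one pixel so extreme aspect ratios stay valid.
    return max(1, round(width * scale)), max(1, round(height * scale))
```

A 4000x3000 photo with the default 1024x1024 limit comes out at 1024x768, so the longer side hits the limit and the shorter side scales with it.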

Pydantic Models

python
from pydantic import BaseModel, Field, HttpUrl
from typing import List, Literal

class CombineImagesInput(BaseModel):
    image_urls: List[HttpUrl] = Field(
        min_length=2,
        max_length=8,
        description="URLs of images to combine (2-8 images)"
    )
    prompt: str = Field(
        description="Instructions for how to combine the images",
        examples=[
            "Combine these images into a professional YouTube thumbnail",
            "Replace the background of the person in image 1 with image 2",
            "Create a face swap using the face from image 1 on the body in image 2"
        ]
    )
    style: Literal["natural", "artistic", "professional", "creative"] = "natural"
    output_format: Literal["url", "base64"] = "url"
    resize_inputs: bool = Field(
        default=True,
        description="Resize inputs to 1024x1024 to save costs"
    )

class CombineImagesOutput(BaseModel):
    success: bool
    result_url: str | None = None
    result_base64: str | None = None
    prompt_used: str
    images_processed: int
    error: str | None = None

OpenRouter Service

python
import httpx
import os
from typing import List, Dict, Any

class OpenRouterService:
    def __init__(self):
        self.api_key = os.getenv("OPENROUTER_API_KEY")
        self.base_url = "https://openrouter.ai/api/v1"
        self.frontend_url = os.getenv("FRONTEND_URL", "http://localhost:3000")

    async def chat_with_images(
        self,
        prompt: str,
        images: List[str],  # Base64 encoded
        model: str = "google/gemini-2.5-flash-image-preview",
        max_tokens: int = 4096
    ) -> Dict[str, Any]:
        """Send chat request with multiple images"""

        # Build content array
        content = [{"type": "text", "text": prompt}]

        # Add images
        for img_b64 in images:
            content.append({
                "type": "image_url",
                "image_url": {
                    "url": f"data:image/jpeg;base64,{img_b64}"
                }
            })

        # Make request
        async with httpx.AsyncClient(timeout=60.0) as client:
            response = await client.post(
                f"{self.base_url}/chat/completions",
                headers={
                    "Authorization": f"Bearer {self.api_key}",
                    "HTTP-Referer": self.frontend_url,
                    "Content-Type": "application/json"
                },
                json={
                    "model": model,
                    "messages": [
                        {
                            "role": "user",
                            "content": content
                        }
                    ],
                    "max_tokens": max_tokens,
                    "temperature": 0.7
                }
            )

            response.raise_for_status()
            return response.json()
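
The request body assembled in `chat_with_images` can be factored into a pure function, which makes it easy to inspect without touching the network. The sketch below mirrors the structure used above; `build_payload` is a hypothetical name. The `modalities` field is an assumption: OpenRouter's image-generation documentation describes requesting image output this way, but it is worth checking against the current docs for the model in use.

```python
from typing import Any, Dict, List


def build_payload(
    prompt: str,
    images_b64: List[str],
    model: str = "google/gemini-2.5-flash-image-preview",
    request_image_output: bool = True,
) -> Dict[str, Any]:
    """Build a chat-completions body with one text part and N image parts."""
    content: List[Dict[str, Any]] = [{"type": "text", "text": prompt}]
    for img in images_b64:
        content.append({
            "type": "image_url",
            "image_url": {"url": f"data:image/jpeg;base64,{img}"},
        })

    payload: Dict[str, Any] = {
        "model": model,
        "messages": [{"role": "user", "content": content}],
    }
    if request_image_output:
        # Assumption: asks the provider to return generated images as well as text.
        payload["modalities"] = ["image", "text"]
    return payload
```

Keeping the text part first and the images in their original order matters here, because the prompts in this document refer to "image 1" and "image 2" by position.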

Complete Combination Service

python
import asyncio
import os
from typing import Any, Dict, List

class ImageCombinationService:
    def __init__(self):
        self.openrouter = OpenRouterService()
        self.encoder = ImageEncoder()

    async def combine_images(
        self,
        input: CombineImagesInput
    ) -> CombineImagesOutput:
        """Main method to combine images"""

        try:
            # Step 1: Download and encode images
            encoded_images = await self._prepare_images(
                input.image_urls,
                resize=input.resize_inputs
            )

            # Step 2: Build prompt
            prompt = self._build_combination_prompt(
                input.prompt,
                input.style,
                len(input.image_urls)
            )

            # Step 3: Call OpenRouter
            response = await self.openrouter.chat_with_images(
                prompt=prompt,
                images=encoded_images
            )

            # Step 4: Extract result
            result = self._extract_result(response, input.output_format)

            return CombineImagesOutput(
                success=True,
                result_url=result if input.output_format == "url" else None,
                result_base64=result if input.output_format == "base64" else None,
                prompt_used=prompt,
                images_processed=len(input.image_urls)
            )

        except Exception as e:
            return CombineImagesOutput(
                success=False,
                prompt_used=input.prompt,
                images_processed=0,
                error=str(e)
            )

    async def _prepare_images(
        self,
        urls: List[str],
        resize: bool
    ) -> List[str]:
        """Download and encode all images concurrently"""
        tasks = [
            self.encoder.prepare_image(str(url), resize=resize)
            for url in urls
        ]
        return await asyncio.gather(*tasks)

    def _build_combination_prompt(
        self,
        user_prompt: str,
        style: str,
        num_images: int
    ) -> str:
        """Build enhanced prompt for better results"""

        style_instructions = {
            "natural": "Create a natural, realistic combination that looks like a single photo.",
            "artistic": "Combine with artistic flair, creative composition, and visual interest.",
            "professional": "Create a clean, professional composition suitable for business use.",
            "creative": "Be bold and creative with the combination, prioritize visual impact."
        }

        return f"""You are an expert image compositor. You have {num_images} images to work with.

USER REQUEST: {user_prompt}

STYLE GUIDELINE: {style_instructions[style]}

REQUIREMENTS:
- Seamlessly blend the images
- Maintain consistent lighting and color balance
- Ensure natural transitions between elements
- Preserve important details from all source images
- Output high-quality composition

Generate the combined image now."""

    def _extract_result(self, response: Dict[str, Any], format: str) -> str:
        """Extract URL or base64 from response"""
        content = response["choices"][0]["message"]["content"]

        # Nano Banana returns image URL in content
        if format == "url":
            # Extract URL from markdown or plain text, stopping before a closing ")"
            import re
            url_match = re.search(r'https?://[^\s)]+', content)
            if url_match:
                return url_match.group(0)
            return content

        return content
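
`_extract_result` assumes the generated image arrives as a URL inside the text content. Depending on the provider, the image may instead come back as a base64 data URL in a separate `images` list on the message. The sketch below checks both places. The response shape for `images` is an assumption based on OpenRouter's image-generation documentation, and `extract_image` is a hypothetical helper rather than part of the service above.

```python
import re
from typing import Any, Dict, Optional


def extract_image(response: Dict[str, Any]) -> Optional[str]:
    """Return a data URL or http(s) URL for the generated image, if any."""
    message = response["choices"][0]["message"]

    # Assumed shape: message["images"] = [{"image_url": {"url": "data:image/png;base64,..."}}]
    for item in message.get("images") or []:
        url = (item.get("image_url") or {}).get("url")
        if url:
            return url

    # Fall back to a URL embedded in the text, e.g. markdown ![alt](https://...)
    content = message.get("content") or ""
    if isinstance(content, str):
        match = re.search(r"https?://[^\s)]+", content)
        if match:
            return match.group(0)
    return None
```

Returning `None` when nothing is found lets the caller report a clear failure instead of passing plain text along as if it were an image URL.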

Advanced Use Cases

Face Swap

python
async def face_swap(
    face_image_url: str,
    body_image_url: str
) -> str:
    """Swap face from one image onto body in another"""

    input = CombineImagesInput(
        image_urls=[face_image_url, body_image_url],
        prompt="""Take the face from image 1 and naturally place it on the person in image 2.
        Ensure:
        - Face matches body's angle and lighting
        - Natural skin tone blending
        - Consistent shadows and highlights
        - No visible seams""",
        style="natural"
    )

    service = ImageCombinationService()
    result = await service.combine_images(input)
    return result.result_url

Background Replacement

python
async def replace_background(
    subject_url: str,
    background_url: str,
    depth_of_field: bool = True
) -> str:
    """Replace background while preserving subject"""

    # Include the leading "- " so the line matches the other requirement bullets
    dof_instruction = "- Apply subtle depth of field blur to background" if depth_of_field else ""

    input = CombineImagesInput(
        image_urls=[subject_url, background_url],
        prompt=f"""Extract the main subject from image 1 and place it naturally on the background from image 2.

        Requirements:
        - Clean subject extraction with natural edges
        - Match lighting conditions between subject and background
        - Natural shadows under subject
        {dof_instruction}
        - Professional composition""",
        style="professional"
    )

    service = ImageCombinationService()
    result = await service.combine_images(input)
    return result.result_url

Multi-Image Collage

python
async def create_collage(
    image_urls: List[str],
    layout: Literal["grid", "creative", "storytelling"] = "grid",
    title: str | None = None
) -> str:
    """Create artistic collage from multiple images"""

    layout_prompts = {
        "grid": "Arrange images in a clean grid layout with equal spacing",
        "creative": "Create an artistic, overlapping composition with varied sizes",
        "storytelling": "Arrange images to tell a visual story, left to right"
    }

    title_text = f"Include the text '{title}' as a prominent title" if title else ""

    input = CombineImagesInput(
        image_urls=image_urls,
        prompt=f"""{layout_prompts[layout]}. {title_text}

        Create a cohesive collage that:
        - Maintains visual balance
        - Uses consistent color grading
        - Has professional spacing and alignment
        - Feels unified despite multiple sources""",
        style="artistic"
    )

    service = ImageCombinationService()
    result = await service.combine_images(input)
    return result.result_url

YouTube Thumbnail Creator

python
async def create_youtube_thumbnail(
    portrait_url: str,
    background_url: str,
    title_text: str,
    style: Literal["tech", "gaming", "vlog", "tutorial"] = "tech"
) -> str:
    """Create engaging YouTube thumbnail"""

    style_guides = {
        "tech": "Clean, modern, professional tech aesthetic with blue/purple tones",
        "gaming": "High energy, vibrant colors, action-oriented composition",
        "vlog": "Personal, inviting, warm tones, casual composition",
        "tutorial": "Clear, educational, step-by-step visual hierarchy"
    }

    input = CombineImagesInput(
        image_urls=[portrait_url, background_url],
        prompt=f"""Create a professional YouTube thumbnail combining these images.

        STYLE: {style_guides[style]}

        TEXT TO INCLUDE: "{title_text}"

        REQUIREMENTS:
        - 1280x720 resolution (16:9 aspect ratio)
        - Bold, readable text overlay
        - High contrast for thumbnail visibility
        - Portrait positioned prominently
        - Background provides context without distraction
        - Eye-catching composition that stops scrolling
        - Professional color grading""",
        style="professional",
        resize_inputs=False  # Keep original quality
    )

    service = ImageCombinationService()
    result = await service.combine_images(input)
    return result.result_url

FastAPI Integration

Complete API Endpoint

python
from fastapi import FastAPI, HTTPException, BackgroundTasks
from pydantic import HttpUrl
from typing import List

app = FastAPI()

# Global service instance
combiner_service = ImageCombinationService()

@app.post("/api/combine-images", response_model=CombineImagesOutput)
async def combine_images_endpoint(request: CombineImagesInput):
    """Combine multiple images using Nano Banana"""
    try:
        result = await combiner_service.combine_images(request)

        if not result.success:
            raise HTTPException(status_code=500, detail=result.error)

        return result
    except HTTPException:
        # Re-raise as-is so the error detail is not wrapped a second time
        raise
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))

@app.post("/api/face-swap")
async def face_swap_endpoint(
    face_image_url: HttpUrl,
    body_image_url: HttpUrl
):
    """Face swap shortcut endpoint"""
    try:
        result_url = await face_swap(str(face_image_url), str(body_image_url))
        return {"result_url": result_url}
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))

@app.post("/api/replace-background")
async def replace_background_endpoint(
    subject_url: HttpUrl,
    background_url: HttpUrl,
    depth_of_field: bool = True
):
    """Background replacement endpoint"""
    try:
        result_url = await replace_background(
            str(subject_url),
            str(background_url),
            depth_of_field
        )
        return {"result_url": result_url}
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))

Tool Calling Integration

Agent Tool Definition

python
from pydantic_ai import Agent, Tool

# Define tool for AI agent
# pydantic_ai's Tool wraps a plain function; the argument schema is derived
# from the function's type hints (here, CombineImagesInput)
async def combine_images(request: CombineImagesInput) -> CombineImagesOutput:
    return await ImageCombinationService().combine_images(request)

combine_images_tool = Tool(
    combine_images,
    name="combine_images",
    description="""Combine 2-8 images into a single composition using AI.
Use cases:
- Face swapping
- Background replacement
- Creating thumbnails
- Photo collages
- Portrait + scene composition

Provide image URLs and clear instructions for combination.""",
)

# Register with agent
agent = Agent(
    model='openrouter:openai/gpt-4o',
    tools=[combine_images_tool],
    system_prompt="""You are an AI assistant with image combination capabilities.
When users select multiple images and ask to combine them, use the combine_images tool.

Examples of combination requests:
- "Combine these two"
- "Put my face on that background"
- "Create a thumbnail from these images"
- "Swap faces between these photos"
"""
)

Conversational Integration

python
from typing import List

from pydantic import BaseModel

class ChatRequest(BaseModel):
    message: str
    selected_images: List[str] = []  # URLs of selected images

@app.post("/api/chat")
async def chat_with_image_context(request: ChatRequest):
    """Chat endpoint with image selection context"""

    # The agent's system prompt is fixed when the agent is created, so the
    # image selection is prepended to the user message instead
    message = request.message

    if request.selected_images:
        message = f"""The user has selected {len(request.selected_images)} images:
{', '.join(request.selected_images)}

If they ask to combine/merge/blend images, use the combine_images tool.

{request.message}"""

    # Agent processes message
    result = await agent.run(message)

    # Recent pydantic_ai releases expose the reply as `output` (older ones use `data`)
    return {"response": result.output}

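The prompt-building step in the chat endpoint can be pulled out into a small pure function, which makes it easy to test without starting the server. This is a minimal sketch, not part of the original service: `build_system_prompt`, `BASE_PROMPT`, and `MAX_LISTED_IMAGES` are hypothetical names, and capping the list at 8 URLs is an assumption that mirrors the tool's 2-8 image limit.

```python
from typing import List

BASE_PROMPT = "You are a helpful assistant."
MAX_LISTED_IMAGES = 8  # hypothetical cap, mirrors the tool's 2-8 image limit


def build_system_prompt(selected_images: List[str]) -> str:
    """Return the base prompt, plus image context when images are selected."""
    if not selected_images:
        return BASE_PROMPT

    # List at most MAX_LISTED_IMAGES URLs so the prompt stays bounded
    listed = selected_images[:MAX_LISTED_IMAGES]
    lines = [
        BASE_PROMPT,
        "",
        f"The user has selected {len(selected_images)} images:",
        *[f"- {url}" for url in listed],
        "",
        "If they ask to combine/merge/blend images, use the combine_images tool.",
    ]
    return "\n".join(lines)
```

Listing one URL per line, rather than joining them with commas, keeps long URLs from running together in the prompt.
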
Error Handling

Comprehensive Error Handling

python
import asyncio

import httpx

class CombineError(Exception):
    """Base combination error"""
    pass

class InvalidImageError(CombineError):
    """Invalid or inaccessible image URL"""
    pass

class APIError(CombineError):
    """OpenRouter API error"""
    pass

async def safe_combine(
    input: CombineImagesInput,
    retry_count: int = 3
) -> CombineImagesOutput:
    """Combine with retry logic and exponential backoff (1s, 2s, 4s, ...)"""

    for attempt in range(retry_count):
        try:
            service = ImageCombinationService()
            result = await service.combine_images(input)

            if result.success:
                return result

            # If failed, back off and retry
            if attempt < retry_count - 1:
                await asyncio.sleep(2 ** attempt)
                continue

            # Out of retries: return the failed result to the caller
            return result

        except httpx.HTTPError as e:
            if attempt < retry_count - 1:
                await asyncio.sleep(2 ** attempt)
                continue
            raise APIError(f"OpenRouter API error: {e}") from e
        except Exception as e:
            # Non-HTTP errors are not retried
            raise CombineError(f"Combination failed: {e}") from e

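The retry loop above can be generalized into a reusable helper that also adds jitter, so many clients hitting a rate limit do not all retry at the same instant. The following is a sketch under stated assumptions: `retry_with_backoff` and `RateLimitError` are hypothetical names, not part of httpx or OpenRouter, and the caller is assumed to raise `RateLimitError` itself when it sees an HTTP 429 response.

```python
import asyncio
import random
from typing import Awaitable, Callable, Tuple, Type, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Hypothetical error raised by a client wrapper on HTTP 429."""


async def retry_with_backoff(
    operation: Callable[[], Awaitable[T]],
    retry_on: Tuple[Type[BaseException], ...] = (RateLimitError,),
    max_attempts: int = 3,
    base_delay: float = 1.0,
) -> T:
    """Run `operation`, retrying on the given exception types with backoff."""
    for attempt in range(max_attempts):
        try:
            return await operation()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # Exponential delay (base, 2x, 4x, ...) plus up to 50% random jitter
            delay = base_delay * (2 ** attempt)
            await asyncio.sleep(delay + random.uniform(0, delay / 2))
    raise ValueError("max_attempts must be at least 1")
```

A call might look like `await retry_with_backoff(lambda: service.combine_images(input), retry_on=(httpx.HTTPError,))`, reusing the names from the example above.
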
Best Practices

  1. Resize images before sending - Reduces API costs and latency
  2. Validate URLs before downloading - Avoid 404 errors
  3. Use async/await for concurrent downloads
  4. Implement retry logic for API failures
  5. Cache results if same combination requested multiple times
  6. Set timeouts on HTTP requests (30-60 seconds)
  7. Compress outputs to WebP for storage efficiency
  8. Monitor costs - Gemini charges per image token
  9. Provide clear prompts for better results
  10. Handle rate limits gracefully

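Practices 3 and 6 (concurrent downloads with timeouts) can be combined in one helper. This is a rough sketch rather than the service's actual downloader: `fetch_all` is a hypothetical name, the `fetch` callable is supplied by the caller (for example a thin wrapper around an HTTP client), and the concurrency limit of 4 is an arbitrary assumption.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List, Optional


async def fetch_all(
    urls: List[str],
    fetch: Callable[[str], Awaitable[bytes]],
    max_concurrency: int = 4,
    timeout_seconds: float = 30.0,
) -> Dict[str, Optional[bytes]]:
    """Fetch every URL concurrently; failed or timed-out URLs map to None."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def fetch_one(url: str) -> Optional[bytes]:
        # The semaphore caps how many downloads run at the same time
        async with semaphore:
            try:
                return await asyncio.wait_for(fetch(url), timeout=timeout_seconds)
            except (asyncio.TimeoutError, OSError):
                # Only timeouts and OS-level errors are handled here; add the
                # HTTP client's own error type when wiring in a real client
                return None

    results = await asyncio.gather(*(fetch_one(url) for url in urls))
    return dict(zip(urls, results))
```

Returning `None` for a failed URL, instead of raising, lets the caller decide whether a partial set of images is still worth combining.
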
Cost Optimization

Pricing (as of 2024)

  • Input tokens: $0.30 per million tokens
  • Output tokens: $2.50 per million tokens
  • Images: $1.238 per thousand images

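The rates above translate into a per-request estimate with simple arithmetic. The sketch below hard-codes the listed prices, which may have changed since; `estimate_cost_usd` is a hypothetical helper, and it assumes the per-image rate applies to each input image sent with the request.

```python
# Rates copied from the pricing list above; check current pricing before relying on them
INPUT_USD_PER_MILLION_TOKENS = 0.30
OUTPUT_USD_PER_MILLION_TOKENS = 2.50
USD_PER_THOUSAND_IMAGES = 1.238


def estimate_cost_usd(input_tokens: int, output_tokens: int, input_images: int) -> float:
    """Estimate the cost of one request in US dollars."""
    token_cost = (
        input_tokens * INPUT_USD_PER_MILLION_TOKENS
        + output_tokens * OUTPUT_USD_PER_MILLION_TOKENS
    ) / 1_000_000
    # Assumption: the image rate is charged per input image
    image_cost = input_images * USD_PER_THOUSAND_IMAGES / 1_000
    return token_cost + image_cost
```

For example, a request with 1,000 input tokens, 1,000 output tokens, and 2 input images comes to 0.0003 + 0.0025 + 0.002476 = 0.005276, or roughly half a cent.
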
Optimization Strategies

python
from io import BytesIO

from PIL import Image

# 1. Resize to minimum required dimensions
COST_OPTIMIZED_SIZE = (512, 512)    # Lower cost
BALANCED_SIZE = (1024, 1024)        # Good quality/cost ratio
HIGH_QUALITY_SIZE = (2048, 2048)    # Maximum quality

# 2. Use appropriate quality settings
def optimize_for_cost(img: Image.Image) -> bytes:
    img.thumbnail((1024, 1024))  # Shrinks in place, keeps aspect ratio
    buffer = BytesIO()
    # JPEG has no alpha channel, so convert to RGB before saving
    img.convert('RGB').save(buffer, format='JPEG', quality=75)  # Lower quality = smaller size
    return buffer.getvalue()

# 3. Cache combinations
# functools.lru_cache on an async function caches the coroutine object, which
# can only be awaited once, so finished results are kept in a plain dict instead.
# The dict is unbounded; add eviction if many distinct combinations are expected.
_combine_cache: dict = {}

async def cached_combine(image_urls_tuple: tuple, prompt: str):
    key = (image_urls_tuple, prompt)
    if key not in _combine_cache:
        _combine_cache[key] = await combine_images(list(image_urls_tuple), prompt)
    return _combine_cache[key]

Common Pitfalls

❌ Don't: Send full-resolution images (wastes tokens)
✅ Do: Resize to 1024x1024 or smaller

❌ Don't: Use vague prompts like "combine these"
✅ Do: Provide specific instructions with the desired outcome

❌ Don't: Forget to validate image URLs
✅ Do: Check that URLs are accessible before processing

❌ Don't: Block API endpoints while waiting for the result
✅ Do: Return immediately and process asynchronously if needed

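The last pitfall (return immediately, process asynchronously) can be sketched as a small job registry: the endpoint submits the work, hands back a job ID, and the client polls for status. This is an illustrative in-memory version only. `JobStore` is a hypothetical class, and a real deployment would more likely use a task queue or database so jobs survive restarts.

```python
import asyncio
import uuid
from typing import Any, Awaitable, Callable, Dict


class JobStore:
    """In-memory job registry (hypothetical; state is lost on restart)."""

    def __init__(self) -> None:
        self._jobs: Dict[str, Dict[str, Any]] = {}
        self._tasks: Dict[str, asyncio.Task] = {}

    def submit(self, work: Callable[[], Awaitable[Any]]) -> str:
        """Start `work` in the background and return a job ID right away.

        Must be called from inside a running event loop.
        """
        job_id = uuid.uuid4().hex
        self._jobs[job_id] = {"status": "pending", "result": None, "error": None}
        # Keep a reference so the task is not garbage-collected mid-flight
        self._tasks[job_id] = asyncio.create_task(self._run(job_id, work))
        return job_id

    async def _run(self, job_id: str, work: Callable[[], Awaitable[Any]]) -> None:
        try:
            result = await work()
            self._jobs[job_id].update(status="done", result=result)
        except Exception as exc:
            self._jobs[job_id].update(status="error", error=str(exc))
        finally:
            self._tasks.pop(job_id, None)

    def status(self, job_id: str) -> Dict[str, Any]:
        """Return the job record, or an 'unknown' record for unrecognized IDs."""
        return self._jobs.get(
            job_id, {"status": "unknown", "result": None, "error": None}
        )
```

An endpoint could call `store.submit(lambda: service.combine_images(input))` and return the job ID, with a second endpoint returning `store.status(job_id)`.
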
Complete Production Example

python
from fastapi import FastAPI
from pydantic import HttpUrl
from typing import List
import asyncio

app = FastAPI()
service = ImageCombinationService()

@app.post("/api/tools/combine")
async def combine_tool(
    image_urls: List[HttpUrl],
    prompt: str,
    style: str = "natural"
):
    """Production-ready combination endpoint"""

    # Validate inputs (same response shape as the other error paths)
    if len(image_urls) < 2:
        return {"status": "error", "error": "Need at least 2 images"}

    if len(image_urls) > 8:
        return {"status": "error", "error": "Maximum 8 images allowed"}

    # Create input
    input = CombineImagesInput(
        image_urls=image_urls,
        prompt=prompt,
        style=style,
        resize_inputs=True  # Cost optimization
    )

    # Execute with timeout
    try:
        result = await asyncio.wait_for(
            service.combine_images(input),
            timeout=60.0
        )

        if result.success:
            return {
                "status": "success",
                "result_url": result.result_url,
                "images_processed": result.images_processed
            }
        else:
            return {
                "status": "error",
                "error": result.error
            }

    except asyncio.TimeoutError:
        return {"status": "error", "error": "Combination timeout"}
    except Exception as e:
        return {"status": "error", "error": str(e)}

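The input checks in the endpoint can also live in a standalone function that reports every problem at once instead of stopping at the first. The following is a sketch only: `validate_combine_request` is a hypothetical helper, the 2-8 image bounds come from the example above, and the 2,000-character prompt limit is an assumed value, not a documented API limit.

```python
from typing import List
from urllib.parse import urlparse

MIN_IMAGES = 2
MAX_IMAGES = 8
MAX_PROMPT_CHARS = 2000  # hypothetical limit, not a documented API constraint


def validate_combine_request(image_urls: List[str], prompt: str) -> List[str]:
    """Return a list of human-readable problems; an empty list means valid."""
    errors: List[str] = []

    if len(image_urls) < MIN_IMAGES:
        errors.append(f"Need at least {MIN_IMAGES} images")
    if len(image_urls) > MAX_IMAGES:
        errors.append(f"Maximum {MAX_IMAGES} images allowed")

    # Syntax check only: this does not confirm the URL is reachable
    for url in image_urls:
        parsed = urlparse(url)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            errors.append(f"Not an http(s) URL: {url}")

    if not prompt.strip():
        errors.append("Prompt must not be empty")
    elif len(prompt) > MAX_PROMPT_CHARS:
        errors.append(f"Prompt exceeds {MAX_PROMPT_CHARS} characters")

    return errors
```

The endpoint could then return `{"status": "error", "error": "; ".join(errors)}` whenever the list is non-empty.
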
Resources
