# RunPod Cloud GPU
Run open-source AI models on cloud GPUs via RunPod serverless. Pay-per-second, no minimums.

## Setup
```bash
# 1. Create account at https://runpod.io
# 2. Add API key to .env
echo "RUNPOD_API_KEY=your_key_here" >> .env

# 3. Deploy any tool with --setup
python tools/image_edit.py --setup
python tools/upscale.py --setup
python tools/dewatermark.py --setup
python tools/sadtalker.py --setup
python tools/qwen3_tts.py --setup
```
Each `--setup` run:

- Creates a RunPod template from the Docker image
- Creates a serverless endpoint with an appropriate GPU
- Saves the endpoint ID to `.env` (e.g. `RUNPOD_QWEN_EDIT_ENDPOINT_ID`)
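To see which endpoints `--setup` has already deployed, you can scan `.env` for the endpoint-ID keys. A minimal sketch (the key naming pattern follows the env-var table under "Checking Endpoint Status"):

```python
import re

def deployed_endpoints(env_text: str) -> dict:
    """Return {env_var: endpoint_id} for every RunPod endpoint ID in a .env string."""
    pattern = re.compile(r"^(RUNPOD_\w+_ENDPOINT_ID)=(\S+)$", re.MULTILINE)
    return dict(pattern.findall(env_text))

if __name__ == "__main__":
    sample = "RUNPOD_API_KEY=abc\nRUNPOD_UPSCALE_ENDPOINT_ID=ep123\n"
    print(deployed_endpoints(sample))  # {'RUNPOD_UPSCALE_ENDPOINT_ID': 'ep123'}
```

Note that `RUNPOD_API_KEY` is deliberately excluded: only `*_ENDPOINT_ID` entries indicate a deployed endpoint.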

## Available Images
All images are public on GHCR — no authentication needed.
| Tool | Docker Image | GPU | VRAM | Typical Cost |
|---|---|---|---|---|
| image_edit | `ghcr.io/conalmullan/video-toolkit-qwen-edit:latest` | A6000/L40S | 48GB+ | ~$0.05-0.15/job |
| upscale | `ghcr.io/conalmullan/video-toolkit-realesrgan:latest` | RTX 3090/4090 | 24GB | ~$0.01-0.05/job |
| dewatermark | `ghcr.io/conalmullan/video-toolkit-propainter:latest` | RTX 3090/4090 | 24GB | ~$0.05-0.30/job |
| sadtalker | `ghcr.io/conalmullan/video-toolkit-sadtalker:latest` | RTX 4090 | 24GB | ~$0.05-0.15/job |
| qwen3_tts | `ghcr.io/conalmullan/video-toolkit-qwen3-tts:latest` | ADA 24GB | 24GB | ~$0.01-0.05/job |
Total monthly cost: Rarely exceeds $10 even with heavy use.

## How It Works
All tools follow the same pattern:
Local CLI → Upload input to cloud storage → RunPod API → Poll for result → Download output
- File transfer: Tools upload via Cloudflare R2 when R2 credentials are configured in `.env`, falling back to free upload services
- RunPod API: Tools call the endpoint, then poll until complete
- Cold vs warm start: First request after idle spins up a worker (~30-90s). Subsequent requests are fast (~5-15s)
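The submit-then-poll step above can be sketched in Python. The polling logic below is generic; `fetch_status` stands in for an authenticated HTTP GET against the endpoint's `/status/{job_id}` route, and the status strings are assumptions based on RunPod's serverless REST API:

```python
import time

# Assumed RunPod serverless shape: POST /v2/{endpoint_id}/run returns
# {"id": "<job_id>"}; GET /v2/{endpoint_id}/status/{job_id} returns
# {"status": "IN_QUEUE" | "IN_PROGRESS" | "COMPLETED" | "FAILED", ...}.

def poll_until_done(fetch_status, job_id, interval=2.0, timeout=600.0):
    """Poll fetch_status(job_id) until the job reaches a terminal state."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        result = fetch_status(job_id)
        if result["status"] in ("COMPLETED", "FAILED"):
            return result
        time.sleep(interval)
    raise TimeoutError(f"job {job_id} did not finish within {timeout}s")
```

On a cold start, the first several polls report a queued status while a worker spins up, which is why the timeout is generous.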

## Endpoint Management

### Workers
- `workersMin: 0` — Scale to zero when idle (no cost)
- `workersMax: 1` — Max concurrent jobs (increase for throughput)
- `idleTimeout: 5` — Seconds before a worker scales down
Across all endpoints, you share a total worker pool based on your RunPod plan. If you hit limits, reduce `workersMax` on endpoints you're not actively using.

### Checking Endpoint Status
Each tool stores its endpoint ID in `.env`:
| Tool | Env Var |
|---|---|
| image_edit | `RUNPOD_QWEN_EDIT_ENDPOINT_ID` |
| upscale | `RUNPOD_UPSCALE_ENDPOINT_ID` |
| dewatermark | `RUNPOD_DEWATERMARK_ENDPOINT_ID` |
| sadtalker | `RUNPOD_SADTALKER_ENDPOINT_ID` |
| qwen3_tts | `RUNPOD_QWEN3_TTS_ENDPOINT_ID` |
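To check whether an endpoint currently has live workers, RunPod serverless exposes a per-endpoint health route. A small sketch (the URL shape matches the job routes above, but the exact response fields are assumptions about RunPod's API):

```python
def health_url(endpoint_id: str) -> str:
    """URL of the per-endpoint health route (queried with the API key as a Bearer token)."""
    return f"https://api.runpod.ai/v2/{endpoint_id}/health"

def summarize_workers(health: dict) -> str:
    """Condense an assumed health payload like {"workers": {"idle": 1, "running": 0}}."""
    workers = health.get("workers", {})
    total = sum(v for v in workers.values() if isinstance(v, int))
    detail = ", ".join(f"{k}={v}" for k, v in workers.items())
    return f"{total} worker(s): {detail}"
```

A scaled-to-zero endpoint reports zero workers until a job arrives and a cold start begins.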

### Disabling an Endpoint
To free worker slots without deleting the endpoint, set `workersMax: 0` via the RunPod dashboard or GraphQL API.

## Troubleshooting

### Force Image Pull
When you push a new Docker image version, RunPod may still use the cached old one. To force a pull:
- Update the template's image reference to pin a digest (`@sha256:` notation) instead of the `:latest` tag
- Wait for the worker to restart
- Revert to the tag after confirming the new image is in use

### Cold Start Too Slow
- qwen3-tts: ~70s cold start, ~7s warm
- sadtalker: ~60s cold start, ~10s warm
- image_edit: ~90s cold start, ~15s warm
If cold starts are a problem, set `workersMin: 1` to keep a warm worker (costs money when idle).
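Whether a warm worker is worth it is simple arithmetic: a continuously running GPU dwarfs the per-job prices in the table above. An illustration with a hypothetical hourly rate (check the RunPod pricing page for real numbers):

```python
def warm_worker_cost_per_month(hourly_rate_usd: float, hours_per_day: float = 24.0) -> float:
    """Cost of keeping one worker warm, assuming a 30-day month."""
    return round(hourly_rate_usd * hours_per_day * 30, 2)

# Hypothetical $0.44/hr for a 24GB GPU, kept warm around the clock:
# warm_worker_cost_per_month(0.44) -> 316.8 per month,
# versus pennies per job with scale-to-zero.
```

This is why `workersMin: 0` is the default everywhere and `workersMin: 1` should be reserved for endpoints where latency genuinely matters.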

### Job Fails with OOM
The model needs more VRAM than the GPU provides. Options:
- Use a larger GPU tier
- For dewatermark: reduce the resize/scale factor (default 0.5 for safety)
- For image_edit: reduce the output resolution

### "No workers available"
You've hit your plan's concurrent worker limit. Either:
- Wait for a running job to finish
- Set `workersMax: 0` on endpoints you're not using
- Upgrade your RunPod plan

## Docker Images
All Dockerfiles live in `docker/`. Images share a common base image so layers are reused across tools.
Building for RunPod (from an Apple Silicon Mac):
```bash
docker buildx build --platform linux/amd64 -t ghcr.io/conalmullan/video-toolkit-<name>:latest docker/runpod-<name>/
docker push ghcr.io/conalmullan/video-toolkit-<name>:latest
```
GHCR packages default to private — you must manually make them public for RunPod to pull them. Go to GitHub > Packages > Package Settings > Change Visibility.

## Cost Optimization
- Keep `workersMin: 0` on all endpoints (scale to zero)
- Only deploy endpoints you actively need
- Use `workersMax: 0` to disable idle endpoints without deleting them
- Qwen3-TTS is significantly cheaper than ElevenLabs for voiceovers
- Check the RunPod dashboard for usage and billing