modal

Original：🇺🇸 English

Translated

Use when "Modal", "serverless GPU", "cloud GPU", "deploy ML model", or asking about "serverless containers", "GPU compute", "batch processing", "scheduled jobs", "autoscaling ML"

9installs

Sourceeyadsibai/ltk

Added on2026-02-15

NPX Install

npx skill4agent add eyadsibai/ltk modal

SKILL.md Content

View Translation Comparison →

Modal Serverless Cloud Platform

Serverless Python execution with GPUs, autoscaling, and pay-per-use compute.

When to Use

Deploy and serve ML models (LLMs, image generation)
Run GPU-accelerated computation
Batch process large datasets in parallel
Schedule compute-intensive jobs
Build serverless APIs with autoscaling

Quick Start

bash

# Install
pip install modal

# Authenticate
modal token new

python

import modal

app = modal.App("my-app")

@app.function()
def hello():
    return "Hello from Modal!"

# Run with: modal run script.py

Container Images

python

# Build image with dependencies
image = (
    modal.Image.debian_slim(python_version="3.12")
    .pip_install("torch", "transformers", "numpy")
)

app = modal.App("ml-app", image=image)

GPU Functions

python

@app.function(gpu="H100")
def train_model():
    import torch
    assert torch.cuda.is_available()
    # GPU code here

# Available GPUs: T4, L4, A10, A100, L40S, H100, H200, B200
# Multi-GPU: gpu="H100:8"

Web Endpoints

python

@app.function()
@modal.web_endpoint(method="POST")
def predict(data: dict):
    result = model.predict(data["input"])
    return {"prediction": result}

# Deploy: modal deploy script.py

Scheduled Jobs

python

@app.function(schedule=modal.Cron("0 2 * * *"))  # Daily at 2 AM
def daily_backup():
    pass

@app.function(schedule=modal.Period(hours=4))  # Every 4 hours
def refresh_cache():
    pass

Autoscaling

python

@app.function()
def process_item(item_id: int):
    return analyze(item_id)

@app.local_entrypoint()
def main():
    items = range(1000)
    # Automatically parallelized across containers
    results = list(process_item.map(items))

Persistent Storage

python

volume = modal.Volume.from_name("my-data", create_if_missing=True)

@app.function(volumes={"/data": volume})
def save_results(data):
    with open("/data/results.txt", "w") as f:
        f.write(data)
    volume.commit()  # Persist changes

Secrets Management

python

@app.function(secrets=[modal.Secret.from_name("huggingface")])
def download_model():
    import os
    token = os.environ["HF_TOKEN"]

ML Model Serving

python

@app.cls(gpu="L40S")
class Model:
    @modal.enter()
    def load_model(self):
        from transformers import pipeline
        self.pipe = pipeline("text-classification", device="cuda")

    @modal.method()
    def predict(self, text: str):
        return self.pipe(text)

@app.local_entrypoint()
def main():
    model = Model()
    result = model.predict.remote("Modal is great!")

Resource Configuration

python

@app.function(
    cpu=8.0,              # 8 CPU cores
    memory=32768,         # 32 GiB RAM
    ephemeral_disk=10240, # 10 GiB disk
    timeout=3600          # 1 hour timeout
)
def memory_intensive_task():
    pass

Best Practices

Pin dependencies for reproducible builds
Use appropriate GPU types - L40S for inference, H100 for training
Leverage caching via Volumes for model weights
Use
.map()
for parallel processing
Import packages inside functions if not available locally
Store secrets securely - never hardcode API keys

vs Alternatives

Platform	Best For
Modal	Serverless GPUs, autoscaling, Python-native
RunPod	GPU rental, long-running jobs
AWS Lambda	CPU workloads, AWS ecosystem
Replicate	Model hosting, simple deployments

Resources

Docs: https://modal.com/docs
Examples: https://github.com/modal-labs/modal-examples

modal

NPX Install

Tags

SKILL.md Content

Modal Serverless Cloud Platform

When to Use

Quick Start

Container Images

GPU Functions

Web Endpoints

Scheduled Jobs

Autoscaling

Persistent Storage

Secrets Management

ML Model Serving

Resource Configuration

Best Practices

vs Alternatives

Resources