deploying-airflow

Deploying Airflow

This skill covers deploying Airflow DAGs and projects to production, whether using Astro (Astronomer's managed platform) or open-source Airflow on Docker Compose or Kubernetes.
Choosing a path: Astro is a good fit for managed operations and faster CI/CD. For open-source, use Docker Compose for dev and the Helm chart for production.

Astro (Astronomer)

Astro provides CLI commands and GitHub integration for deploying Airflow projects.

Deploy Commands

| Command | What It Does |
| --- | --- |
| `astro deploy` | Full project deploy — builds the Docker image and deploys DAGs |
| `astro deploy --dags` | DAG-only deploy — pushes only DAG files (fast, no image build) |
| `astro deploy --image` | Image-only deploy — pushes only the Docker image (for multi-repo CI/CD) |
| `astro deploy --dbt` | dbt project deploy — deploys a dbt project to run alongside Airflow |

Full Project Deploy

Builds a Docker image from your Astro project and deploys everything (DAGs, plugins, requirements, packages):

```bash
astro deploy
```

Use this when you've changed `requirements.txt`, `Dockerfile`, `packages.txt`, plugins, or any non-DAG file.

DAG-Only Deploy

Pushes only files in the `dags/` directory without rebuilding the Docker image:

```bash
astro deploy --dags
```

This is significantly faster than a full deploy since it skips the image build. Use it when you've only changed DAG files and haven't modified dependencies or configuration.

Image-Only Deploy

Pushes only the Docker image without updating DAGs:

```bash
astro deploy --image
```

This is useful in multi-repo setups where DAGs are deployed separately from the image, or in CI/CD pipelines that manage image and DAG deploys independently.

dbt Project Deploy

Deploys a dbt project to run with Cosmos on an Astro deployment:

```bash
astro deploy --dbt
```

GitHub Integration

Astro supports branch-to-deployment mapping for automated deploys:
  • Map branches to specific deployments (e.g., `main` -> production, `develop` -> staging)
  • Pushes to mapped branches trigger automatic deploys
  • Supports DAG-only deploys on merge for faster iteration
Configure this in the Astro UI under Deployment Settings > CI/CD.

CI/CD Patterns

Common CI/CD strategies on Astro:
  1. DAG-only on feature branches: Use `astro deploy --dags` for fast iteration during development
  2. Full deploy on main: Use `astro deploy` on merge to main for production releases
  3. Separate image and DAG pipelines: Use `--image` and `--dags` in separate CI jobs for independent release cycles
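These patterns can be wired up in any CI system with the Astro CLI. A minimal GitHub Actions sketch of pattern 2 — the workflow name, secret name (`ASTRO_API_TOKEN`), and deployment name are placeholders, and flag names should be checked against your CLI version:

```yaml
# .github/workflows/deploy.yml (hypothetical)
name: Deploy to Astro
on:
  push:
    branches: [main]
jobs:
  deploy:
    runs-on: ubuntu-latest
    env:
      ASTRO_API_TOKEN: ${{ secrets.ASTRO_API_TOKEN }}  # Deployment or Workspace API token
    steps:
      - uses: actions/checkout@v4
      - name: Install Astro CLI
        run: curl -sSL install.astronomer.io | sudo bash -s
      - name: Full deploy on merge to main
        run: astro deploy --deployment-name "production"
```

A sibling workflow on feature branches could run `astro deploy --dags` against a staging deployment for faster iteration.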

Deploy Queue

When multiple deploys are triggered in quick succession, Astro processes them sequentially in a deploy queue. Each deploy completes before the next one starts.

Open-Source: Docker Compose

Deploy Airflow using the official Docker Compose setup. This is recommended for learning and exploration — for production, use Kubernetes with the Helm chart (see below).

Prerequisites

  • Docker and Docker Compose v2.14.0+
  • The official `apache/airflow` Docker image

Quick Start

Download the official Airflow 3 Docker Compose file:

```bash
curl -LfO 'https://airflow.apache.org/docs/apache-airflow/stable/docker-compose.yaml'
```
This sets up the full Airflow 3 architecture:

| Service | Purpose |
| --- | --- |
| `airflow-apiserver` | REST API and UI (port 8080) |
| `airflow-scheduler` | Schedules DAG runs |
| `airflow-dag-processor` | Parses and processes DAG files |
| `airflow-worker` | Executes tasks (CeleryExecutor) |
| `airflow-triggerer` | Handles deferrable/async tasks |
| `postgres` | Metadata database |
| `redis` | Celery message broker |
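Before the first start, the quick-start setup expects the mounted directories to exist and (on Linux) an `AIRFLOW_UID` in `.env` so files created in the bind mounts aren't root-owned, followed by a one-time init run. A sketch of that sequence, following the Airflow quick-start docs:

```bash
mkdir -p ./dags ./logs ./plugins ./config
echo "AIRFLOW_UID=$(id -u)" > .env   # Linux: run containers as your user
docker compose up airflow-init       # one-time: migrate the DB, create the default user
docker compose up -d                 # start all services in the background
```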

Minimal Setup

For a simpler setup with LocalExecutor (no Celery/Redis), create a `docker-compose.yaml`:
```yaml
x-airflow-common: &airflow-common
  image: apache/airflow:3  # Use the latest Airflow 3.x release
  environment: &airflow-common-env
    AIRFLOW__CORE__EXECUTOR: LocalExecutor
    AIRFLOW__DATABASE__SQL_ALCHEMY_CONN: postgresql+psycopg2://airflow:airflow@postgres/airflow
    AIRFLOW__CORE__LOAD_EXAMPLES: 'false'
    AIRFLOW__CORE__DAGS_FOLDER: /opt/airflow/dags
  volumes:
    - ./dags:/opt/airflow/dags
    - ./logs:/opt/airflow/logs
    - ./plugins:/opt/airflow/plugins
  depends_on:
    postgres:
      condition: service_healthy

services:
  postgres:
    image: postgres:16
    environment:
      POSTGRES_USER: airflow
      POSTGRES_PASSWORD: airflow
      POSTGRES_DB: airflow
    volumes:
      - postgres-db-volume:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD", "pg_isready", "-U", "airflow"]
      interval: 10s
      retries: 5
      start_period: 5s

  airflow-init:
    <<: *airflow-common
    entrypoint: /bin/bash
    command:
      - -c
      - |
        airflow db migrate
        airflow users create \
          --username admin \
          --firstname Admin \
          --lastname User \
          --role Admin \
          --email admin@example.com \
          --password admin
    depends_on:
      postgres:
        condition: service_healthy

  airflow-apiserver:
    <<: *airflow-common
    command: airflow api-server
    ports:
      - "8080:8080"
    healthcheck:
      test: ["CMD", "curl", "--fail", "http://localhost:8080/health"]
      interval: 30s
      timeout: 10s
      retries: 5
      start_period: 30s

  airflow-scheduler:
    <<: *airflow-common
    command: airflow scheduler

  airflow-dag-processor:
    <<: *airflow-common
    command: airflow dag-processor

  airflow-triggerer:
    <<: *airflow-common
    command: airflow triggerer

volumes:
  postgres-db-volume:
```

Airflow 3 architecture note: The webserver has been replaced by the API server (`airflow api-server`), and the DAG processor now runs as a standalone process separate from the scheduler.
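With the file in place, bringing the stack up and verifying it is a short sequence — a sketch; the `/health` URL is the same endpoint the `airflow-apiserver` healthcheck polls:

```bash
docker compose up airflow-init         # one-time: migrate the DB, create the admin user
docker compose up -d
docker compose ps                      # all services should report healthy
curl -sf http://localhost:8080/health  # check the API server is responding
```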

Common Operations

```bash
# Start all services
docker compose up -d

# Stop all services
docker compose down

# View scheduler logs
docker compose logs -f airflow-scheduler

# Restart after requirements change
docker compose down && docker compose up -d --build

# Run a one-off Airflow CLI command
docker compose exec airflow-apiserver airflow dags list
```

Installing Python Packages

Add packages to `requirements.txt`, then rebuild:

```bash
docker compose down
docker compose up -d --build
```

Or use a custom Dockerfile:

```dockerfile
FROM apache/airflow:3  # Pin to a specific version (e.g., 3.1.7) for reproducibility
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
```

Update `docker-compose.yaml` to build from the Dockerfile:

```yaml
x-airflow-common: &airflow-common
  build:
    context: .
    dockerfile: Dockerfile
  # ... rest of config
```

Environment Variables

Configure Airflow settings via environment variables in `docker-compose.yaml`:

```yaml
environment:
  # Core settings
  AIRFLOW__CORE__EXECUTOR: LocalExecutor
  AIRFLOW__CORE__PARALLELISM: 32
  AIRFLOW__CORE__MAX_ACTIVE_TASKS_PER_DAG: 16

  # Email
  AIRFLOW__EMAIL__EMAIL_BACKEND: airflow.utils.email.send_email_smtp
  AIRFLOW__SMTP__SMTP_HOST: smtp.example.com

  # Connections (as URI)
  AIRFLOW_CONN_MY_DB: postgresql://user:pass@host:5432/db
```
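The naming convention is mechanical: `AIRFLOW__{SECTION}__{KEY}` (double underscores) overrides the corresponding `airflow.cfg` option, and `AIRFLOW_CONN_{CONN_ID}` defines a connection. A sketch of deriving the variable name for an arbitrary option:

```shell
# Build the env-var name for config section "core", key "parallelism"
section=core
key=parallelism
var="AIRFLOW__$(echo "$section" | tr '[:lower:]' '[:upper:]')__$(echo "$key" | tr '[:lower:]' '[:upper:]')"
echo "$var"   # AIRFLOW__CORE__PARALLELISM
```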


Open-Source: Kubernetes (Helm Chart)

Deploy Airflow on Kubernetes using the official Apache Airflow Helm chart.

Prerequisites

  • A Kubernetes cluster
  • `kubectl` configured
  • `helm` installed

Installation

```bash
# Add the Airflow Helm repo
helm repo add apache-airflow https://airflow.apache.org
helm repo update

# Install with default values
helm install airflow apache-airflow/airflow \
  --namespace airflow \
  --create-namespace

# Install with custom values
helm install airflow apache-airflow/airflow \
  --namespace airflow \
  --create-namespace \
  -f values.yaml
```

Key values.yaml Configuration

```yaml
# Executor type
executor: KubernetesExecutor  # or CeleryExecutor, LocalExecutor

# Airflow image (pin to your desired version)
defaultAirflowRepository: apache/airflow
defaultAirflowTag: "3"  # Or pin: "3.1.7"

# Git-sync for DAGs (recommended for production)
dags:
  gitSync:
    enabled: true
    repo: https://github.com/your-org/your-dags.git
    branch: main
    subPath: dags
    wait: 60  # seconds between syncs

# API server (replaces webserver in Airflow 3)
apiServer:
  replicas: 1
  resources:
    requests:
      cpu: "250m"
      memory: "512Mi"
    limits:
      cpu: "500m"
      memory: "1Gi"

# Scheduler
scheduler:
  resources:
    requests:
      cpu: "500m"
      memory: "1Gi"
    limits:
      cpu: "1000m"
      memory: "2Gi"

# Standalone DAG processor
dagProcessor:
  enabled: true
  resources:
    requests:
      cpu: "250m"
      memory: "512Mi"
    limits:
      cpu: "500m"
      memory: "1Gi"

# Triggerer (for deferrable tasks)
triggerer:
  resources:
    requests:
      cpu: "250m"
      memory: "512Mi"
    limits:
      cpu: "500m"
      memory: "1Gi"

# Worker resources (CeleryExecutor only)
workers:
  replicas: 2
  resources:
    requests:
      cpu: "500m"
      memory: "1Gi"
    limits:
      cpu: "2000m"
      memory: "4Gi"

# Log persistence
logs:
  persistence:
    enabled: true
    size: 10Gi

# PostgreSQL (built-in)
postgresql:
  enabled: true

# Or use an external database instead:
# postgresql:
#   enabled: false
# data:
#   metadataConnection:
#     user: airflow
#     pass: airflow
#     host: your-rds-host.amazonaws.com
#     port: 5432
#     db: airflow
```

Upgrading

```bash
# Upgrade with new values
helm upgrade airflow apache-airflow/airflow \
  --namespace airflow \
  -f values.yaml

# Upgrade to a new Airflow version
helm upgrade airflow apache-airflow/airflow \
  --namespace airflow \
  --set defaultAirflowTag="<version>"
```

DAG Deployment Strategies on Kubernetes

  1. Git-sync (recommended): DAGs are synced from a Git repository automatically
  2. Persistent Volume: Mount a shared PV containing DAGs
  3. Baked into image: Include DAGs in a custom Docker image
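For strategy 3, a minimal custom image simply copies the DAGs in — a sketch, assuming a repository with a top-level `dags/` directory:

```dockerfile
FROM apache/airflow:3   # pin a specific tag for reproducibility
COPY dags/ /opt/airflow/dags/
```

Push the image to your registry, then point `defaultAirflowRepository` and `defaultAirflowTag` in `values.yaml` at it.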

Useful Commands

```bash
# Check pod status
kubectl get pods -n airflow

# View scheduler logs
kubectl logs -f deployment/airflow-scheduler -n airflow

# Port-forward the API server
kubectl port-forward svc/airflow-apiserver 8080:8080 -n airflow

# Run a one-off CLI command
kubectl exec -it deployment/airflow-scheduler -n airflow -- airflow dags list
```

---

Related Skills

  • setting-up-astro-project: For initializing a new Astro project
  • managing-astro-local-env: For local development with `astro dev`
  • authoring-dags: For writing DAGs before deployment
  • testing-dags: For testing DAGs before deployment