gcp-pipeline-orchestration
This skill helps the agent generate or update orchestration pipeline definitions for Google Cloud Composer. Use it to initialize an orchestration pipeline or to update an existing definition that orchestrates data pipelines such as dbt pipelines, notebooks, Spark jobs, Dataform, Python scripts, or inline BigQuery SQL queries. This skill also helps deploy and trigger orchestration pipelines.
NPX Install
    npx skill4agent add gemini-cli-extensions/data-agent-kit-starter-pack gcp-pipeline-orchestration
Mandatory Reference Routing
If relevant, read the associated reference file(s) before you take action.
Refer to the table below to determine which reference file to retrieve in
different scenarios involving specific functions.
[!IMPORTANT] DO NOT GUESS filenames. You MUST only use the exact paths provided below.
| Function/Use Case | Required Reference File | Capabilities & Intent Keywords |
|---|---|---|
| orchestration-pipelines schema | `references/orchestration-pipelines-schema.md` | orchestrate, generate, create, update |
How to use this skill
Orchestration pipelines require two files to be complete and deployable:
1. `Orchestration File` (e.g., `orchestration-pipeline.yaml`,
   `test-pipeline.yaml`): Defines the pipeline's logic, tasks, and
   schedule. **IMPORTANT:** Check whether a `deployment.yaml` file exists and
   references an existing orchestration file. If it does, you **must update
   the existing orchestration file** (e.g., `test_pipeline.yaml`) instead of
   creating a new one. The filename can be customized but must be
   referenced in the `deployment.yaml` file.
2. `deployment.yaml`: Defines the environment-specific configurations
   (e.g., `dev`, `prod`). It must be named `deployment.yaml` and must exist
   only in the repository root.
- Both files should always be maintained together, and both should be placed in the root of the workspace folder.
- This skill is helpful for creating or updating the configuration files that orchestrate data pipelines.
Step 1: Assess Orchestration Pipeline Status and Initialize if Necessary
Examine the repository's root directory for a `deployment.yaml` file.
- Check for existing setup: The absence of `deployment.yaml` indicates that orchestration has not been set up.
- Determine if initialization is required: Initialization is required if `deployment.yaml` is missing. In that case you MUST run the `init` command shown at the end of this step to scaffold the project. Do NOT create the files manually.
- Pipeline Name: If initialization is needed, ask the user for the pipeline name. If the user does not provide one, use `orchestration_pipeline`.
- Environment Name: If initialization is needed, you MUST ask the user for the environment name. If the user does not provide it, use `dev` as the default.
- Execute Initialization: Once you have the pipeline name and the environment name, run the following command:
      # Replace <ORCHESTRATION_PIPELINE_NAME> with the actual name
      # Replace <ENV_NAME> with the actual environment name
      gcloud beta orchestration-pipelines init <ORCHESTRATION_PIPELINE_NAME> --environment=<ENV_NAME>
Step 2: Review the orchestration pipeline code structure and syntax instruction
Pipeline Models (mapping to YAML)
[!IMPORTANT] While the internal pipeline models are defined using protobuf (which typically uses `snake_case`), the YAML configuration expects `camelCase` for almost all field names.
Mapping Rule: Always convert `snake_case` proto fields (e.g., `pipeline_id`) to `camelCase` in YAML (e.g., `pipelineId`).
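The mapping rule above can be sketched as a small Python helper. This is only an illustration: the function name `snake_to_camel` is hypothetical, and because the rule applies to "almost all" fields, any exceptions listed in the schema reference would still need to be handled separately.

```python
def snake_to_camel(name: str) -> str:
    """Convert a snake_case proto field name to the camelCase form used in YAML."""
    head, *rest = name.split("_")
    # Keep the first segment as-is and capitalize the first letter of each later segment.
    return head + "".join(part[:1].upper() + part[1:] for part in rest)


print(snake_to_camel("pipeline_id"))  # pipelineId
```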
Orchestration-Pipelines YAML structure and syntax instruction
Refer to the file `references/orchestration-pipelines-schema.md`.
Required Tags (Top-Level): You MUST add a `tags` field to the top level
of the orchestration pipeline YAML definition. The value of this field depends
on the IDE environment:
- For Antigravity, use `["job:datacloud:antigravity"]`.
- For VS Code, use `["job:datacloud:vscode"]`.
- For any other environment, use `["job:datacloud:other"]`.
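The tag selection above amounts to a lookup with a fallback. A minimal sketch follows, assuming the IDE is identified by a free-form name; the helper `tags_for_ide` and the normalized keys are hypothetical, while the tag values are copied from the list above.

```python
# Tag values come from the list above; the lowercase lookup keys are illustrative.
IDE_TAGS = {
    "antigravity": ["job:datacloud:antigravity"],
    "vscode": ["job:datacloud:vscode"],
}
DEFAULT_TAGS = ["job:datacloud:other"]


def tags_for_ide(ide_name: str) -> list:
    """Return the top-level `tags` value for the given IDE name."""
    key = ide_name.strip().lower().replace(" ", "")
    # Return a copy so callers cannot mutate the shared constants.
    return list(IDE_TAGS.get(key, DEFAULT_TAGS))


print(tags_for_ide("VS Code"))  # ['job:datacloud:vscode']
```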
Deployment YAML structure and syntax instruction
Top-Level Structure: The root of the YAML should be an object with the
following fields:
- `environments` (dictionary): A map where keys are environment names (e.g., 'dev', 'prod') and values are Environment objects.
Environment: Each environment object contains the following fields:
- `project` (string): The Google Cloud Project ID.
- `region` (string): The Google Cloud region (e.g., 'us-central1').
- `composer_environment` (string): The Cloud Composer environment name.
- `artifact_storage`:
  - `bucket` (string): The GCS bucket.
  - `path_prefix` (string): The prefix of the path to use inside the bucket.
- `pipelines`:
  - `source` (string): The orchestration pipeline YAML file name. Multiple entries are allowed.
- `variables` (dictionary, optional): Key-value pairs representing environment variables. Values can be strings, numbers, or booleans.
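Once `deployment.yaml` has been parsed into a dictionary, the field list above can be checked mechanically. The sketch below is illustrative only: the helper name `missing_environment_fields` is hypothetical, the example values are placeholders, and it assumes every listed field except `variables` is required, which the text above does not state explicitly.

```python
def missing_environment_fields(deployment: dict) -> dict:
    """Map each environment name to the list of fields it is missing."""
    problems = {}
    for env_name, env in (deployment.get("environments") or {}).items():
        env = env or {}
        missing = [key for key in ("project", "region", "composer_environment")
                   if not env.get(key)]
        storage = env.get("artifact_storage") or {}
        missing += [f"artifact_storage.{key}" for key in ("bucket", "path_prefix")
                    if not storage.get(key)]
        pipelines = env.get("pipelines") or []
        # Every pipelines entry is expected to be a mapping with a `source` key.
        if not pipelines or not all(isinstance(p, dict) and p.get("source") for p in pipelines):
            missing.append("pipelines[].source")
        if missing:
            problems[env_name] = missing
    return problems


# Placeholder values for illustration only.
example = {
    "environments": {
        "dev": {
            "project": "my-project",
            "region": "us-central1",
            "composer_environment": "my-composer",
            "artifact_storage": {"bucket": "my-bucket", "path_prefix": "team-"},
            "pipelines": [{"source": "orchestration-pipeline.yaml"}],
        },
        "prod": {"project": "my-project"},
    }
}
print(missing_environment_fields(example))
```

An empty result means every environment carries the fields listed above; it does not replace the `validate` command.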
[!TIP] If the user doesn't provide specific paths for scripts, dbt projects, or GCP details (Project ID, Region), use tools like `find_by_name` to search the repository and `gcloud` commands (e.g., `gcloud config get-value project`) to retrieve the necessary information.
Step 3: Generate the pipeline files
- Before generating, check if an orchestration pipeline definition file and `deployment.yaml` already exist in the current directory. If they do, inform the user and ask if they want to update the existing files or create new ones with different names. Do not overwrite without confirmation.
- Before creating the orchestration pipeline definition file, you must first run the following command to get the list of available Dataproc clusters for the user's project. This avoids using placeholder values to run the jobs.
      # Replace <PROJECT_ID> with the actual project_id
      # Replace <REGION> with the actual region
      gcloud dataproc clusters list \
        --project <PROJECT_ID> \
        --region <REGION>
  [!TIP] Running the command without `--format=yaml` provides a clear, tabular output that is easier to read.
- Then use the returned Dataproc list with details to create the orchestration pipeline definition file based on the user's requirements for the pipeline's logic and schedule. IMPORTANT: Every schedule must include an `endTime`. Every schedule must use the current date as `startTime` if the user hasn't specified one.
  [!IMPORTANT] A Composer environment is not a Dataproc cluster. If no Dataproc clusters are available, do not use a Composer environment for the `sparkHistoryServerConfig`. It is better to omit this configuration if a dedicated Spark History Server is not available.
- If you want to schedule a Python job, check the content of the Python file to determine whether it is a Spark job. If it is, use `pyspark` as the type instead of `script`.
- Before creating or updating the `deployment.yaml` file, you must first run the following command to get the list of available Composer environments for the user's project.
      # Replace <PROJECT_ID> with the actual project_id
      # Replace <REGION> with the actual region
      gcloud composer environments list \
        --project <PROJECT_ID> \
        --locations <REGION>
  After listing the available Composer environments, you must check each one to ensure it uses a supported image version or has the right PyPI packages installed. Run the following command for each environment:
      # Replace <ENVIRONMENT_NAME> with the Composer environment name
      # Replace <REGION> with the region
      gcloud composer environments describe <ENVIRONMENT_NAME> \
        --location <REGION> \
        --format="json(config.softwareConfig.imageVersion, config.softwareConfig.pypiPackages)"
  From the output, select an environment whose `imageVersion` value is one of `composer-3-airflow-3.1.7-build.x`, `composer-3-airflow-2.11.1-build.x`, `composer-3-airflow-2.10.5-build.x`, `composer-3-airflow-2.9.3-build.x`, `composer-2.16.11-airflow-2.11.1`, `composer-2.16.11-airflow-2.10.5`, or `composer-2.16.11-airflow-2.9.3`, or select an environment where `orchestration-pipelines` is listed in the PyPI packages. This ensures the selected environment is compatible with orchestration pipelines.
- Third, before generating the `deployment.yaml` file, you must ask the user to provide the `artifact_storage` bucket name. Note that the `artifact_storage` bucket is typically initialized as a placeholder (e.g., `YOUR_BUCKET`) by the `init` command in Step 1. You must identify any such placeholders, ask the user for the actual bucket name, and then update the `deployment.yaml` file with the provided value.
  Use the returned Composer list with details, along with the project ID, region, and the bucket name provided by the user, to generate or update the `deployment.yaml` file. When generating or updating the `deployment.yaml` file, you must replace placeholders (e.g., "<YOUR_PROJECT_ID>", "<YOUR_REGION>", "<YOUR_COMPOSER>", "<YOUR_BUCKET>") with the actual retrieved and provided values. Additionally, you must remove any associated `# TODO:` comments once the placeholders are replaced.
- Ensure both files adhere to the code structures and syntax specified in this document.
- Renaming Pipelines: If requested to change the orchestration pipeline name, you must rename the orchestration YAML file accordingly (e.g., from `dbt_clean_pipeline.yaml` to `new_name.yaml`) and update the `source` field within the `pipelines` list in `deployment.yaml` to match the new filename.
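The Composer compatibility check in the list above can be sketched in Python. This is a hedged illustration, not part of the skill: the helper `is_compatible` is hypothetical, it takes the `softwareConfig` object from the `describe` output as an already-parsed dictionary, and it assumes the trailing `x` in the `build.x` versions stands for any build number.

```python
from fnmatch import fnmatchcase

# Image versions copied from the step above; `build.x` is treated as a wildcard (assumption).
COMPATIBLE_IMAGE_PATTERNS = [
    "composer-3-airflow-3.1.7-build.*",
    "composer-3-airflow-2.11.1-build.*",
    "composer-3-airflow-2.10.5-build.*",
    "composer-3-airflow-2.9.3-build.*",
    "composer-2.16.11-airflow-2.11.1",
    "composer-2.16.11-airflow-2.10.5",
    "composer-2.16.11-airflow-2.9.3",
]


def is_compatible(software_config: dict) -> bool:
    """Accept a listed image version, or the orchestration-pipelines PyPI package."""
    image = software_config.get("imageVersion", "")
    packages = software_config.get("pypiPackages") or {}
    if "orchestration-pipelines" in packages:
        return True
    return any(fnmatchcase(image, pattern) for pattern in COMPATIBLE_IMAGE_PATTERNS)


print(is_compatible({"imageVersion": "composer-3-airflow-2.10.5-build.12"}))  # True
```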
[!IMPORTANT] Time Format: Do NOT include the `Z` suffix in `startTime` and `endTime`. Use the format `"YYYY-MM-DDTHH:MM:SS"` (e.g., `"2025-10-01T00:00:00"`).
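A small Python sketch of the time format follows. The helper names are hypothetical, and using midnight of the current date as the default `startTime` is an assumption; the text only says to use the current date.

```python
from datetime import date, datetime, time
from typing import Optional

# Matches "YYYY-MM-DDTHH:MM:SS" with no trailing "Z" and no UTC offset.
SCHEDULE_TIME_FORMAT = "%Y-%m-%dT%H:%M:%S"


def schedule_timestamp(moment: datetime) -> str:
    """Format a datetime for the startTime/endTime fields."""
    return moment.strftime(SCHEDULE_TIME_FORMAT)


def default_start_time(today: Optional[date] = None) -> str:
    """Default startTime: midnight of the current date (assumed convention)."""
    return schedule_timestamp(datetime.combine(today or date.today(), time.min))


print(schedule_timestamp(datetime(2025, 10, 1)))  # 2025-10-01T00:00:00
```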
Step 4: Validate the content (REQUIRED)
After creating or editing pipeline files, you MUST validate them using the
`gcloud beta orchestration-pipelines validate` command. You must:
a. Read the `deployment.yaml` file to identify all defined environments.
b. Run the `validate` command below for each environment found in `deployment.yaml`.
    # Replace <ENV_NAME> with the identified environment name
    gcloud beta orchestration-pipelines validate --environment=<ENV_NAME>
Step 5: Handle Validation Errors
- Check the output of the validation command.
- If the command returns an error or failure message:
  - Read the error message carefully.
  - Edit the orchestration and deployment files to fix the specific issue mentioned.
- Re-run the validation command to confirm the fix. Do not mark the task as complete until the validation passes (exit code 0), and do not fall back to creating an Airflow DAG in Python if validation fails.
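Steps 4 and 5 together describe running the validation once per environment and treating only exit code 0 as a pass. A minimal sketch under stated assumptions: the environment names have already been read from `deployment.yaml`, the helper names are hypothetical, and the `run` parameter exists only so the logic can be exercised without calling `gcloud`.

```python
import subprocess


def validate_command(env_name: str) -> list:
    """Build the validate command shown in Step 4 for one environment."""
    return ["gcloud", "beta", "orchestration-pipelines", "validate",
            f"--environment={env_name}"]


def validate_all(env_names, run=subprocess.run) -> dict:
    """Run validation for every environment and map each name to its exit code."""
    results = {}
    for env_name in env_names:
        completed = run(validate_command(env_name), capture_output=True, text=True)
        results[env_name] = completed.returncode
    return results


def all_passed(results: dict) -> bool:
    """The task is complete only when every environment exited with code 0."""
    return all(code == 0 for code in results.values())
```

A caller would fix the reported issue and call `validate_all` again for as long as `all_passed` returns `False`.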
Declarative Pipeline Templates
When asked to generate or verify declarative pipeline files, ensure they follow
these compliant structures. Do not use the exact values below; adapt them to
the user's specific project, region, and environment details.
`deployment.yaml` Template - IMPORTANT: FORMAT MUST MATCH
    environments:
      <environment_name>:  # e.g., dev, prod
        project: <PROJECT_ID>
        region: <REGION>
        composer_environment: <COMPOSER_ENVIRONMENT_NAME>
        gcs_bucket: ""  # Optional
        artifact_storage:
          bucket: <ARTIFACT_BUCKET_NAME>
          path_prefix: "<prefix>-"  # e.g., namespace or username prefix
        pipelines:
          - source: '<orchestration-pipeline.yaml>'  # e.g., list of pipeline yaml names
Step 6: Deploy the Orchestration Pipeline (Optional)
If requested to deploy the orchestration pipeline:
- You MUST ask the user which environment to deploy to. If no environment name is provided, list the available environments from `deployment.yaml` and ask the user to choose one, defaulting to `dev` if it exists.
- Read the orchestration YAML to extract the `pipelineId`.
- Deploy with `--local`. This uploads the DAG without running it:
      # Replace <ENV_NAME> with the target environment
      # Replace <PIPELINE_SOURCE> with the orchestration YAML filename
      gcloud beta orchestration-pipelines deploy \
        --environment=<ENV_NAME> --local
- Parse the deploy output to extract the bundle ID (version). The output includes a line like: `Pipeline deployment successful for version local-b32d15e307b5`. The version string (e.g., `local-b32d15e307b5`) is the bundle ID.
[!IMPORTANT] `--local` deployments now default to `--paused=true`. The deployed DAG will be visible in Airflow as a paused DAG without a schedule. It will not auto-run. Use Step 7 to trigger it.
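Extracting the bundle ID from the deploy output can be sketched with a regular expression. The helper `extract_bundle_id` is hypothetical, and the pattern assumes the success line uses exactly the wording quoted in Step 6; other output formats would need a different pattern.

```python
import re
from typing import Optional

# Wording taken from the example line in Step 6.
VERSION_LINE = re.compile(r"Pipeline deployment successful for version\s+(\S+)")


def extract_bundle_id(deploy_output: str) -> Optional[str]:
    """Return the version string (bundle ID) from deploy output, or None if absent."""
    match = VERSION_LINE.search(deploy_output)
    if match is None:
        return None
    # Drop a trailing period in case the line ends a sentence.
    return match.group(1).rstrip(".")


sample = "Uploading...\nPipeline deployment successful for version local-b32d15e307b5\n"
print(extract_bundle_id(sample))  # local-b32d15e307b5
```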
Step 7: Trigger the Orchestration Pipeline Run (Optional)
If requested to trigger/run the orchestration pipeline, you MUST follow the
Deploy → Poll → Trigger flow.
1. Ask for environment: You MUST ask the user which environment to use. Default to `dev` if it exists in `deployment.yaml`.
2. Deploy first (Step 6): Always deploy before triggering to ensure the run uses the latest code. Extract the `bundle ID` from the deploy output and the `pipelineId` from the orchestration YAML.
3. Poll for DAG readiness: Wait for the DAG to be registered in Composer.
       # Initial delay: wait 30 seconds after deploy
       sleep 30
       # Poll every 15 seconds, up to 2 minutes total
       # Replace <ENV_NAME>, <BUNDLE_ID> with actual values
       gcloud beta orchestration-pipelines list \
         --environment=<ENV_NAME> \
         --bundle=<BUNDLE_ID>
   The pipeline is ready when it appears in the list output. If it does not appear after 2 minutes, report failure and advise the user to check YAML validity.
4. Trigger the pipeline:
       # Replace <ENV_NAME>, <BUNDLE_ID>, <PIPELINE_ID> with actual values
       gcloud beta orchestration-pipelines trigger \
         --environment=<ENV_NAME> \
         --bundle=<BUNDLE_ID> \
         --pipeline=<PIPELINE_ID>
5. Verify the run started:
       gcloud beta orchestration-pipelines runs list \
         --environment=<ENV_NAME> \
         --bundle=<BUNDLE_ID> \
         --pipeline=<PIPELINE_ID>
[!TIP] Trigger-only (no deploy): If the user wants to trigger an already-deployed pipeline, skip Step 6. Use `gcloud beta orchestration-pipelines list --environment=<ENV_NAME>` to find the bundle ID, then trigger directly with Step 7.4.
[!IMPORTANT] Fallback: If `gcloud trigger` fails, use the bundled script. Run the script with `--help` to discover and learn the interface.
    python scripts/trigger/airflow_trigger.py \
      --project <PROJECT_ID> \
      --location <REGION> \
      --environment <COMPOSER_ENV> \
      --dag_id <PIPELINE_ID>
Get `project`, `region`, and `composer_environment` from `deployment.yaml`.
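The polling schedule in Step 7 (a 30-second initial delay, then a check every 15 seconds, up to 2 minutes in total) can be sketched as a small loop. This is an illustration under assumptions: `is_listed` is a hypothetical callable that would wrap the `list` command and report whether the pipeline appears in its output, and the `sleep` parameter is injectable only so the timing can be tested without waiting.

```python
import time


def wait_until_listed(is_listed, initial_delay=30.0, interval=15.0,
                      timeout=120.0, sleep=time.sleep) -> bool:
    """Return True once is_listed() succeeds, or False when the time budget is used up."""
    sleep(initial_delay)
    waited = initial_delay
    while True:
        if is_listed():
            return True
        # Stop when one more interval would exceed the total budget.
        if waited + interval > timeout:
            return False
        sleep(interval)
        waited += interval


# Dry run with a no-op sleep so nothing actually waits.
print(wait_until_listed(lambda: True, sleep=lambda seconds: None))  # True
```

A `False` result corresponds to the failure case in Step 7, where the user is advised to check YAML validity.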
Definition of done
- `deployment.yaml` file is created successfully.
- The orchestration pipeline file (e.g., `orchestration_pipeline.yaml`) is created successfully, includes a mandatory `endTime` for every schedule, and passes the validation command: `gcloud beta orchestration-pipelines validate --environment=<ENV_NAME>`
- If the user requested to deploy the orchestration pipeline, the command `gcloud beta orchestration-pipelines deploy --environment=<ENV_NAME> --local` should return a success message with a version/bundle ID.
- If the user requested to trigger/run the orchestration pipeline:
  - Deploy succeeded (bundle ID extracted from output)
  - DAG appeared in `gcloud beta orchestration-pipelines list` within 2 min
  - `gcloud beta orchestration-pipelines trigger` returned success
  - Run is visible in `gcloud beta orchestration-pipelines runs list`
Other actions
If requested to pause/stop the orchestration pipeline, use
    # Replace <ENV_NAME>, <BUNDLE_ID>, <PIPELINE_ID> with actual values
    gcloud beta orchestration-pipelines pause \
      --environment=<ENV_NAME> \
      --bundle=<BUNDLE_ID> \
      --pipeline=<PIPELINE_ID>
If requested to unpause/resume the orchestration pipeline, use
    # Replace <ENV_NAME>, <BUNDLE_ID>, <PIPELINE_ID> with actual values
    gcloud beta orchestration-pipelines unpause \
      --environment=<ENV_NAME> \
      --bundle=<BUNDLE_ID> \
      --pipeline=<PIPELINE_ID>