# Airflow-DAGs Agent
This document is the human-readable companion to the agent that owns
Airflow DAG authoring, the custom Airflow image, dbt-Cosmos integration,
and DAG delivery. It works in the `ume-data-dags` repo, not in
`ume-data-infra`. The only touchpoint back here is the bot-PR that bumps
`environments/dev-03-runtime/terraform.tfvars`.
## Role
The airflow-dags agent handles everything above the Terraform layer:
Docker image, DAGs, dbt project, and the CI glue that builds + ships
them. Terraform / infrastructure changes belong to the infra-terraform
agent in `ume-data-infra`.
## Scope
### Can edit (in `ume-data-dags`)
- `dags/` — Airflow DAG files
- `dbt/` — dbt project (models, tests, macros, profiles, packages.yml)
- `docker/` — Dockerfile, deps
- `scripts/` — build-image.sh, utility scripts
- `.github/workflows/` in `ume-data-dags` — the four workflows (image, dag-sync, pr-ci, bot-pr)
### Can edit in `ume-data-infra` (bot-PR only)
- `environments/dev-03-runtime/terraform.tfvars` — only the `airflow_image_tag` line, and only via the `bot-pr.yml` workflow. Human-authored edits to that line are OK but unusual.
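For illustration, a minimal Python sketch of the single-line edit the bot-PR performs. The real mechanism is the `bot-pr.yml` workflow; only the tfvars path and the `airflow_image_tag` name come from this document, and the tag value and script shape are assumptions.

```python
# Hypothetical sketch: rewrite only the airflow_image_tag line in the
# dev-03-runtime tfvars file. bot-pr.yml is the authoritative implementation.
import pathlib
import re

TFVARS = pathlib.Path("environments/dev-03-runtime/terraform.tfvars")
new_tag = "3.2.0-abc1234"  # example <airflow-version>-<commit-sha> tag

text = TFVARS.read_text()
updated, count = re.subn(
    r'^airflow_image_tag\s*=\s*".*"$',
    f'airflow_image_tag = "{new_tag}"',
    text,
    flags=re.MULTILINE,
)
# Refuse to touch the file unless exactly one line matched.
assert count == 1, f"expected exactly one airflow_image_tag line, found {count}"
TFVARS.write_text(updated)
```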
### Must not edit
- Terraform modules, layers, or environment stacks beyond the tfvars bump
- Kubernetes manifests or Helm charts directly
- Secret values
## Required Reading
Before proposing changes, the agent must read:
- Airflow on GKE — Helm chart design, Cosmos, GCS FUSE, logging, IAP auth
- CI/CD — image build + DAG sync pipeline contracts
- DataHub — for DataHub ingestion recipe context (recipes are DAGs)
## Invariants
- **dbt project path on workers: `/opt/airflow/dags/dbt`.** All Cosmos DAGs must reference this path (a DAG sketch follows this list). GCS FUSE mounts the bucket root at `/opt/airflow/dags/`; the bucket layout is `dags/` + `dbt/`.
- **Custom image extends the official Apache Airflow base** — never use a Composer base image.
- **Image tags are immutable** (AR enforces via `docker_config.immutable_tags`). Format: `<airflow-version>-<commit-sha>`.
- **Cosmos is the only dbt runner** — no `BashOperator` or `PythonOperator` to invoke dbt.
- **DAG delivery via GCS FUSE CSI** — `ume-data-dags`'s CI does `gcloud storage rsync` on merge to main. No git-sync, no baking DAGs into images, no tokens or SSH keys. Workload Identity handles auth.
- **No secrets in code** — use Airflow connections or the Secret Manager backend for credentials.
- **dbt connects to BigQuery via OAuth** (Airflow SA workload identity) — no service-account keys.
- **Cosmos local execution mode** — dbt runs as subprocesses on Celery workers; Cosmos copies the project to a per-task tmp dir, so the read-only FUSE mount is fine. Use KPO for heavy/isolated jobs.
- **`dbt_executable_path = /home/airflow/dbt-venv/bin/dbt`** — dbt lives in an isolated venv (Airflow 3.2's constraints file clashes with dbt-core on `pathspec`/`protobuf`). Cosmos LOCAL invokes dbt as a subprocess, so Python-level isolation is fine. The Dockerfile fails the build if `/home/airflow/dbt-venv/bin/dbt --version` doesn't work.
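To make the path, runner, and execution-mode invariants concrete, here is a minimal Cosmos DAG sketch. The two paths and `ExecutionMode.LOCAL` follow the invariants above; the DAG id, schedule, and profile/target names are illustrative assumptions.

```python
# Minimal Cosmos DAG sketch. Paths and ExecutionMode.LOCAL follow the
# invariants above; dag_id, schedule, and profile names are assumptions.
from datetime import datetime

from cosmos import DbtDag, ExecutionConfig, ProfileConfig, ProjectConfig
from cosmos.constants import ExecutionMode

DBT_PROJECT_DIR = "/opt/airflow/dags/dbt"          # FUSE-mounted bucket layout
DBT_EXECUTABLE = "/home/airflow/dbt-venv/bin/dbt"  # isolated dbt venv

example_dbt = DbtDag(
    dag_id="example_dbt",            # hypothetical name
    schedule="@daily",
    start_date=datetime(2025, 1, 1),
    catchup=False,
    project_config=ProjectConfig(DBT_PROJECT_DIR),
    profile_config=ProfileConfig(
        profile_name="ume",          # assumed profile in dbt/profiles.yml
        target_name="dev",
        profiles_yml_filepath=f"{DBT_PROJECT_DIR}/profiles.yml",
    ),
    execution_config=ExecutionConfig(
        execution_mode=ExecutionMode.LOCAL,        # dbt as worker subprocess
        dbt_executable_path=DBT_EXECUTABLE,
    ),
)
```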
## Cross-repo contract
- **AR repo `ume-composer-images`** — owned by ume-data-infra bootstrap. Content-push SA (`ume-datainfra-content-push@poc-ume-data.iam.gserviceaccount.com`) has `roles/artifactregistry.writer` scoped to this repo only.
- **DAGs bucket `ume-airflow-dags-poc-ume-data`** — same SA has bucket-level `roles/storage.objectAdmin`.
- **WIF provider** — federates both repos via a combined `attribute_condition`; per-SA bindings gate what each repo can impersonate.
## Verification
```bash
# Image builds (local)
scripts/build-image.sh --no-push

# DAG syntax (no metadata DB needed)
python -m py_compile dags/*.py

# dbt project parses (no BQ auth needed)
cd dbt/
DBT_TARGET=dev GCP_PROJECT=dummy DBT_DATASET=dummy \
  dbt parse --profiles-dir . --project-dir .
```
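A deeper optional check than `py_compile` is to import the DAG files the way the scheduler would. A sketch, assuming `apache-airflow` (matching the image's version) is installed locally:

```python
# Sketch: fail fast on DAG import errors without a running deployment.
from airflow.models import DagBag

bag = DagBag(dag_folder="dags/", include_examples=False)
assert not bag.import_errors, bag.import_errors
print(f"parsed {len(bag.dags)} DAGs cleanly")
```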
Post-merge verification uses `gcloud artifacts docker images list`,
`gsutil ls`, `bq show`, and `kubectl logs` (read-only). Never
`kubectl exec` — that's blocked by the restricted execution profile.