# Airflow on GKE
Airflow runs on the shared GKE Standard cluster via the official Apache Airflow Helm chart with CeleryExecutor. It handles orchestration for ETL pipelines, dbt runs (via Cosmos), and future DataHub ingestion recipes.
Key decisions:
- GKE Standard (not Cloud Composer) for cost control and cluster reuse
- CeleryExecutor with Redis — scheduler stays lean, workers handle task execution
- Cosmos local execution mode on workers for dbt; KPO for heavy/isolated jobs
- GCS FUSE CSI for DAG delivery (no git-sync, no tokens)
- Custom image with Cosmos + dbt toolchain (stock image for initial deployment)
- Hybrid logging: Cloud Logging (container logs, automatic) + GCS (task logs)
- Google OIDC for API server authentication (Story 4c)
## Why GKE Standard over Cloud Composer
The 4-5x cost difference is the primary driver. The operational burden is acceptable because the GKE cluster is already planned for DataHub.
## Executor: CeleryExecutor
CeleryExecutor uses Redis as a task broker. The scheduler enqueues tasks into Redis; dedicated worker pods pick them up and execute them. This keeps the scheduler lean and decouples task execution from scheduling.
Scheduler → Redis queue → Celery Worker(s) → execute task
### Why CeleryExecutor over LocalExecutor
- Scheduler isolation: with LocalExecutor, dbt subprocesses compete for CPU/memory on the scheduler pod. With CeleryExecutor, workers handle execution independently.
- Scalable workers: workers can scale from 1 to N. Start with 1 worker; add more if task queuing grows.
- No KEDA needed: min=1 worker is always on. No cold-start delay. Workers on the default-pool share the node with the scheduler at no additional VM cost.
### Why not KubernetesExecutor
KubernetesExecutor creates a pod per task — true scale-to-zero, no Redis needed. But every task incurs 10-30s pod startup overhead. At 3-10 DAGs with 10-50 models each, that's significant latency. CeleryExecutor with pre-started workers is faster for steady workloads.
Escape hatch: if you outgrow CeleryExecutor (need per-task isolation, hundreds of concurrent tasks), KubernetesExecutor is the next step. The Helm chart supports switching executors with a single value change.
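For reference, that switch is a one-line values edit — a sketch against the same values file shown later in this doc (with KubernetesExecutor the redis and workers sections become unnecessary):

```yaml
# Helm values override — switch executors
executor: KubernetesExecutor
```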
## Cosmos dbt Execution

### Execution mode: Local (on Celery workers)
Cosmos local execution mode runs dbt as subprocesses directly on the Celery worker. Each dbt model becomes an Airflow task, dispatched to a worker via Redis.
This is the fastest stable mode — no container overhead, no pod startup. The worker handles multiple dbt tasks concurrently (controlled by worker_concurrency).
### Hybrid pattern: Local + KubernetesPodOperator
For specific heavy or isolated jobs (large dbt full-refresh, data quality checks, ingestion jobs), use KubernetesPodOperator to dispatch to the kpo-pool (spot VMs, scale-to-zero):
# Default: Cosmos local mode on Celery workers
dbt_dag = DbtDag(
    execution_config=ExecutionConfig(execution_mode=ExecutionMode.LOCAL),
    ...
)

# Heavy jobs: KPO on kpo-pool
heavy_task = KubernetesPodOperator(
    namespace="airflow-kpo",
    node_selector={"pool": "kpo"},
    tolerations=[{"key": "workload", "value": "kpo", "effect": "NoSchedule"}],
    service_account_name="airflow-kpo",
    ...
)
This gives you the speed of local execution for most work, with full pod isolation available when needed.
### Cosmos execution modes considered but deferred

## Terraform Configuration
Airflow is deployed in environments/{env}-02-runtime/airflow.tf using a helm_release resource with the official Apache Airflow Helm chart 1.20.0 (Airflow 3.2.0).
Key variables:
### Airflow 3 component architecture
Chart 1.20.0 uses semver gates in its templates. With Airflow >= 3.0.0 it renders the api-server and dag-processor components and skips the legacy webserver Deployment (the webserver block below is kept only for defaultUser).
### Bootstrap sequence
Before the Helm release, a Terraform-managed kubernetes_job_v1 (db_bootstrap) runs:
- Cloud SQL Auth Proxy native sidecar (init container with `restartPolicy: Always`)
- `grants` init container — connects as the postgres admin and GRANTs privileges to the IAM user
- `migrate` init container — runs `airflow db migrate` as the IAM user
This is needed because Cloud SQL IAM users start with zero DB privileges, and the chart's built-in migration hook runs too late. The chart's migrateDatabaseJob is disabled.
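A rough sketch of the Job this resource renders — image tags, secret names, and the exact GRANT statement are illustrative; the authoritative spec is the `db_bootstrap` resource in Terraform:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: airflow-db-bootstrap
  namespace: airflow
spec:
  backoffLimit: 3
  template:
    spec:
      serviceAccountName: airflow
      restartPolicy: Never
      initContainers:
        # Native sidecar (restartPolicy: Always) — stays up for the whole pod lifetime
        - name: cloud-sql-proxy
          image: gcr.io/cloud-sql-connectors/cloud-sql-proxy:2   # placeholder tag
          args: ["--auto-iam-authn", "--private-ip", "<instance-connection-name>"]
          restartPolicy: Always
        # Connects as the postgres admin and grants privileges to the IAM user
        - name: grants
          image: postgres:16
          env:
            - name: PGPASSWORD
              valueFrom: { secretKeyRef: { name: airflow-pg-admin, key: password } }
          command: ["psql", "-h", "127.0.0.1", "-U", "postgres", "-d", "airflow",
                    "-c", "GRANT ALL ON SCHEMA public TO \"airflow@poc-ume-data.iam\";"]
        # Runs the schema migration as the IAM user
        - name: migrate
          image: apache/airflow:3.2.0
          env:
            - name: AIRFLOW__DATABASE__SQL_ALCHEMY_CONN
              valueFrom: { secretKeyRef: { name: airflow-metadata-connection, key: connection } }
          command: ["airflow", "db", "migrate"]
      containers:
        # No-op main container — all real work happens in the init containers above
        - name: done
          image: busybox:1.36
          command: ["true"]
```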
### Service account
Chart 1.20.0 creates per-component KSAs by default (airflow-scheduler, airflow-api-server, etc.), none of which carry the Workload Identity annotation. A single kubernetes_service_account_v1 is created in Terraform with the WI annotation, and all components reference it with serviceAccount = { create = false, name = "airflow" }.
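The rendered KSA is essentially the manifest below; the GSA e-mail is an assumption about naming, the annotation key is the standard Workload Identity one:

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: airflow
  namespace: airflow
  annotations:
    # Workload Identity: map this KSA onto the Airflow GSA
    iam.gke.io/gcp-service-account: airflow@poc-ume-data.iam.gserviceaccount.com
```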
### Helm values (dev PoC)
executor: CeleryExecutor

defaultAirflowRepository: apache/airflow
defaultAirflowTag: "3.2.0"

# ---------- scheduler ----------
scheduler:
  replicas: 1
  serviceAccount: { create: false, name: airflow }
  waitForMigrations: { enabled: false }
  resources:
    requests: { cpu: 200m, memory: 512Mi }
    limits: { cpu: "1", memory: 1Gi }
  startupProbe: { timeoutSeconds: 60, failureThreshold: 20 }
  livenessProbe: { timeoutSeconds: 60 }
  extraContainers:
    - <cloud-sql-proxy --auto-iam-authn --private-ip>
  # + GCS FUSE volume/mount/annotation

# ---------- API server (Airflow 3+) ----------
apiServer:
  enabled: true
  replicas: 1
  serviceAccount: { create: false, name: airflow }
  waitForMigrations: { enabled: false }
  resources:
    requests: { cpu: 250m, memory: 512Mi }
    limits: { cpu: 500m, memory: 1Gi }
  startupProbe: { failureThreshold: 20 }
  extraContainers:
    - <cloud-sql-proxy --auto-iam-authn --private-ip>

# ---------- DAG processor (Airflow 3+) ----------
dagProcessor:
  enabled: true
  replicas: 1
  serviceAccount: { create: false, name: airflow }
  waitForMigrations: { enabled: false }
  resources:
    requests: { cpu: 150m, memory: 384Mi }
    limits: { cpu: 500m, memory: 1Gi }
  livenessProbe: { timeoutSeconds: 60 }
  extraContainers:
    - <cloud-sql-proxy --auto-iam-authn --private-ip>
  # + GCS FUSE volume/mount/annotation

# ---------- webserver (Airflow < 3 only) ----------
# Chart skips this template for Airflow 3+.
# Kept for defaultUser consumed by createUserJob.
webserver:
  serviceAccount: { create: false, name: airflow }
  defaultUser:
    enabled: true

# ---------- triggerer ----------
triggerer:
  enabled: true
  replicas: 1
  serviceAccount: { create: false, name: airflow }
  waitForMigrations: { enabled: false }
  resources:
    requests: { cpu: 100m, memory: 256Mi }
    limits: { cpu: 250m, memory: 512Mi }
  livenessProbe: { timeoutSeconds: 60 }
  extraContainers:
    - <cloud-sql-proxy --auto-iam-authn --private-ip>
  # + GCS FUSE volume/mount/annotation

# ---------- celery workers ----------
workers:
  replicas: 1
  serviceAccount: { create: false, name: airflow }
  waitForMigrations: { enabled: false }
  resources:
    requests: { cpu: 500m, memory: 1536Mi }
    limits: { cpu: "1.5", memory: 3Gi }
  livenessProbe: { timeoutSeconds: 60 }
  terminationGracePeriodSeconds: 600
  extraContainers:
    - <cloud-sql-proxy --auto-iam-authn --private-ip>
  # + GCS FUSE volume/mount/annotation

# ---------- redis ----------
redis:
  enabled: true
  serviceAccount: { create: false, name: airflow }
  resources:
    requests: { cpu: 50m, memory: 64Mi }
    limits: { cpu: 100m, memory: 128Mi }

# ---------- metadata database (external Cloud SQL) ----------
postgresql:
  enabled: false
data:
  metadataSecretName: airflow-metadata-connection
  resultBackendSecretName: airflow-result-backend-connection

# ---------- DAG sync (GCS FUSE, not git-sync) ----------
dags:
  persistence: { enabled: false }
  gitSync: { enabled: false }

# ---------- remote logging ----------
env:
  - name: AIRFLOW__LOGGING__REMOTE_LOGGING
    value: "True"
  - name: AIRFLOW__LOGGING__REMOTE_BASE_LOG_FOLDER
    value: "gs://ume-airflow-logs-poc-ume-data/logs"
  - name: AIRFLOW__LOGGING__DELETE_LOCAL_LOGS
    value: "True"

# ---------- airflow.cfg overrides ----------
config:
  core:
    parallelism: 16
    max_active_tasks_per_dag: 8
    max_active_runs_per_dag: 2
  celery:
    worker_concurrency: 8
  scheduler:
    min_file_process_interval: 60

# ---------- chart migration job (disabled -- handled by Terraform bootstrap) ----------
migrateDatabaseJob:
  enabled: false

# ---------- cleanup (disabled in chart -- standalone Terraform CronJob) ----------
cleanup:
  enabled: false
### Outputs
The runtime stack exports:
- `airflow_namespace` -- Kubernetes namespace.
- `airflow_logs_bucket` -- GCS bucket for task execution logs.
- `airflow_dags_bucket` -- GCS bucket for DAG sync via GCS FUSE.
## Custom Image

### What goes in the image
The custom image extends the official Apache Airflow base image, adding the Python packages needed for Cosmos + dbt. System packages (git, build tools) are installed only if needed to build Python wheels.
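A minimal Dockerfile sketch — the package names and the venv path are assumptions (the venv path matches the `dbt_executable_path` Cosmos uses below):

```dockerfile
FROM apache/airflow:3.2.0

USER root
# Build tools only if a wheel needs compiling; git for dbt packages pulled from git
RUN apt-get update && apt-get install -y --no-install-recommends git build-essential \
    && rm -rf /var/lib/apt/lists/*
USER airflow

# Cosmos in Airflow's environment; dbt in its own venv to avoid dependency clashes
RUN pip install --no-cache-dir astronomer-cosmos \
    && python -m venv /home/airflow/dbt-venv \
    && /home/airflow/dbt-venv/bin/pip install --no-cache-dir dbt-core dbt-bigquery
```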
### What does NOT go in the image
- The dbt project itself — synced to GCS bucket via CI (see DAG Sync below).
- DAG files — synced to GCS bucket via CI.
- Secrets or credentials — injected at runtime via Workload Identity or Secret Manager.
### Image lifecycle
Image ownership lives in ume-data-dags:
ume-data-dags CI (on push to main touching docker/)
│
├── image.yml
│   ├── docker build + push <AR_URL>/airflow:3.2.0-<sha>
│   └── tag is immutable (AR docker_config.immutable_tags = true)
│
└── bot-pr.yml (workflow_run on image.yml success)
    └── Uses INFRA_PR_TOKEN (fine-grained PAT) to open a PR on
        ume-data-infra bumping airflow_image_tag in
        environments/dev-03-runtime/terraform.tfvars.
        Merging that PR triggers terraform-apply's
        wait-for-image gate → Helm rolls the pods.
Tag format: <airflow-version>-<commit-sha> (e.g., 3.2.0-a1b2c3d).
Immutability: once pushed, a tag is never overwritten. Prod promotion means changing prod-02-runtime/terraform.tfvars to reference the exact same tag validated in dev.
Rollback: revert the airflow_image_tag in tfvars to the previous value and apply.
## DAG Sync

### Mechanism: GCS FUSE CSI
DAGs are synced from a GCS bucket to the Airflow pods via the GCS FUSE CSI driver — a native GKE add-on that mounts a GCS bucket as a local filesystem.
ume-data-dags CI (push to main, paths: dags/ or dbt/)
│
└── dag-sync.yml
    ├── gcloud storage rsync dags/ gs://ume-airflow-dags-poc-ume-data/dags/
    └── gcloud storage rsync dbt/  gs://ume-airflow-dags-poc-ume-data/dbt/
        │
        └── GCS FUSE CSI (mountOptions: implicit-dirs) mounts the bucket
            at /opt/airflow/dags/ on scheduler, worker, triggerer, and
            dag-processor. Changes are visible near-instantly; the
            dag-processor refreshes the bundle every 300s.
### Why GCS FUSE over git-sync
Workload Identity handles all GCS auth — no additional credentials to manage, rotate, or store.
### GCS FUSE CSI configuration
The GCS FUSE CSI driver is enabled as a GKE cluster add-on in modules/gke-standard/. Pods opt in via annotation and volume spec:
# Pod annotation (enables the FUSE sidecar injector)
gke-gcsfuse/volumes: "true"

# Volume spec
volumes:
  - name: dags
    csi:
      driver: gcsfuse.csi.storage.gke.io
      readOnly: true
      volumeAttributes:
        bucketName: ume-airflow-dags-poc-ume-data

volumeMounts:
  - name: dags
    mountPath: /opt/airflow/dags/
    readOnly: true
### DAG + dbt project location at runtime
GCS FUSE mounts the bucket root at /opt/airflow/dags/. ume-data-dags's dag-sync.yml workflow rsyncs its dags/ and dbt/ into the bucket, so the filesystem looks like:
/opt/airflow/dags/
├── dags/
│   └── cosmos_dbt_dag.py
└── dbt/
    ├── dbt_project.yml
    ├── profiles.yml
    └── models/
        └── example/
Cosmos references the dbt project at /opt/airflow/dags/dbt.
### Iteration speed

- Engineer pushes to `main`.
- CI pipeline runs `gcloud storage rsync` (~30 seconds).
- GCS FUSE reflects the new files (near-instant — the bucket is mounted live).
- Scheduler detects the updated files (~30-60 seconds).
Total time from push to runnable: ~1-2 minutes.
## dbt + Cosmos Integration

### How Cosmos works
Cosmos is an Airflow provider that renders a dbt project as an Airflow task group. Each dbt model becomes an Airflow task, with dependencies preserved.
from cosmos import DbtDag, ProjectConfig, ProfileConfig, ExecutionConfig
from cosmos.constants import ExecutionMode

dbt_dag = DbtDag(
    project_config=ProjectConfig(
        dbt_project_path="/opt/airflow/dags/dbt",
    ),
    profile_config=ProfileConfig(
        profile_name="ume",
        target_name="dev",
        profiles_yml_filepath="/opt/airflow/dags/dbt/profiles.yml",
    ),
    execution_config=ExecutionConfig(
        execution_mode=ExecutionMode.LOCAL,
        dbt_executable_path="/home/airflow/dbt-venv/bin/dbt",
    ),
    schedule="@daily",
    dag_id="dbt_ume",
)
### dbt profile and credentials
dbt connects to BigQuery using the Airflow service account's identity (Workload Identity). The profiles.yml uses the oauth method:
ume:
  target: "{{ ERROR }}"
  outputs:
    dev:
      type: bigquery
      method: oauth
      project: poc-ume-data
      dataset: "{{ ERROR }}"
      threads: 4
    prod:
      type: bigquery
      method: oauth
      project: ume-data-prod
      dataset: "{{ ERROR }}"
      threads: 8
No service-account keys. Workload Identity provides the OAuth token.
## KubernetesPodOperator (KPO)

### How it works
KPO creates Kubernetes pods directly via the API. KPO tasks run on the dedicated kpo-pool (spot VMs, scale-to-zero). Use for heavy batch jobs, data quality checks, or any task needing full isolation from the Airflow worker.
from airflow.providers.cncf.kubernetes.operators.pod import KubernetesPodOperator

dbt_full_refresh = KubernetesPodOperator(
    task_id="dbt_full_refresh",
    namespace="airflow-kpo",
    image="{{ ERROR }}",
    cmds=["dbt", "run", "--full-refresh", "--project-dir", "/dbt"],
    service_account_name="airflow-kpo",
    node_selector={"pool": "kpo"},
    tolerations=[{
        "key": "workload",
        "operator": "Equal",
        "value": "kpo",
        "effect": "NoSchedule",
    }],
    is_delete_operator_pod=True,
)
### kpo-pool scale-to-zero
- Scheduler dispatches KPO task to worker (via CeleryExecutor).
- Worker creates a pod with a toleration for `workload=kpo:NoSchedule` plus `nodeSelector: pool: kpo`.
- Pod is `Pending` — no kpo-pool nodes exist.
- Cluster Autoscaler detects the pending pod (~30s).
- Spot VM provisioned (~60-90s).
- Pod runs, completes, is cleaned up.
- After ~10 minutes idle, the autoscaler removes the empty node.
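For reference, a node pool with these properties could be created as below — the real pool is managed in Terraform under modules/gke-standard/, and the cluster name, region, machine type, and max size here are illustrative:

```bash
gcloud container node-pools create kpo-pool \
  --cluster=ume-data-dev --region=europe-west1 \
  --machine-type=e2-standard-4 --spot \
  --enable-autoscaling --min-nodes=0 --max-nodes=3 \
  --node-labels=pool=kpo \
  --node-taints=workload=kpo:NoSchedule
```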
## Logging (Hybrid: Cloud Logging + GCS)
Airflow uses a hybrid logging approach — operational logs and task execution logs go to different destinations, each optimized for its use case.
### Container logs → Cloud Logging (automatic)
GKE automatically ships all container stdout/stderr to Cloud Logging (GCP's equivalent of CloudWatch). This is zero-config — enabled by default on every GKE cluster. These logs cover:
- Scheduler heartbeat and parsing output
- Worker task pickup and execution events
- Webserver access logs
- Pod crashes, OOM events, restarts
Cloud Logging provides searchable, indexed logs with alerting — ideal for operational observability.
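For example, recent scheduler errors can be pulled with a standard Cloud Logging filter — the container name is an assumption based on the chart's defaults:

```bash
gcloud logging read '
  resource.type="k8s_container"
  AND resource.labels.namespace_name="airflow"
  AND resource.labels.container_name="scheduler"
  AND severity>=ERROR' \
  --project=poc-ume-data --limit=20 --freshness=1d
```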
### Task execution logs → GCS (configured)
Airflow's built-in remote_logging feature ships task execution logs (the structured output from each DAG task run) to a GCS bucket. The Airflow UI reads task logs directly from GCS.
env:
  - name: AIRFLOW__LOGGING__REMOTE_LOGGING
    value: "True"
  - name: AIRFLOW__LOGGING__REMOTE_BASE_LOG_FOLDER
    value: "gs://ume-airflow-logs-poc-ume-data/logs"
  - name: AIRFLOW__LOGGING__DELETE_LOCAL_LOGS
    value: "True"
GCS is cheaper than Cloud Logging for retention and Airflow natively reads from it — no custom log handler needed.
### GCS log bucket
Created via modules/gcs-bucket/ in dev-02-runtime/buckets.tf:
- Bucket: `ume-airflow-logs-poc-ume-data`
- `ume-airflow` has `roles/storage.objectAdmin` (project-wide for PoC; scope to this bucket as a hardening task)
- Lifecycle rule: delete objects older than 90 days (configurable) — see the sketch below
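The 90-day rule is a plain GCS lifecycle policy. It is applied by Terraform in the bucket module; the equivalent imperative form (with the policy file inlined in the comment) would be roughly:

```bash
# lifecycle.json: {"rule":[{"action":{"type":"Delete"},"condition":{"age":90}}]}
gcloud storage buckets update gs://ume-airflow-logs-poc-ume-data --lifecycle-file=lifecycle.json
```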
### Log cleanup sidecar
The Helm chart includes a log cleanup sidecar on the scheduler pod. With remote logging + DELETE_LOCAL_LOGS=True, this is a safety net:
scheduler:
  logCleanup:
    enabled: true
    retentionMinutes: 1440  # 1 day (local copies only; GCS has its own lifecycle)
## Metadata Database Maintenance

### Growth drivers

The Airflow metadata database grows continuously; the fastest-growing tables are typically task_instance, log, and xcom.
Without cleanup, the database grows indefinitely and scheduler performance degrades.
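A quick way to see which tables dominate — a standard PostgreSQL catalog query, run against the airflow database through the Cloud SQL proxy:

```sql
-- Top 10 largest tables (includes indexes and TOAST data)
SELECT relname AS table_name,
       pg_size_pretty(pg_total_relation_size(relid)) AS total_size
FROM pg_catalog.pg_statio_user_tables
ORDER BY pg_total_relation_size(relid) DESC
LIMIT 10;
```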
### Automated cleanup
A standalone kubernetes_cron_job_v1 Terraform resource runs airflow db clean weekly. The Helm chart's built-in cleanup section does not support sidecar injection (additionalProperties: false in its JSON schema), so the Cloud SQL Auth Proxy cannot be added there. The standalone CronJob uses a K8s 1.28+ native sidecar (init container with restartPolicy: Always) to provide database connectivity.
# In the environment's airflow.tf or terraform.tfvars:
cleanup_enabled = true # default: false
cleanup_schedule = "0 3 * * 0" # Sunday 3 AM UTC
cleanup_retention_days = 90
This retains 90 days of metadata. Adjust based on backfill needs — if you use depends_on_past=True, ensure retention covers the lookback window.
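A sketch of what the rendered CronJob looks like — container names, images, and the secret wiring are illustrative; the authoritative spec is the `kubernetes_cron_job_v1` resource in Terraform:

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: airflow-db-clean
  namespace: airflow
spec:
  schedule: "0 3 * * 0"            # Sunday 3 AM UTC
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: airflow
          restartPolicy: Never
          initContainers:
            # Native sidecar (K8s 1.28+): proxy stays up while db clean runs
            - name: cloud-sql-proxy
              image: gcr.io/cloud-sql-connectors/cloud-sql-proxy:2   # placeholder tag
              args: ["--auto-iam-authn", "--private-ip", "<instance-connection-name>"]
              restartPolicy: Always
          containers:
            - name: db-clean
              image: apache/airflow:3.2.0
              env:
                - name: AIRFLOW__DATABASE__SQL_ALCHEMY_CONN
                  valueFrom: { secretKeyRef: { name: airflow-metadata-connection, key: connection } }
              command: ["bash", "-c"]
              # Retain the last 90 days of metadata, matching cleanup_retention_days
              args: ["airflow db clean --clean-before-timestamp \"$(date -d '90 days ago' +%F)\" --yes"]
```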
### Manual cleanup
For one-off cleanups or targeted table cleanup:
kubectl exec -it deploy/airflow-scheduler -n airflow -- \
airflow db clean --clean-before-timestamp "2026-01-01" --only-tables task_instance,log,xcom
Always back up the database before a manual cleanup.
## Monitoring and Alerting

### Scaling signals: when to upgrade from e2-standard-2
The default-pool starts with a single e2-standard-2 (1930m allocatable CPU, ~6.1 GiB RAM). Monitor these signals to know when to scale:
Upgrade path: e2-standard-2 ($49/mo) → e2-standard-4 ($98/mo, 3920m CPU / 13.3 GiB). Alternatively, keep e2-standard-2 and set min_nodes=2 ($98/mo, 3860m aggregate CPU but better fault tolerance).
### Cloud SQL alerts

### Airflow alerts

### GKE / infrastructure alerts

### Recommended dashboard
Create a Cloud Monitoring dashboard with:
- Scheduler health: heartbeat interval, task queue depth, DAG parsing time
- Worker load: CPU/memory utilization, task concurrency, queue wait time
- Database: disk usage, CPU, memory, active connections
- Cluster: node count per pool, pod status, autoscaler events
## Workload Identity
Separate namespaces and SAs enforce least privilege — a compromised KPO container cannot access Airflow metadata or Cloud SQL.
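Each KSA↔GSA pair follows the standard Workload Identity binding pattern. A sketch — the GSA names are assumptions; the KSA and namespace names match the ones used elsewhere in this doc:

```bash
# Allow the KSA airflow/airflow to impersonate its GSA
gcloud iam service-accounts add-iam-policy-binding \
  airflow@poc-ume-data.iam.gserviceaccount.com \
  --role roles/iam.workloadIdentityUser \
  --member "serviceAccount:poc-ume-data.svc.id.goog[airflow/airflow]"

# Separate, more narrowly scoped GSA for KPO pods in airflow-kpo
gcloud iam service-accounts add-iam-policy-binding \
  airflow-kpo@poc-ume-data.iam.gserviceaccount.com \
  --role roles/iam.workloadIdentityUser \
  --member "serviceAccount:poc-ume-data.svc.id.goog[airflow-kpo/airflow-kpo]"
```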
## API Server Authentication (IAP at the GCLB)
External access to the Airflow API server is gated by Identity-Aware Proxy at the Google Cloud Load Balancer, one layer in front of Airflow. Airflow 3 keeps its default SimpleAuthManager with an admin user created by the Helm chart's createUserJob; users reach the app only after IAP validates their Google identity. Implemented in Story 4c.
### Topology
Browser
│ TLS handshake with *.umedev.marpont.es
▼
GCLB (shared static IP ume-data-dev-ingress-ip, wildcard cert from Certificate Manager)
│ Gateway listens on :80 (redirect → :443) and :443 (HTTPS)
▼
HTTPRoute (airflow namespace) — hostname airflow.umedev.marpont.es → airflow-api-server:8080
│
▼
IAP gate (attached to the backend service via GCPBackendPolicy)
│ Google OIDC sign-in + check against roles/iap.httpsResourceAccessor
▼
Service airflow-api-server → api-server pod → Airflow SimpleAuthManager
### Why IAP over Airflow-native OIDC

### Gateway API (not classic Ingress)
The shared Gateway is a gateway.networking.k8s.io/v1 Gateway with gatewayClassName: gke-l7-global-external-managed. One Gateway fronts every service in the environment — Airflow today, DataHub in Phase 2 — on a single static IP and a single wildcard TLS cert. Each app attaches its own HTTPRoute for its hostname.
Gateway ownership:
- Shared Gateway + redirect HTTPRoute live in `environments/{env}-02-k8s-base/gateway.tf`.
- Per-app HTTPRoute lives inside the app's module (for Airflow, `modules/airflow-helm/httproute.tf`).
Cross-namespace attachment is allowed without ReferenceGrant by setting allowedRoutes.namespaces.from = All on each listener. Backend Service references stay intra-namespace (HTTPRoute and Service both in airflow).
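A sketch of the Airflow HTTPRoute — the Gateway name, its namespace, and the listener `sectionName` are assumptions; the real manifest lives in modules/airflow-helm/httproute.tf:

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: airflow
  namespace: airflow
spec:
  parentRefs:
    # Shared Gateway in the environment's k8s-base stack
    - name: shared-gateway
      namespace: gateway-infra
      sectionName: https
  hostnames:
    - airflow.umedev.marpont.es
  rules:
    - backendRefs:
        - name: airflow-api-server
          port: 8080
```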
### IAP wiring
Per-service IAP is provisioned by modules/iap-oauth/:
- `google_iap_client` creates an OAuth 2.0 client under the project-level IAP brand (passed in via `var.iap_brand_name`).
- `kubernetes_secret_v1` stores `client_id` + `client_secret` in the Service namespace — keys must match exactly because `GCPBackendPolicy.spec.default.iap.oauth2ClientSecret.name` expects that shape.
- `kubernetes_manifest` `GCPBackendPolicy` (`networking.gke.io/v1`) attaches IAP to the Service via `targetRef`. The GKE Gateway controller reads this and enables IAP on the generated backend service.
- `google_project_iam_member` bindings grant `roles/iap.httpsResourceAccessor` to the allow-listed principals unconditionally. IAM conditions on the project-level grant do not propagate to IAP's authorization path for Gateway-API backends (IAP reads the IAP-resource-level policy on the backend, not project IAM with conditions). Scoping is done via the allow-list — pick users/groups tightly — not via IAM conditions.
The module accepts three allow-list variables — iap_allowed_domains, iap_allowed_groups, iap_allowed_users — and takes the UNION. Use individual users for tight scoping during the PoC. When a second IAP-protected backend exists with different access requirements, switch to google_iap_web_backend_service_iam_member scoped per service.
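The GCPBackendPolicy itself is small. A sketch — the policy and secret names are assumptions following the module's conventions; the field layout matches GKE's documented schema:

```yaml
apiVersion: networking.gke.io/v1
kind: GCPBackendPolicy
metadata:
  name: airflow-iap
  namespace: airflow
spec:
  default:
    iap:
      enabled: true
      oauth2ClientSecret:
        name: airflow-iap-oauth   # kubernetes_secret_v1 holding client_id / client_secret
      clientID: <oauth-client-id>
  targetRef:
    group: ""
    kind: Service
    name: airflow-api-server
```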
### Prerequisite — OAuth consent screen is manual
google_iap_brand cannot create the OAuth consent screen via API for projects outside a Workspace org, and even for in-org projects the IAP OAuth Admin API is being phased out. The brand must be created once in the GCP Console. See the header comment in environments/{env}-03-runtime/iap.tf for the step-by-step runbook. After creation:
gcloud iap oauth-brands list --project=<project-id> --format='value(name)'
Paste the resulting projects/<project_number>/brands/<brand_id> into iap_brand_name in the runtime stack's tfvars.
### Known provider quirks

- IAM conditions on IAP bindings are inert for Gateway-API backends. A project-level `google_project_iam_member` on `roles/iap.httpsResourceAccessor` with condition `resource.type == "iap.googleapis.com/WebBackendService"` applies cleanly and shows in `gcloud projects get-iam-policy`, but IAP rejects sign-in with "You don't have access". IAP for Gateway API reads the IAP-resource-level policy (`gcloud iap web get-iam-policy --resource-type=backend-services --service=…`), not project IAM with conditions. Use unconditional bindings or `google_iap_web_backend_service_iam_member` per backend.
- Conditional IAM member + `domain:` members crash on create. With an IAM condition attached, `google_project_iam_member` creations for `domain:` members hit a google-provider rollback bug ("Provider produced inconsistent result after apply: Root object was present, but now absent"). `user:` members don't hit this. Combined with the previous point, the cleanest path is unconditional + explicit per-user allow-list.
- `google_iap_brand` can't be created for non-Workspace projects (HTTP 400), and the IAP OAuth Admin API is being phased out. Create the OAuth consent screen manually in the Console and pass the brand name in.
- The IAP brand is a one-way door — it cannot be deleted via API; `terraform destroy` requires `terraform state rm google_iap_brand.project` first.
### Airflow-side auth (post-IAP)
Behind IAP, Airflow 3 runs SimpleAuthManager with [core] simple_auth_manager_all_admins = true — every request is treated as admin with no login prompt. IAP already authenticated the user; a second password would add no security and confuse users. The chart's createUserJob is auto-disabled in that mode because airflow users create uses FAB's security manager (AirflowSecurityManagerV2.find_role) which isn't configured under SimpleAuthManager and crashes the Helm hook.
Two Airflow configs must be set as a pair:
The airflow-helm module wires both together — airflow_config.simple_auth_manager_all_admins = true on the module call flips both internally, and also disables createUserJob.
Port-forward remains a break-glass path:
kubectl port-forward svc/airflow-api-server 8080:8080 -n airflow
# lands straight on the UI — SimpleAuthManager trusts every request
Port-forward is already gated upstream by GKE IAM (you need container.clusters.get + pod exec/port-forward perms to run it), so skipping the Airflow-side login doesn't widen the blast radius.
## Cloud SQL (Airflow Metadata Database)
A Cloud SQL PostgreSQL instance serves as the Airflow metadata store.
Phase 2 shared instance strategy: When DataHub arrives, evaluate whether to create a second logical database (datahub) on this instance (cheaper) or a separate instance (better isolation).