# Introduction
This section establishes the scope and assumptions behind the infrastructure wave-1 implementation. It defines what we are building, what we are not building, and the constraints that shape our decisions.
# Purpose
The UME data platform, as described in the Architecture and Tools section, envisions a governed, self-service analytics environment. Before any of that vision materializes, we need the infrastructure underneath it. This documentation covers:
- How the Terraform repository is organized (layers, environments, modules; a layout sketch follows this list)
- How CI/CD pipelines provision and maintain infrastructure
- How Airflow on GKE Standard is deployed and wired to DAGs and dbt
- How DataHub is deployed on GKE with its backing services
- How we observe, alert on, and operate all of the above
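As a point of reference, a hypothetical layout for the Terraform repository might look like the sketch below. Only `00-bootstrap`, the `tfvars`-per-environment split, and the Phase 1 `resources/` directory are taken from this document; the other layer and directory names are illustrative, not prescriptive:

```
ume-data-infra/
├── layers/
│   ├── 00-bootstrap/      # state bucket, CI service accounts (per Assumptions)
│   ├── 10-network/        # illustrative: VPC, subnets
│   ├── 20-gke/            # illustrative: cluster, node pools
│   └── 30-platform/       # illustrative: Airflow and DataHub dependencies
├── modules/               # reusable building blocks
├── envs/
│   ├── dev.tfvars
│   └── prod.tfvars
└── resources/             # Phase 1 home for DAG/dbt/image work
```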
# Wave-1 Scope
Wave-1 is the first implementation cycle. It delivers a working development environment that validates tooling choices, interoperability, and developer experience before we commit to production.
# In scope
- Airflow on GKE Standard - Deployed via the official Apache Airflow Helm chart with CeleryExecutor. Custom image containing `astronomer-cosmos`, `dbt-core`, and `dbt-bigquery`. DAGs delivered via a git-sync sidecar. At least one dbt model running end-to-end via Cosmos (see the values sketch after this list).
- DataHub - Self-hosted on GKE Standard. Backed by Cloud SQL (PostgreSQL), self-hosted Kafka (Strimzi), and self-hosted OpenSearch. Google OIDC for authentication. Ingestion recipes for BigQuery metadata, Airflow, and dbt (a recipe sketch follows this list).
- GKE Standard - Zonal cluster for dev PoC (regional for prod). Hosts Airflow (Phase 1) and DataHub, Kafka, OpenSearch (Phase 2). Zero-downtime node management.
- CI/CD - GitHub Actions with Workload Identity Federation for plan/apply/drift workflows against the Terraform repository (see the workflow sketch below).
- Observability - Google Managed Prometheus and Cloud Operations for metrics, dashboards, and alerts (a scrape-config sketch follows this list).
- Cost groundwork - Mandatory labels on all resources. Project-level budget alerts.
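To make the Airflow item concrete, a minimal values sketch for the official Apache Airflow Helm chart could look like the following. The Artifact Registry path, image tag, namespace details, and git-sync `subPath` are assumptions for illustration; the executor choice and git-sync mechanism are as described above.

```yaml
# values.yaml (sketch) for the official apache-airflow/airflow chart
executor: CeleryExecutor

images:
  airflow:
    # Hypothetical Artifact Registry path for the custom image
    # (astronomer-cosmos, dbt-core, dbt-bigquery baked in)
    repository: europe-west1-docker.pkg.dev/poc-ume-data/airflow/airflow-custom
    tag: "0.1.0"

dags:
  gitSync:
    enabled: true
    repo: https://github.com/1edata/ume-data-infra.git
    branch: main
    # Assumed Phase 1 location of DAGs under resources/
    subPath: resources/dags
```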
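As an example of the DataHub ingestion recipes, a minimal BigQuery metadata recipe might look like this. The in-cluster GMS address is an assumption based on typical DataHub Helm deployments, and the exact source config schema should be checked against the connector version in use.

```yaml
# bigquery-recipe.yaml (sketch)
source:
  type: bigquery
  config:
    # Dev project from the Assumptions section
    project_ids:
      - poc-ume-data
sink:
  type: datahub-rest
  config:
    # Assumed in-cluster GMS service address
    server: http://datahub-datahub-gms:8080
```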
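For the CI/CD item, a minimal GitHub Actions plan workflow authenticating via Workload Identity Federation might look like the sketch below. The pool/provider path, service account name, project number, and trigger paths are hypothetical; `id-token: write` is the permission WIF requires.

```yaml
# .github/workflows/terraform-plan.yml (sketch)
name: terraform-plan
on:
  pull_request:
    paths: ["**/*.tf", "**/*.tfvars"]

permissions:
  contents: read
  id-token: write  # required for Workload Identity Federation

jobs:
  plan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: google-github-actions/auth@v2
        with:
          # Hypothetical pool/provider and service account names
          workload_identity_provider: projects/123456789/locations/global/workloadIdentityPools/github/providers/ume-data-infra
          service_account: terraform-ci@poc-ume-data.iam.gserviceaccount.com
      - uses: hashicorp/setup-terraform@v3
      - run: terraform init -input=false
      - run: terraform plan -input=false
```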
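For the observability item, Google Managed Prometheus scrapes workloads via `PodMonitoring` resources. A sketch for scraping the Airflow chart's statsd exporter could look like this; the namespace, label selector, and port reflect common chart defaults and should be verified against the actual deployment.

```yaml
# podmonitoring-airflow.yaml (sketch)
apiVersion: monitoring.googleapis.com/v1
kind: PodMonitoring
metadata:
  name: airflow-statsd
  namespace: airflow  # assumed namespace
spec:
  selector:
    matchLabels:
      component: statsd  # label assumed from the official chart's statsd pod
  endpoints:
    - port: 9102        # statsd-exporter metrics port
      interval: 30s
```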
# Out of scope (wave-1)
- Production environment - Prod stacks mirror dev but are only provisioned after dev is proven. The Terraform code is multi-project ready; only `tfvars` change.
- BigQuery datasets and GCS landing buckets - These are data engineering concerns managed outside the infrastructure repository (via dbt, manual creation, or a future data-platform layer).
- Billing export to BigQuery / Looker dashboards - Documented as a future enhancement; labels are in place to support it.
- Managed Kafka migration - Documented as an upgrade path when pricing becomes viable.
- Self-hosted Grafana - GMP + Cloud Ops covers wave-1 needs.
- SIEM, advanced compliance, or regulated-environment controls - The platform is lightweight for now.
- GCP project creation - Projects are provisioned externally; Terraform assumes they exist.
- GCP Folders or Organization-level IAM - The current org has a flat project list. Terraform does not touch org-level resources.
# Assumptions
- Single GCP project for dev: `poc-ume-data`. The user has Owner-level access. All dev resources (GKE, Cloud SQL, networking, Artifact Registry, state bucket) coexist here.
- Multi-project target: Production will use dedicated projects per concern (shared-services, platform, data). Terraform modules accept a project-ID map from `tfvars`, so migration requires no code changes.
- Projects are pre-existing inputs: Terraform never creates GCP projects. The `00-bootstrap` layer assumes the target project already exists.
- No GCP Folders: The organization uses a flat list of projects under a single org. No folder-level IAM or hierarchy.
- Existing DAGs repository: There is an existing repository containing Airflow DAGs and a large dbt project. It will be expanded to include the custom Airflow image Dockerfile and CI, as well as DataHub ingestion recipes. During Phase 1, DAG/dbt/image work lives in `resources/` within `ume-data-infra` and will be ported to the dedicated repo later.
- GitHub as source control: The infrastructure repo lives at `github.com/1edata/ume-data-infra` (it will move orgs later).
- Two environments only: `dev` and `prod`. No staging, no QA. Dev validates; prod mirrors.
# Environments

Wave-1 targets two environments: `dev`, a single project (`poc-ume-data`) provisioned now, and `prod`, a multi-project mirror provisioned only once dev is proven. The two differ only in `tfvars` values; see the Assumptions above for details.
# Relationship to Other Documentation
- Architecture and Tools - Defines the what and why: tool selection rationale, data flow, governance goals. This infrastructure section defines the how.
- ETL - Describes transformation patterns, orchestration capabilities, and SDLC practices. Infrastructure provides the Airflow deployment, DAG sync mechanism, and image pipeline that ETL relies on.
- Data Catalog - Describes DataHub's role, integrations, and governance workflows. Infrastructure provides the deployed DataHub instance, its backing services, and operational runbooks.