# Introduction

This section establishes the scope and assumptions behind the infrastructure wave-1 implementation. It defines what we are building, what we are not building, and the constraints that shape our decisions.

# Purpose

The UME data platform, as described in the Architecture and Tools section, envisions a governed, self-service analytics environment. Before any of that vision materializes, we need the infrastructure underneath it. This documentation covers:

  • How the Terraform repository is organized (layers, environments, modules)
  • How CI/CD pipelines provision and maintain infrastructure
  • How Airflow on GKE Standard is deployed and wired to DAGs and dbt
  • How DataHub is deployed on GKE with its backing services
  • How we observe, alert on, and operate all of the above
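To orient the reader, a layered Terraform repository of this kind often looks like the sketch below. The directory names are illustrative assumptions (only 00-bootstrap is named elsewhere in this documentation), not the repository's actual layout:

```text
ume-data-infra/
├── layers/
│   ├── 00-bootstrap/    # state bucket, core APIs (assumes the project exists)
│   ├── 10-network/      # VPC, subnets, firewall rules
│   ├── 20-gke/          # GKE Standard cluster and node pools
│   └── 30-platform/     # Airflow, DataHub, Kafka, OpenSearch
├── modules/             # reusable building blocks shared by layers
└── envs/
    ├── dev.tfvars       # single project: poc-ume-data
    └── prod.tfvars      # per-concern project IDs (TBD)
```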

# Wave-1 Scope

Wave-1 is the first implementation cycle. It delivers a working development environment that validates tooling choices, interoperability, and developer experience before we commit to production.

# In scope

  • Airflow on GKE Standard - Deployed via the official Apache Airflow Helm chart with CeleryExecutor. Custom image containing astronomer-cosmos, dbt-core, and dbt-bigquery. DAGs delivered via git-sync sidecar. At least one dbt model running end-to-end via Cosmos.
  • DataHub - Self-hosted on GKE Standard. Backed by Cloud SQL (PostgreSQL), self-hosted Kafka (Strimzi), and self-hosted OpenSearch. Google OIDC for authentication. Ingestion recipes for BigQuery metadata, Airflow, and dbt.
  • GKE Standard - Zonal cluster for the dev PoC (regional for prod). Hosts Airflow (Phase 1), then DataHub, Kafka, and OpenSearch (Phase 2). Zero-downtime node management.
  • CI/CD - GitHub Actions with Workload Identity Federation for plan/apply/drift workflows against the Terraform repository.
  • Observability - Google Managed Prometheus and Cloud Operations for metrics, dashboards, and alerts.
  • Cost groundwork - Mandatory labels on all resources. Project-level budget alerts.
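To make the Airflow bullet concrete, a minimal values override for the official Apache Airflow Helm chart might look like the sketch below. The keys shown (executor, images, dags.gitSync) are standard chart values, but the image repository, tag, and DAGs repository name are placeholders, not the project's actual configuration:

```yaml
# values-dev.yaml -- sketch only; image and repo names are placeholders
executor: CeleryExecutor

images:
  airflow:
    # custom image bundling astronomer-cosmos, dbt-core, dbt-bigquery
    repository: europe-docker.pkg.dev/poc-ume-data/airflow/airflow-dbt
    tag: "dev"

dags:
  gitSync:
    enabled: true
    repo: https://github.com/1edata/example-dags.git   # placeholder repo
    branch: main
    subPath: dags
```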

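Similarly, the CI/CD bullet implies plan workflows that authenticate via Workload Identity Federation rather than long-lived keys. A minimal sketch, in which the pool, provider, and service-account names are hypothetical:

```yaml
# .github/workflows/plan.yml -- sketch; WIF provider and SA names are assumptions
name: terraform-plan
on: [pull_request]

permissions:
  contents: read
  id-token: write   # required for Workload Identity Federation

jobs:
  plan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: google-github-actions/auth@v2
        with:
          workload_identity_provider: projects/123456789/locations/global/workloadIdentityPools/github/providers/github
          service_account: terraform-ci@poc-ume-data.iam.gserviceaccount.com
      - uses: hashicorp/setup-terraform@v3
      - run: terraform init && terraform plan
        working-directory: ./   # in practice, run once per layer and environment
```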
# Out of scope (wave-1)

  • Production environment - Prod stacks mirror dev but are only provisioned after dev is proven. The Terraform code is multi-project ready; only tfvars change.
  • BigQuery datasets and GCS landing buckets - These are data engineering concerns managed outside the infrastructure repository (via dbt, manual creation, or a future data-platform layer).
  • Billing export to BigQuery / Looker dashboards - Documented as a future enhancement; labels are in place to support it.
  • Managed Kafka migration - Documented as an upgrade path when pricing becomes viable.
  • Self-hosted Grafana - GMP + Cloud Ops covers wave-1 needs.
  • SIEM, advanced compliance, or regulated-environment controls - The platform is lightweight for now.
  • GCP project creation - Projects are provisioned externally; Terraform assumes they exist.
  • GCP Folders or Organization-level IAM - The current org has a flat project list. Terraform does not touch org-level resources.

# Assumptions

  1. Single GCP project for dev: poc-ume-data. The user has Owner-level access. All dev resources (GKE, Cloud SQL, networking, Artifact Registry, state bucket) coexist here.
  2. Multi-project target: Production will use dedicated projects per concern (shared-services, platform, data). Terraform modules accept a project-ID map from tfvars, so migration requires no code changes.
  3. Projects are pre-existing inputs: Terraform never creates GCP projects. The 00-bootstrap layer assumes the target project already exists.
  4. No GCP Folders: The organization uses a flat list of projects under a single org. No folder-level IAM or hierarchy.
  5. Existing DAGs repository: There is an existing repository containing Airflow DAGs and a large dbt project. This will be expanded to include the custom Airflow image Dockerfile and CI, as well as DataHub ingestion recipes. During Phase 1, DAG/dbt/image work lives in resources/ within ume-data-infra and will be ported to the dedicated repo later.
  6. GitHub as source control: The infrastructure repo lives at github.com/1edata/ume-data-infra (it will move to a different GitHub organization later).
  7. Two environments only: dev and prod. No staging, no QA. Dev validates; prod mirrors.
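Assumption 2 (project IDs supplied via tfvars) can be illustrated as follows. The variable name, map keys, and the commented-out prod project IDs are hypothetical; only the per-concern split (shared-services, platform, data) comes from the assumption itself:

```hcl
# variables.tf -- hypothetical variable name
variable "project_ids" {
  description = "Project ID per concern; dev maps every key to one project."
  type        = map(string)
}

# dev.tfvars: everything coexists in the single PoC project
project_ids = {
  shared_services = "poc-ume-data"
  platform        = "poc-ume-data"
  data            = "poc-ume-data"
}

# prod.tfvars (future): dedicated projects per concern, no code changes
# project_ids = {
#   shared_services = "..."
#   platform        = "..."
#   data            = "..."
# }
```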

# Environments

| Environment | GCP Project(s) | Purpose |
|-------------|----------------|---------|
| dev | poc-ume-data (single project) | Validation. All components deployed here first. |
| prod | Multiple projects (TBD, created externally) | Production. Brought up only after dev is proven. |

# Relationship to Other Documentation

  • Architecture and Tools - Defines the what and why: tool selection rationale, data flow, governance goals. This infrastructure section defines the how.
  • ETL - Describes transformation patterns, orchestration capabilities, and SDLC practices. Infrastructure provides the Airflow deployment, DAG sync mechanism, and image pipeline that ETL relies on.
  • Data Catalog - Describes DataHub's role, integrations, and governance workflows. Infrastructure provides the deployed DataHub instance, its backing services, and operational runbooks.