# Locomotive: silver layer + BQ governance

This is the data-layer initiative running on top of wave-1 infrastructure. Wave-1 ships the platform (Airflow on GKE, DataHub Phase 2 next). The locomotive ships the canonical surface of data on top of it.

The locomotive locks the BigQuery surface so writes only land through a reviewed PR, builds 13 curated silver tables as the new source of truth for downstream analytics, and routes future changes through a dbt project living in ume-data-engineering with CODEOWNERS and AI review.

Bad business rules in early silver tables are acceptable. They get fixed through the normal model-update PR flow. The point is to ship the canonical surface and the governance pattern, not to land perfect modelling on day one.

# Initial brief (organized and corrected)

This is the original brief, with corrections applied.

# Goal

Move fast on a data-layer reset that begins paying off in weeks. Pace is measured by silver tables shipped per week and legacy consumers migrated.

# Source state

  • The current CDC pipeline lands raw tables in BigQuery from the production Postgres on GCP.
  • Today, anyone with BQ access can create tables and views directly on top of these raw CDC tables, with no governance.
  • This produced fragmentation, abandoned tables, and divergent business rules.

# What we are building (corrected)

  • A new GCP project (immediate, not deferred) for the silver layer.
  • 13 curated silver tables built directly from raw CDC. The list is locked (below). These are claimed to cover roughly 80% of analytical use cases.
  • All future downstream tables and views must source only from silver, never from raw CDC.

Terminology note: at UME, "silver" = these 13 curated tables. Bronze = raw CDC. This is non-standard medallion mapping. Do not assume the usual meaning.

# What we are doing to BigQuery

  • Cut BQ data write access. Users can no longer create views or tables directly.
  • Pulumi was the original idea; corrected to Terraform (the entire infra repo is Terraform; introducing a second IaC chain has no upside).
  • Control dataset creation, user and group policies through Terraform.
  • Bronze (raw CDC) reads are eventually disabled once consumers are migrated.

# Developer experience

  • Dev-X happens through a dbt monorepo.
  • We considered dbt Cloud. Locked decision: stay on Cosmos + dbt-core (already deployed on GKE Airflow). Reconsider dbt Cloud after the first 5 silver tables are live with a measured cost/velocity baseline.
  • Reference DAGs repo: ume-data-dags (template only; will be retired as Cosmos points at the new dbt project).
  • The dbt monorepo lives in ume-data-engineering on a long-lived migration branch (cut from dev). Brand-new dbt project at app/datalake/transformations/.

# Collaboration and PR gates

  • Repository skills that scaffold new implementations automatically.
  • AI bots that review PRs against those skills.
  • Static analysis (SQLFluff, dbt-checkpoint, schema lint).
  • PR gates: CODEOWNERS by company area + at least one data-platform reviewer at root.

# Ingestion blueprints

  • Beyond CDC, we need easy ways to ingest jsonl/json logs, csv, FTP.
  • Same pattern as silver: PR gates, AI bot, skills.
  • Home: ume-data-engineering/ingestion/.

# Observability and governance (later, but designed now)

  • Separate GCP project for the silver layer (immediate).
  • Metadata-driven observability: what was ingested, when, what failed. Alerting to Slack.
  • BigQuery usage datamart with users, tables, scanned bytes, cost, and last-used timestamps. Powers cost dashboards and the lifecycle policy.
  • Detect things created outside the expected path (audit-log alerts on raw CREATE TABLE).
  • Telemetry stack: Metabase only for dashboards and alerts. Looker Studio dropped to keep one tool. Metrics layer = dbt meta__* models.

# Domain ownership

  • From silver onwards, domain tables are owned by company areas, always committing to the dbt monorepo. Silver itself is owned by the data platform team.
  • We are not taking an atendimento-first approach; sequencing is driven by the silver tables themselves.

# Lifecycle

  • Tables not used or updated get a lifecycle policy (cold storage or deletion).
  • If a table is still referenced in dbt, the deprecation trigger is an automated PR.
  • Designed in the locomotive, automated in Wave 2.

# New Metabase instance

  • Built like ume-data-infra (Terraform + Helm + dependabot) so it can be upgraded without downtime.
  • Migration of golden-standard dashboards follows silver readiness, not a domain-first plan.

# Governance via DataHub

  • DataHub scanning of BigQuery, Airflow, dbt manifests is in scope for the locomotive.
  • Similarity scoring + PR-bot to prevent fragmentation (too-similar models, abandoned tables) is Wave 2, with groundwork laid now (DataHub metadata pipeline + simple column-overlap heuristic in PR review).

# Silver list (locked, 13 tables)

First two to attack: contracts and installments.

| # | Table | Notes |
|---|-------|-------|
| 1 | source_product | Global rule |
| 2 | contracts | First target |
| 3 | installments | First target |
| 4 | borrowers | |
| 5 | renegotiations | |
| 6 | renegotiated_installments | |
| 7 | invoices | |
| 8 | installments_invoices | |
| 9 | application | |
| 10 | payments | |
| 11 | payment_invoices | |
| 12 | transaction_engine | |
| 13 | application_credit_evaluations | |

# Locked decisions

| Decision | Choice |
|----------|--------|
| IaC | Terraform. New silver project ships as a new env stack inside ume-data-infra. No Pulumi. |
| dbt repo | ume-data-engineering, long-lived migration branch cut from dev, dbt project at app/datalake/transformations/. |
| dbt runtime | Cosmos + dbt-core on the existing GKE Airflow. dbt Cloud reconsidered after the first 5 silver tables ship. |
| Telemetry | Metabase only for dashboards and alerts. Metrics = dbt meta__* models in BigQuery. |
| Data-quality alerts | dbt tests + Cosmos Slack callbacks. Elementary deferred. |
| DataHub in scope | Yes, but scanning only (BQ + Airflow + dbt manifests). Similarity scoring is Wave 2. |
| Ingestion blueprints | ume-data-engineering/ingestion/ (jsonl, csv, ftp). |
| Bad business rules | Acceptable in early silver tables. Fixed via normal PR flow. |
| Sequencing | Silver tables drive everything. No atendimento-first. |

# Earliest visible milestones

Stakeholder-visible news, in the rough order each can land. Each is a short post in #data-platform. Cadence is the morale signal.

  1. Day 1: silver list pinned (this doc); metrification dashboard wireframe shared.
  2. First silver project terraform plan clean in CI.
  3. First silver model materialized in BigQuery (target: contracts) via Cosmos against silver-dev.
  4. First Metabase dashboard reading from meta__bq_usage: who scanned how much yesterday, who has shadow tables.
  5. First sample PR through every gate in ume-data-engineering: dbt parse, schema lint, AI similarity check, CODEOWNERS approval.
  6. First Slack alert fires from Metabase on a forced threshold breach.
  7. DataHub shows lineage from a silver model back to its raw CDC source.
  8. First sandbox dataset created for a willing pilot team, with TTL working.
  9. First non-CDC source onboarded through an ingestion blueprint.
  10. First legacy consumer migrated off raw CDC onto a silver table.

# Workstreams

Each workstream is sized to be picked up by one specialist. Specialist names match files in docs/infrastructure/agents/ where applicable; new ones are flagged. Exit criteria are binary.

# WS-1: Silver GCP project (Terraform)

Specialist: infra-terraform (existing).

Scope: new GCP project (ume-data-silver-prod and -dev), VPC reuse, BigQuery datasets per silver group, service accounts for Airflow-writer and Metabase-reader, IAM groups by domain. New env stacks: environments/silver-01-base, environments/silver-02-bigquery. Reuse vpc, iap-oauth; add bq-dataset module if missing.

Exit criteria:

  • terraform apply from CI provisions silver-dev end-to-end with no manual steps.
  • Airflow's existing service account writes into silver-dev via Workload Identity.
  • bq query from a non-allowlisted user against silver-dev returns a permission error.
  • Same stack applied to silver-prod (manual approval gate in CI).

# WS-2: BigQuery surface lockdown

Specialist: bq-governance (new).

Scope:

  1. Inventory writes via INFORMATION_SCHEMA.JOBS_BY_PROJECT over 90 days; produce a register of shadow tables, owners, access paths.
  2. Communicate the timeline; create per-team sandbox_<team> datasets with 30-day partition expiration.
  3. Stage revoke of dataEditor on legacy datasets in waves; gate dataset creation behind Terraform.
  4. Audit-log alert in Cloud Monitoring on any unexpected CREATE TABLE outside silver and sandboxes.
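The inventory step boils down to classifying each write job's destination and attributing it to a team. A minimal sketch of that attribution logic, assuming job rows shaped like INFORMATION_SCHEMA.JOBS_BY_PROJECT output; the dataset prefixes and user-to-team mapping here are illustrative, not the real conventions:

```python
# Sketch: classify BigQuery write jobs into expected vs shadow writes.
# Job dicts mimic rows from INFORMATION_SCHEMA.JOBS_BY_PROJECT; the
# dataset naming convention and user->team mapping are assumptions.

ALLOWED_PREFIXES = ("silver_", "sandbox_")  # assumed naming convention

def classify_write(job, user_to_team):
    """Return (category, team) for one write job."""
    dataset = job["destination_dataset"]
    team = user_to_team.get(job["user_email"], "unknown")
    if dataset.startswith(ALLOWED_PREFIXES):
        return ("expected", team)
    return ("shadow", team)

def shadow_register(jobs, user_to_team):
    """Aggregate shadow writes per (dataset, team) for the register doc."""
    register = {}
    for job in jobs:
        category, team = classify_write(job, user_to_team)
        if category == "shadow":
            key = (job["destination_dataset"], team)
            register[key] = register.get(key, 0) + 1
    return register

jobs = [
    {"destination_dataset": "silver_core", "user_email": "airflow@ume.dev"},
    {"destination_dataset": "ad_hoc_maria", "user_email": "maria@ume.dev"},
    {"destination_dataset": "ad_hoc_maria", "user_email": "maria@ume.dev"},
]
teams = {"maria@ume.dev": "credit", "airflow@ume.dev": "platform"}
print(shadow_register(jobs, teams))
# {('ad_hoc_maria', 'credit'): 2}
```

The ≥90% attribution exit criterion then falls out of comparing attributed shadow writes against the total.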

Exit criteria:

  • Shadow-table register published with ≥90% of writes attributed to a known user/team.
  • All sandboxes provisioned with TTL verified (a test row vanishes after expiration).
  • dataEditor removed from at least one legacy dataset without breaking any consumer (verified via 7-day query log).
  • Audit-log alert fires on a forced test CREATE TABLE and posts to the platform Slack channel.

# WS-3: New dbt project in ume-data-engineering

Specialist: dbt-silver (new; or evolve composer-dags into airflow-dags).

Scope: cut migration branch on ume-data-engineering from dev. Brand-new dbt project at app/datalake/transformations/. models/silver/, models/meta/, sources/cdc.yml pointing at raw CDC. CI: dbt parse, dbt build against silver-dev, schema diff comment on PR. Pre-commit: SQLFluff, dbt-checkpoint. Cosmos config in ume-data-dags (or its successor) points at this folder. First two silver models built: contracts and installments.
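The schema-diff comment in CI reduces to comparing column sets between the baseline manifest and the PR's manifest. A hedged sketch in plain Python; the flat `{model: columns}` shape is a simplification, since a real dbt manifest.json nests columns under `nodes`:

```python
def schema_diff(base_columns, pr_columns):
    """Diff two {model: set(columns)} mappings into a PR comment body.

    Simplified shape: real dbt manifest.json nests columns under
    nodes['model.<project>.<name>']['columns'].
    """
    lines = []
    for model in sorted(set(base_columns) | set(pr_columns)):
        before = base_columns.get(model, set())
        after = pr_columns.get(model, set())
        for col in sorted(after - before):
            lines.append(f"{model}: + {col}")
        for col in sorted(before - after):
            lines.append(f"{model}: - {col}")
    return "\n".join(lines)

base = {"silver.contracts": {"id", "status"}}
pr = {"silver.contracts": {"id", "status", "signed_at"},
      "silver.installments": {"id", "due_date"}}
print(schema_diff(base, pr))
```

An empty diff means the PR comment step can be skipped entirely.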

Exit criteria:

  • migration branch exists with branch protection (required reviews, required status checks).
  • dbt parse and dbt build both pass in CI on a PR.
  • silver.contracts and silver.installments materialize in ume-data-silver-dev via a Cosmos run triggered from Airflow.
  • Manifest + run-results uploaded to GCS after every run (input for meta__dbt_runs).

# WS-4: Dev-X and PR gates

Specialist: dev-x (new).

Scope: CODEOWNERS by domain folder + a data-platform reviewer at the repo root. Repo skills in .claude/skills/ for "add silver model", "add meta dashboard", "add ingestion blueprint". AI PR-review bot wired with concrete checks: column-overlap heuristic against existing silver models (similarity-prevention groundwork), schema lint, test-coverage threshold, manifest diff summary. Branch protection enforces required checks + required reviewers.
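The column-overlap heuristic named above can be as simple as Jaccard similarity over column names. A minimal sketch; the 0.8 default threshold is an assumed starting point to be tuned against real silver models:

```python
def column_overlap(cols_a, cols_b):
    """Jaccard similarity between two column-name sets."""
    a, b = set(cols_a), set(cols_b)
    if not (a or b):
        return 0.0
    return len(a & b) / len(a | b)

def similarity_warnings(new_model, new_cols, existing, threshold=0.8):
    """Flag existing silver models whose columns overlap too much.

    The 0.8 threshold is an assumption; tune it once real models exist.
    """
    return [
        (name, round(column_overlap(new_cols, cols), 2))
        for name, cols in existing.items()
        if column_overlap(new_cols, cols) >= threshold
    ]

existing = {"silver.contracts": {"contract_id", "borrower_id", "status", "signed_at"}}
warnings = similarity_warnings(
    "silver.contracts_v2",
    {"contract_id", "borrower_id", "status", "amount"},
    existing,
    threshold=0.5,
)
print(warnings)  # [('silver.contracts', 0.6)]
```

The bot would run this against column lists pulled from the dbt manifest and post any hits as a review comment.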

Exit criteria:

  • CODEOWNERS file in migration branch covers every silver folder; a push without an approval is blocked.
  • Test PR with a deliberately too-similar model triggers the AI bot's similarity warning.
  • Test PR with a missing dbt test fails the coverage check.
  • At least one repo skill produces a working scaffold ("add silver model" generates a PR-ready file).

# WS-5: Ingestion blueprints

Specialist: data-ingestion (new).

Scope: blueprints for jsonl/json logs, csv, FTP. Each blueprint: a DAG factory, schema contract, landing-zone convention, smoke tests, observability hook (writes to meta__ingestion_runs). Same CODEOWNERS + AI review pattern as WS-4. Home: ume-data-engineering/ingestion/.
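The schema contract in each blueprint amounts to checking every landed record against declared fields and types before it leaves the landing zone. A sketch under assumed field names (the contract shown is illustrative, not a real UME source):

```python
# Sketch of a blueprint's schema-contract check. The field names and
# types in CONTRACT are hypothetical examples.

CONTRACT = {"event_id": str, "amount_cents": int, "occurred_at": str}

def validate(record, contract=CONTRACT):
    """Return a list of violations for one record (empty = valid)."""
    errors = []
    for field, expected in contract.items():
        if field not in record:
            errors.append(f"missing field: {field}")
        elif not isinstance(record[field], expected):
            errors.append(f"{field}: expected {expected.__name__}, "
                          f"got {type(record[field]).__name__}")
    return errors

print(validate({"event_id": "e1", "amount_cents": 100, "occurred_at": "2024-01-01"}))
# []
print(validate({"event_id": "e1", "amount_cents": "100"}))
# ['amount_cents: expected int, got str', 'missing field: occurred_at']
```

The observability hook would write the violation count per run into meta__ingestion_runs.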

Exit criteria:

  • Three blueprints (jsonl, csv, ftp) committed with a README and a smoke test each.
  • One real non-CDC source onboarded through a blueprint, end-to-end into a landing dataset.
  • That source emits a row to meta__ingestion_runs after each run.

# WS-6: Telemetry datamart + Metabase observability

Specialist: dbt-silver for the meta models, infra-terraform for the Metabase wiring.

Scope: dbt meta models in the silver project:

  • meta__bq_usage from INFORMATION_SCHEMA.JOBS_BY_PROJECT (user, table, scanned bytes, cost, last_run).
  • meta__table_lifecycle (last_queried, last_modified, query_count_30d, owner). Feeds future lifecycle PRs.
  • meta__ingestion_runs (DAG, table, rows, duration, status, error_class).
  • meta__dbt_runs (manifest + run_results loaded by a post-run hook).

Metabase: collection per area, alerts on freshness lag, ingestion failures, cost spikes, "table not queried in N days". Alerts to Slack via Metabase's native integration.
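The cost column in meta__bq_usage derives from bytes billed times the on-demand rate. A sketch of that arithmetic; the $6.25/TiB rate is an assumption about on-demand pricing and must be checked against the project's actual region and pricing model:

```python
# Sketch of the cost derivation behind meta__bq_usage. The per-TiB
# rate is an assumed on-demand price, not a confirmed figure.

PRICE_PER_TIB_USD = 6.25  # assumption: verify against actual billing
TIB = 1024 ** 4

def query_cost_usd(total_bytes_billed):
    """Approximate on-demand cost of one job from bytes billed."""
    return round(total_bytes_billed / TIB * PRICE_PER_TIB_USD, 4)

def daily_spend(jobs):
    """Aggregate spend per user from job rows, for the cost dashboard."""
    spend = {}
    for job in jobs:
        spend[job["user_email"]] = spend.get(job["user_email"], 0.0) \
            + query_cost_usd(job["total_bytes_billed"])
    return spend

jobs = [
    {"user_email": "maria@ume.dev", "total_bytes_billed": 2 * TIB},
    {"user_email": "joao@ume.dev", "total_bytes_billed": TIB // 2},
]
print(daily_spend(jobs))  # {'maria@ume.dev': 12.5, 'joao@ume.dev': 3.125}
```

In the real model this runs as SQL over INFORMATION_SCHEMA.JOBS_BY_PROJECT; the Python is only to pin down the arithmetic.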

Exit criteria:

  • All four meta__* models build daily and have at least one dbt test each.
  • Metabase has a "Cost yesterday" question that loads in <5s.
  • At least three Metabase alerts wired to Slack (one freshness, one cost, one ingestion failure).
  • A forced threshold breach triggers a Slack message in <10 min.

# WS-7: New Metabase instance

Specialist: infra-terraform.

Scope: adapt the airflow-helm module pattern in ume-data-infra for Metabase. Cloud SQL Postgres backend (HA), IAP fronting, immutable image tags, dependabot bumping chart and image versions through PR. Workload Identity for the read-only BQ service account.

Exit criteria:

  • metabase.<domain> reachable via IAP for the seed allowlist.
  • Dependabot opens a successful image-bump PR within the first 2 weeks of deploy.
  • Metabase reads silver and meta__* tables; cannot write anywhere.
  • Restart of the Cloud SQL Postgres causes <2 min of Metabase downtime.

# WS-8: Legacy consumer migration

Specialist: bq-governance + domain owners.

Scope: catalogue every table/view sourcing from raw CDC; assign each a target silver source; produce a migration runbook + sunset date per consumer. Tracked in the same metrification dashboard as silver delivery.
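The row-count verification in the runbook is a tolerance check between the legacy raw-CDC count and the silver count. A sketch; the 1% default tolerance is an assumption to be tightened per table:

```python
def within_tolerance(legacy_count, silver_count, tolerance=0.01):
    """Check a repointed consumer's silver row count against legacy.

    The 1% default tolerance is an assumed starting point; tighten it
    per table as business rules firm up.
    """
    if legacy_count == 0:
        return silver_count == 0
    return abs(silver_count - legacy_count) / legacy_count <= tolerance

print(within_tolerance(100_000, 99_500))  # True  (0.5% drift)
print(within_tolerance(100_000, 97_000))  # False (3% drift)
```

A consumer only counts toward the "5 repointed" exit criterion once this check passes.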

Exit criteria:

  • Register of legacy consumers complete; each row tagged with target silver model or "to sunset".
  • At least 5 consumers repointed at silver and verified (row counts within tolerance vs raw-CDC version).
  • Sunset dates communicated to all owners.
  • BQ lockdown (WS-2) gates moved forward only after all consumers in a wave are repointed or sunset.

# WS-9: DataHub scanning of silver + sources

Specialist: datahub-platform (existing).

Scope: ship DataHub Phase 2 per current roadmap. Wire ingestion recipes for BigQuery (silver project + raw CDC), Airflow DAGs, dbt manifests from ume-data-engineering. Tag silver models as the canonical layer. Lineage between raw CDC and silver visible. No similarity-detection logic yet; that is Wave 2.

Exit criteria:

  • DataHub running in dev with all three ingestion recipes scheduled in Airflow.
  • Every shipped silver model appears in DataHub with its dbt description, tags, and column-level lineage to raw CDC.
  • Search for a raw CDC table shows downstream silver models.
  • DataHub UI reachable via IAP for the seed allowlist.

# WS-10: Lifecycle policy (design now, automate Wave 2)

Specialist: dev-x.

Scope: write the rule set (e.g. table not queried in 90 days + not referenced in dbt manifest → deprecation candidate). The read-only report builds in WS-6. Automation (auto-PR to deprecate from ume-data-engineering) is Wave 2.
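The rule set sketched above can be stated executably against meta__table_lifecycle rows. A minimal sketch; the field names mirror the planned columns but are assumptions at this design stage:

```python
from datetime import date, timedelta

def deprecation_candidates(tables, today, stale_days=90):
    """Apply the drafted lifecycle rule to meta__table_lifecycle rows.

    Rule: not queried in `stale_days` AND not referenced in the dbt
    manifest -> deprecation candidate. Field names are assumed to
    match the planned meta__table_lifecycle schema.
    """
    cutoff = today - timedelta(days=stale_days)
    return [
        t["table"]
        for t in tables
        if t["last_queried"] < cutoff and not t["referenced_in_dbt"]
    ]

rows = [
    {"table": "legacy.ad_hoc_1", "last_queried": date(2024, 1, 5),
     "referenced_in_dbt": False},
    {"table": "silver.contracts", "last_queried": date(2024, 6, 1),
     "referenced_in_dbt": True},
]
print(deprecation_candidates(rows, today=date(2024, 6, 10)))
# ['legacy.ad_hoc_1']
```

In the locomotive this stays a read-only weekly report; Wave 2 turns the output into auto-PRs.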

Exit criteria:

  • Lifecycle rules written in docs/infrastructure/lifecycle.md and accepted in a PR.
  • Weekly Metabase report listing deprecation candidates ships.
  • First review meeting held; at least one candidate confirmed for deprecation.

# Metrification

One dashboard fed by the meta__* models. Four counters tracked weekly.

| Metric | Source | Why it matters |
|--------|--------|----------------|
| Silver tables live | meta__dbt_runs filtered to silver schema | Pace of the locomotive |
| Legacy consumers migrated | manual register in dbt repo + meta__bq_usage | Progress toward lockdown |
| BQ scanned cost (silver vs legacy) | meta__bq_usage | Budget signal |
| Failed silver runs (7d) | meta__dbt_runs | Reliability signal |

Targets are intentionally absent from the main task list. The curve is the conversation.

# Wave 2 (acknowledged, not scoped here)

  • DataHub similarity scoring + PR-bot that flags new silver models with high column overlap to existing ones.
  • Lifecycle automation: auto-PR to ume-data-engineering to deprecate cold tables.
  • dbt Cloud re-evaluation with concrete cost/velocity baseline.
  • Per-domain Metabase migrations off the legacy instance.

# Critical files and paths

ume-data-infra/ (Terraform)

  • environments/silver-01-base/ (new)
  • environments/silver-02-bigquery/ (new)
  • environments/dev-04-metabase/ (new)
  • modules/bq-dataset/ (new, if not present)

ume-data-engineering/ (new migration branch off dev)

  • app/datalake/transformations/ (brand-new dbt project)
  • app/datalake/transformations/dbt_project.yml, profiles.yml
  • app/datalake/transformations/models/silver/, models/meta/, sources/cdc.yml
  • ingestion/templates/{cdc,jsonl,csv,ftp}/ (WS-5)
  • .github/workflows/{pr-ci.yml,build.yml,manifest-upload.yml}
  • .github/CODEOWNERS
  • .claude/skills/ (add-silver-model, add-meta-dashboard, add-ingestion-blueprint)

ume-data-dags/

  • Stays as the reference template; Cosmos config evolves to point at ume-data-engineering/app/datalake/transformations.

# Verification (end-to-end)

  • terraform plan clean across silver-01-base, silver-02-bigquery, dev-04-metabase.
  • One sample PR on the migration branch triggers all gates: dbt parse, schema lint, AI similarity check, manifest diff, CODEOWNERS approval.
  • Cosmos in dev runs against ume-data-silver-dev and produces silver.contracts and silver.installments.
  • meta__bq_usage populated; Metabase question shows yesterday's spend; alert fires to Slack on a forced threshold breach.
  • DataHub shows the silver model with column lineage back to the raw CDC source.
  • BQ audit-log alert fires on a manual CREATE TABLE outside the silver project.

# Duration estimate

The main task list is not time-locked. Rough envelope:

  • Best case: ~6 weeks to all exit criteria green.
  • Realistic: ~10 weeks. WS-2 drags because of communication and consumer migration; WS-9 depends on DataHub Phase 2 stability.
  • Worst case: ~16 weeks. DataHub Phase 2 instability, Cloud SQL HA surprises in WS-7, or stakeholder pushback on the lockdown timeline.

Even on the worst-case path, the first 4–5 visible milestones land within the first month.

# Tasks (flat list, prioritized)

Each task is sized to be picked up independently. Exit criteria are restated for clarity.

# Foundation (parallelizable, do these first)

  • T-01 — Clone ume-data-engineering locally under ~/git/. Exit: repo present, dev branch checked out.
  • T-02 — Cut migration branch off dev on ume-data-engineering and push with branch protection (required reviews, required status checks). Exit: branch live on remote with protection rules.
  • T-03 — Scaffold environments/silver-01-base/ in ume-data-infra (project, VPC reuse, IAM groups). Exit: terraform plan succeeds locally.
  • T-04 — Scaffold environments/silver-02-bigquery/ (datasets per silver group, SAs for Airflow-writer + Metabase-reader). Exit: terraform plan succeeds locally.
  • T-05 — Add modules/bq-dataset/ if not already present. Exit: module created, used by silver-02-bigquery.
  • T-06 — Open infra PR for silver-01-base + silver-02-bigquery. Exit: PR green, CI plan output posted as comment.
  • T-07 — terraform apply silver-dev from CI (manual approval). Exit: project + datasets exist; Airflow SA writes a test row via Workload Identity.

# dbt project bootstrap

  • T-10 — Create app/datalake/transformations/ skeleton on migration branch: dbt_project.yml, profiles.yml, models/silver/, models/meta/, sources/cdc.yml. Exit: dbt parse passes locally.
  • T-11 — Wire CI: dbt parse on PR, dbt build against silver-dev, schema diff comment, SQLFluff + dbt-checkpoint pre-commit. Exit: a sample PR runs all checks green.
  • T-12 — Build silver.contracts model + dbt tests (not_null, unique on PK). Bad business rules acceptable; doc TODO inline. Exit: model materializes in silver-dev via local dbt run.
  • T-13 — Build silver.installments model + dbt tests. Exit: model materializes in silver-dev.
  • T-14 — Wire Cosmos in ume-data-dags (or successor) to point at app/datalake/transformations. Exit: an Airflow DAG run produces both silver models in silver-dev.
  • T-15 — Manifest + run-results upload to GCS as a post-run Cosmos hook. Exit: artifacts present in GCS bucket after each run.

# Governance + dev-X

  • T-20 — Add .github/CODEOWNERS covering each silver folder + a data-platform reviewer at root. Exit: a push without approval is blocked on migration.
  • T-21 — Add .claude/skills/add-silver-model skill that scaffolds a new model + sources + tests. Exit: invoking the skill produces a PR-ready file set.
  • T-22 — Wire AI PR-review bot: column-overlap similarity check against existing silver models, schema lint, test-coverage threshold, manifest diff summary. Exit: a deliberately-similar test PR triggers the bot's warning.

# BQ lockdown (WS-2)

  • T-30 — Build the shadow-table inventory query against INFORMATION_SCHEMA.JOBS_BY_PROJECT (90-day window). Publish results as a register doc. Exit: ≥90% of writes attributed to a known user/team.
  • T-31 — Provision per-team sandbox_<team> datasets via Terraform with 30-day partition expiration. Exit: a test row in a sandbox vanishes after expiration.
  • T-32 — Audit-log alert in Cloud Monitoring on CREATE TABLE outside silver + sandboxes; route to Slack. Exit: a forced test fires the alert.
  • T-33 — Stage-1 revoke of dataEditor on one pilot legacy dataset. Exit: revoke applied; 7-day query log shows no consumer breakage.

# Telemetry datamart + Metabase

  • T-40 — Build meta__bq_usage (dbt model) + at least one dbt test. Exit: model builds daily, populated for the prior day.
  • T-41 — Build meta__table_lifecycle. Exit: builds daily; rows for every BQ table in scope.
  • T-42 — Build meta__ingestion_runs (schema only initially; populated as ingestion blueprints emit). Exit: schema deployed, sample row from a smoke test present.
  • T-43 — Build meta__dbt_runs from manifest + run-results uploaded by T-15. Exit: rows for every Cosmos run.
  • T-44 — Scaffold environments/dev-04-metabase/ Terraform: Metabase Helm release, Cloud SQL HA backend, IAP fronting, dependabot. Exit: metabase.<domain> reachable via IAP for seed allowlist.
  • T-45 — First Metabase question + dashboard reading meta__bq_usage: "Cost yesterday". Exit: dashboard loads in <5s.
  • T-46 — Wire 3 Metabase alerts to Slack: freshness lag, cost spike, ingestion failure. Exit: a forced threshold breach posts to Slack in <10 min.

# Ingestion blueprints

  • T-50 — Create ume-data-engineering/ingestion/ with three blueprints: jsonl, csv, ftp. README + smoke test each. Exit: three blueprints committed and CI-green.
  • T-51 — Onboard one real non-CDC source via a blueprint. Exit: source lands in BigQuery and emits to meta__ingestion_runs.

# DataHub scanning (WS-9)

  • T-60 — DataHub Phase 2 deployed per current roadmap. Exit: UI reachable via IAP for seed allowlist.
  • T-61 — DataHub ingestion recipes: BigQuery (silver + raw CDC), Airflow DAGs, dbt manifests. Scheduled in Airflow. Exit: all three recipes run nightly.
  • T-62 — Every shipped silver model appears in DataHub with description, tags, column-level lineage to raw CDC. Exit: searching a raw CDC table shows downstream silver models.

# Legacy consumer migration (WS-8)

  • T-70 — Catalogue all tables/views sourcing from raw CDC. Tag each with a target silver model or "to sunset". Exit: register published.
  • T-71 — Repoint at least 5 consumers at silver, verify with row-count tolerance vs raw-CDC version. Exit: 5 consumers green; sunset dates communicated for the rest.

# Lifecycle (WS-10, design only in locomotive)

  • T-80 — Author docs/infrastructure/lifecycle.md with the rule set (e.g. not queried in 90 days + not referenced in dbt manifest → deprecation candidate). Exit: PR merged.
  • T-81 — Build a weekly Metabase report listing deprecation candidates from meta__table_lifecycle. Exit: report runs weekly; first review meeting held.

# Open items still to resolve

  • Confirm Metabase domain and IAP allowlist seed.
  • Identify 2–3 pilot domains willing to migrate first consumers off legacy CDC.