# CDC fan-out architecture — Juice DB
## Context and problem statement
The production database is a Cloud SQL PostgreSQL 14 instance (700GB disk, 10 CPU, 22GB RAM). Today, Datastream replicates tables directly into BigQuery every 15 minutes. The BigQuery tables are unoptimized (no partitioning, no clustering, no column selection), and many reports source them directly with no bronze/silver/gold layering.
Several use cases require transactional (OLTP) data in PostgreSQL or AlloyDB rather than BigQuery, leading to inefficient patterns where data flows to BigQuery and then back to a relational database. The production database cannot be directly queried by downstream consumers or modified beyond minimal configuration.
## Architecture overview
The solution introduces Juice DB — a standalone AlloyDB instance that acts as a centralized, read-only fan-out hub. It receives data from production via PostgreSQL native logical replication and distributes it downstream through multiple mechanisms depending on the target type.
```
Production DB (Cloud SQL PG14)
              │
              │  PG logical replication (WAL)
              ▼
    Juice DB (AlloyDB PG15+)
        ┌─────┴──────────┐
        │                │
        ▼                ▼
  [Analytical]     [Transactional]
  Datastream       PG pub/sub with
  + dbt jobs       col + row filters
        │                │
        ▼                ▼
  BigQuery         Cloud SQL
  bronze +         instances
  silver/gold      (per-consumer)
```
## Key architectural decisions
### 1. Juice DB is a standalone AlloyDB instance, not a read replica
A Cloud SQL read replica would be simpler but cannot act as a logical replication publisher (read replicas are read-only at the WAL level). Juice DB must generate its own logical decoding output to fan out downstream. AlloyDB was chosen over Cloud SQL for Juice DB because its read pool architecture provides workload isolation — the primary node handles replication while read pool nodes absorb dbt queries and Datastream reads without affecting replication performance.
Implication: Juice DB is fed by logical replication from production, not physical streaming replication. This allows Juice DB to run a different PG major version (PG15+) than production (PG14).
### 2. Production → Juice DB: PG native logical replication (cross-version)
PostgreSQL logical replication supports cross-major-version replication. A PG14 publisher can feed a PG15 subscriber natively.
On production (one-time setup):

- `cloudsql.logical_decoding` set to `on` (likely already enabled for the existing Datastream)
- A dedicated replication user with the `REPLICATION` attribute
- A publication:

```sql
CREATE PUBLICATION juice_pub FOR TABLE table_a, table_b, ...;
```
On Juice DB:

- Private IP connectivity to production (same project, same VPC)
- A subscription:

```sql
CREATE SUBSCRIPTION juice_sub
    CONNECTION 'host=<prod-private-ip> ...'
    PUBLICATION juice_pub;
```
On PG14, the publication is table-level only — no column or row filtering at this leg. That is intentional; Juice DB is the full-fidelity copy. Filtering happens one hop later.
Production footprint: One publication, one replication user, one replication slot. The publication was created independently (not reusing Datastream's publication) to ensure independent lifecycle control.
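The one-time production-side objects above lend themselves to generation from config rather than hand-typed DDL. A minimal sketch, assuming a hypothetical role name `juice_repl` and a password placeholder to be substituted by the secret manager at apply time:

```python
# Sketch: render the one-time production-side setup SQL from a table list.
# The role name "juice_repl" and the '<password>' placeholder are
# illustrative assumptions, not part of the agreed design.

def render_production_setup(tables: list[str], role: str = "juice_repl") -> str:
    """Build the replication role + publication DDL for the production DB."""
    table_list = ", ".join(tables)
    return "\n".join([
        # Dedicated user so the Juice DB subscription has its own lifecycle.
        f"CREATE ROLE {role} WITH REPLICATION LOGIN PASSWORD '<password>';",
        # PG14 publication: table-level only, no column/row filters on this leg.
        f"CREATE PUBLICATION juice_pub FOR TABLE {table_list};",
    ])

print(render_production_setup(["table_a", "table_b"]))
```

Keeping the publication independent of Datastream's existing one (as decided above) means this script can be re-run or torn down without touching the legacy BigQuery path.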
### 3. Juice DB → downstream PG targets: PG15 logical replication with filtered publications
PG15 introduces native column-list and row-filter support on publications. This enables the core fan-out pattern:
```sql
-- Example: Customer A gets a filtered subset
CREATE PUBLICATION customer_a_pub
    FOR TABLE orders (id, order_date, amount, status)
    WHERE (tenant_code = 1);

-- Customer A's Cloud SQL instance subscribes
CREATE SUBSCRIPTION customer_a_sub
    CONNECTION 'host=<juice-db-ip> ...'
    PUBLICATION customer_a_pub;
```
Each downstream consumer gets exactly the tables, columns, and rows they need. This allows smaller instance sizes, tighter permission sets, and tenant-level data isolation.
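The filtering semantics can be illustrated in miniature: a column list projects each change row, and the row filter drops rows that don't match. A sketch with invented sample rows (the `internal_note` column is hypothetical, standing in for anything the consumer must not see):

```python
# Sketch of what customer A's subscriber receives under the publication above.
# Sample rows and the "internal_note" column are invented for illustration.

def apply_publication_filter(rows, columns, predicate):
    """Project the column list and keep only rows matching the row filter."""
    return [{c: r[c] for c in columns} for r in rows if predicate(r)]

orders = [
    {"id": 1, "order_date": "2024-05-01", "amount": 10.0, "status": "paid",
     "tenant_code": 1, "internal_note": "x"},
    {"id": 2, "order_date": "2024-05-02", "amount": 20.0, "status": "open",
     "tenant_code": 2, "internal_note": "y"},
]

replicated = apply_publication_filter(
    orders,
    ["id", "order_date", "amount", "status"],
    lambda r: r["tenant_code"] == 1,
)
# Tenant 2's row never leaves Juice DB; tenant_code and internal_note
# are not part of the replicated column set.
```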
### 4. Juice DB → BigQuery: Datastream CDC (bronze) + dbt (silver/gold)
Two parallel analytical paths:
- Datastream from Juice DB lands raw CDC data into BigQuery as the bronze layer. Datastream supports table and column selection in its configuration. This replaces the current direct Datastream from production.
- dbt jobs connect to Juice DB's AlloyDB read pool (isolating analytical query load from the replication primary) and materialize transformed models into BigQuery as the silver/gold layers.
### 5. Provisioning: Terraform for infrastructure, idempotent SQL for PG objects
The config repository is the single source of truth, structured as folders per consumer:
```
replication-configs/
  customer-a/
    config.yaml        # tables, columns, row_filter: "tenant_code = 1"
  customer-b/
    config.yaml        # tables, columns, row_filter: "tenant_code = 2"
  analytics-bronze/
    config.yaml        # Datastream config for BQ bronze
  analytics-dbt/
    config.yaml        # dbt source definitions
```
Terraform provisions infrastructure: AlloyDB/Cloud SQL instances, Datastream streams, networking, IAM, monitoring dashboards.
Idempotent SQL scripts provision PG-internal objects: publications, subscriptions, schema setup. These are generated from the YAML configs and executed via Airflow or CI/CD.
```sql
-- Idempotent publication creation pattern
DO $$
BEGIN
    IF NOT EXISTS (SELECT 1 FROM pg_publication
                   WHERE pubname = 'customer_a_pub') THEN
        EXECUTE 'CREATE PUBLICATION customer_a_pub
                     FOR TABLE orders (id, order_date, amount, status)
                     WHERE (tenant_code = 1)';
    END IF;
END $$;
```
Rationale for the split: Terraform's postgresql provider can manage publications and subscriptions but is brittle for changes — publication modifications often require drop/recreate, which breaks active subscriptions. Idempotent SQL gives more surgical control over migration paths.
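The generation step from `config.yaml` to the idempotent DO-block pattern can be sketched as a small templating function. This assumes the config carries `table`, `columns`, and `row_filter` keys (the exact schema is still ours to fix); the parsed YAML is shown here as a plain dict:

```python
# Sketch: generate the idempotent publication DDL from a parsed consumer
# config. The config keys (table, columns, row_filter) are an assumed schema.

TEMPLATE = """\
DO $$
BEGIN
    IF NOT EXISTS (SELECT 1 FROM pg_publication WHERE pubname = '{pub}') THEN
        EXECUTE 'CREATE PUBLICATION {pub}
                     FOR TABLE {table} ({cols})
                     WHERE ({row_filter})';
    END IF;
END $$;"""

def render_publication_sql(cfg: dict, pub: str) -> str:
    """Fill the idempotent DO-block template from one consumer's config."""
    return TEMPLATE.format(
        pub=pub,
        table=cfg["table"],
        cols=", ".join(cfg["columns"]),
        row_filter=cfg["row_filter"],
    )

sql = render_publication_sql(
    {"table": "orders",
     "columns": ["id", "order_date", "amount", "status"],
     "row_filter": "tenant_code = 1"},
    "customer_a_pub",
)
```

Because the script only creates what is missing, Airflow or CI/CD can re-run it on every config change without drop/recreate churn.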
### 6. Schema change propagation: polling-based drift detection
PostgreSQL logical replication does not propagate DDL. When production runs ALTER TABLE ADD COLUMN, subscribers never learn about the new column, and once published rows start carrying it, the apply worker can fail because the subscriber table is missing that column. ALTER TABLE DROP COLUMN or an incompatible type change can likewise break replication entirely; while replication is broken, WAL accumulates on the production disk.
Solution: An Airflow DAG polls `information_schema.columns` on production and Juice DB every 1-2 minutes. On drift detection, it:

- Generates `ALTER TABLE` statements for Juice DB
- Updates `ALTER PUBLICATION` statements for affected downstream publications
- Generates `ALTER TABLE` statements for each downstream subscriber
- Applies them in order: Juice DB first, then downstream
- Notifies via Slack
Why polling over event triggers: An event_trigger on production would provide real-time detection but introduces ongoing maintenance overhead on the one system we committed to minimizing changes on. Polling is operationally simpler, and the 1-2 minute window is acceptable because there is always a gap between ALTER TABLE and the first meaningful data insertion into new columns.
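The core diff step of that DAG is a straightforward comparison of two column snapshots. A minimal sketch, assuming each snapshot is a `table -> {column: data_type}` mapping built from `information_schema.columns` (the `orders`/`channel` example is invented):

```python
# Sketch: compute ALTER TABLE statements that bring Juice DB in line with
# production. Snapshots map table -> {column: data_type}; the sample data
# below is illustrative only.

def diff_columns(prod: dict, juice: dict) -> list[str]:
    """Return DDL to add columns missing on Juice DB and drop removed ones."""
    stmts = []
    for table, prod_cols in prod.items():
        juice_cols = juice.get(table, {})
        for col, dtype in prod_cols.items():
            if col not in juice_cols:
                stmts.append(f"ALTER TABLE {table} ADD COLUMN {col} {dtype};")
        for col in juice_cols:
            if col not in prod_cols:
                stmts.append(f"ALTER TABLE {table} DROP COLUMN {col};")
    return stmts

stmts = diff_columns(
    {"orders": {"id": "bigint", "amount": "numeric", "channel": "text"}},
    {"orders": {"id": "bigint", "amount": "numeric"}},
)
# One ADD COLUMN for the new "channel" column on production.
```

Type changes and per-publication column-list updates would layer on top of this, but the ordering guarantee (Juice DB first, then downstream) falls out of applying these statement lists hop by hop.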
## Monitoring requirements (to be detailed)
Key lag points to observe:

- Production → Juice DB: logical replication slot lag on production (`confirmed_flush_lsn` vs. `pg_current_wal_lsn()`), which also guards against WAL accumulation
- Juice DB → each downstream Cloud SQL subscriber: per-subscription apply lag
- Juice DB → BigQuery bronze: Datastream stream freshness
- dbt job runtimes and silver/gold staleness

Alerting thresholds and on-call routing TBD.
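For the PG-to-PG legs, lag in bytes falls out of LSN arithmetic. A sketch, assuming the two LSNs come from `pg_current_wal_lsn()` on the publisher and `confirmed_flush_lsn` in `pg_replication_slots` (the sample LSN values are made up):

```python
# Sketch: replication lag in bytes from two PostgreSQL LSNs.
# LSNs are printed as 'hi/lo' where hi and lo are hex; the byte offset is
# (hi << 32) + lo. Sample values are illustrative.

def lsn_to_int(lsn: str) -> int:
    """Parse an LSN like '16/B374D848' into an absolute byte offset."""
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) + int(lo, 16)

def lag_bytes(current_lsn: str, confirmed_lsn: str) -> int:
    """Bytes of WAL the subscriber has not yet confirmed."""
    return lsn_to_int(current_lsn) - lsn_to_int(confirmed_lsn)

lag = lag_bytes("16/B374D848", "16/B3748000")
```

An exporter polling this per slot (on production for `juice_sub`, on Juice DB for each consumer subscription) would feed whichever monitoring stack is chosen below.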
## Open questions
- Monitoring and observability stack selection (Cloud Monitoring, Datadog, Grafana)
- On-call routing: dedicated data platform team vs broader SRE
- dbt model weight: incremental vs full-refresh, scheduling strategy
- Existing Datastream from prod → BQ: migration/cutover plan to the new path through Juice DB
- Catalog integration with DataHub: lineage tracking across the fan-out