# Tooling

The tools listed here are expectations, not commitments. Each choice gets validated in a real vertical before we consider it proven. The goal is to implement what's necessary to deliver value - not to build a complete governance framework upfront.

Some capabilities described in the Architecture and Tools section will wait. We prioritize what makes users perceive immediate improvement: correct data, clear lineage, discoverable assets. Deeper controls come later, once the foundation proves itself.


# MVP Tooling

These are the tools we expect to deploy for the first two verticals. They're chosen because they integrate with the target architecture, have manageable adoption curves, or address the most pressing pain points.

# Storage & Compute

| Tool | Status | MVP Role |
| --- | --- | --- |
| Google Cloud Storage | Already in use | Landing area for raw data, file-based integrations |
| BigQuery | Already in use | Primary data warehouse, query engine, source for dashboards |

The tools remain the same - the transformation is in how we use them. See Object Storage and Lake Engine for detailed practices. The MVPs introduce:

  - Naming and organization conventions - Consistent dataset structure, tiered areas (landing, staging, governed)
  - Partitioning and clustering standards - Applied to tables where query patterns justify it
  - Lifecycle policies - Clear rules for data retention and archival
  - Access boundaries - Datasets organized to enable meaningful permission scopes

These practices address the root causes of runaway costs and ungoverned access without changing the underlying platform.
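As an illustration, the partitioning, clustering, and lifecycle standards can all be expressed in a table's DDL. A hedged sketch - dataset, table, and column names here are hypothetical, and the expiration value is illustrative, not a decided policy:

```sql
-- Hypothetical governed-tier table: partitioned by event date,
-- clustered by the columns dashboards filter on most often.
CREATE TABLE IF NOT EXISTS governed_finops.cost_events
(
  event_date DATE NOT NULL,
  account_id STRING,
  service    STRING,
  cost_usd   NUMERIC
)
PARTITION BY event_date
CLUSTER BY account_id, service
OPTIONS (
  -- Lifecycle rule: drop partitions after roughly two years
  partition_expiration_days = 730
);
```

Because partitioning and clustering live in the DDL, the standards are enforced at creation time rather than by convention alone.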

# Transformations

| Tool | Status | Expected MVP Role |
| --- | --- | --- |
| dbt (Core or Cloud) | To validate | SQL-based transformations with built-in lineage and testing. Replaces ad-hoc scripts and complex Spark jobs for the reconciliation pipeline. |

dbt brings structure to transformations: version-controlled SQL, automated documentation, dependency-aware builds, and data tests. For FinOps, it can replace the opaque Spark pipeline with something maintainable. For Atendimento, it formalizes the views Léo has been building. See ETL for the full transformation strategy.
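A minimal sketch of what dbt's documentation and data tests look like in practice - the model and column names below are hypothetical, not the actual reconciliation pipeline:

```yaml
# models/governed/schema.yml (hypothetical model)
version: 2
models:
  - name: fct_reconciliation
    description: "Daily reconciliation results, one row per account per day."
    columns:
      - name: reconciliation_id
        description: "Surrogate key for the reconciliation row."
        tests:
          - not_null
          - unique
      - name: reconciled_amount
        description: "Amount matched against the source system."
        tests:
          - not_null
```

Each declared test runs on every build, so a failing `not_null` or `unique` check stops a bad model from reaching consumers - the kind of guarantee the ad-hoc scripts never provided.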

What we're validating: Whether the team can adopt dbt's workflow. Whether it integrates cleanly with existing BigQuery patterns.

# Orchestration

| Tool | Status | Expected MVP Role |
| --- | --- | --- |
| Cloud Composer (Airflow) | To validate | Scheduling dbt runs, data ingestion jobs, reconciliation workflows |

Cloud Composer is already available in UME's GCP environment. It provides dependency-aware scheduling and visibility into pipeline runs.

Simple cron-based scheduling is not under consideration - it doesn't provide lineage visibility or dependency management. The orchestrator must understand what runs before what and surface that information to the catalog.
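The difference from cron can be made concrete with a toy example: given declared dependencies, a dependency-aware orchestrator derives the run order itself. The pipeline names below are made up; this is an illustration of the concept, not Composer's API:

```python
from graphlib import TopologicalSorter  # stdlib since Python 3.9

# Hypothetical pipeline dependencies: each task maps to the set of
# tasks that must complete before it may run.
deps = {
    "staging.costs": {"landing.costs_raw"},
    "governed.reconciliation": {"staging.costs", "staging.invoices"},
    "dashboard.refresh": {"governed.reconciliation"},
}

# A dependency-aware scheduler computes a valid execution order;
# cron would force us to hard-code timings and hope they hold.
order = list(TopologicalSorter(deps).static_order())
print(order)
```

The same dependency graph is what gets surfaced to the catalog as lineage, which is why cron - which knows nothing about dependencies - is ruled out.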

What we're validating: Operational fit - whether Cloud Composer integrates well with dbt and the data catalog for end-to-end lineage.

# Data Catalog

| Tool | Status | Expected MVP Role |
| --- | --- | --- |
| DataHub or OpenMetadata | To validate | Central registry of data assets, lineage visualization, ownership tracking |

The catalog is where governance becomes visible - and where culture change takes root. Users discover what data exists, who owns it, whether it's certified, and how it flows from source to consumption.

For the MVPs, we need:

  - Asset registration - Tables, views, dashboards documented in one place
  - Lineage - "Where does this number come from?" answered visually
  - Ownership - Clear accountability for data quality
  - Basic quality indicators - Freshness, row counts, test pass/fail
  - Single source of truth for metadata - The catalog is authoritative; if it's not registered there, it doesn't officially exist
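Mechanically, the lineage requirement is an upstream walk over the metadata graph the catalog maintains. A toy sketch with made-up asset names - not DataHub's or OpenMetadata's actual API:

```python
from collections import deque

# Hypothetical lineage edges: each asset maps to its direct upstream sources.
upstream = {
    "dashboard.monthly_costs": ["governed.fct_costs"],
    "governed.fct_costs": ["staging.costs", "staging.exchange_rates"],
    "staging.costs": ["landing.costs_raw"],
    "staging.exchange_rates": [],
    "landing.costs_raw": [],
}

def trace(asset: str) -> set[str]:
    """Return every upstream asset that feeds the given one."""
    seen, queue = set(), deque([asset])
    while queue:
        for parent in upstream.get(queue.popleft(), []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen

print(sorted(trace("dashboard.monthly_costs")))
```

Answering "where does this number come from?" is exactly this traversal, rendered visually - which is why the catalog only works if every pipeline registers its edges.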

What we're validating: Whether the chosen catalog integrates well with BigQuery and dbt. Whether users actually consult it. Whether it can serve as the authoritative metadata source that drives adoption of governed practices. See Data Catalog for the full vision.

# Reporting

| Tool | Status | MVP Role |
| --- | --- | --- |
| Metabase Cloud | To evaluate | Managed Metabase with proper security controls; replaces the self-deployed instance |
| Looker Studio | Already in use | Continues for management dashboards |
| Hex | To evaluate | Alternative if notebook-style analytics and dashboards can converge |

The current self-deployed Metabase instance does not meet security and governance criteria. It lacks granular permissions, has no SLA, and creates operational burden. It is not part of the forward plan.

For the MVPs, we evaluate managed alternatives that provide:

  - Granular access controls (dataset, row, or column level where needed)
  - Integration with identity management
  - Reduced operational overhead

The value comes from the governed data underneath. Dashboard migration happens gradually, with priority dashboards moving to governed sources first.

What we're validating: Whether Metabase Cloud (Pro tier) meets security requirements. Whether Hex provides advantages that justify the change in tooling. See Reporting for the full reporting strategy.


# Additional Capabilities Under Consideration

Beyond the core MVP tooling, several capabilities may be tested depending on scope and priorities:

| Capability | Status | Notes |
| --- | --- | --- |
| Query acceleration (BI Engine) | Will test | At least one scenario to validate cost/performance tradeoff for high-frequency dashboards |
| Transactional database (AlloyDB) | May test | Not decided; depends on whether a clear use case emerges during MVPs |
| PII scanning | Under consideration | Important for compliance maturity; timing depends on MVP priorities |
| Fine-grained BigQuery policies | Possible | Row/column level security may or may not be required; depends on access patterns discovered during implementation |
| Schema enforcement | Will test | At least one case to validate the pattern for data contracts |
| Alerting and anomaly detection | Likely | Particularly for cost monitoring - we want professional visibility into resource consumption and spend |
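Schema enforcement for data contracts can start as simply as validating incoming records against a declared schema before they land. A minimal sketch - field names and rules are illustrative, not an agreed contract:

```python
# Hypothetical data contract: expected fields and their Python types.
CONTRACT = {
    "account_id": str,
    "event_date": str,   # ISO date, kept as a string for simplicity
    "cost_usd": float,
}

def violations(record: dict) -> list[str]:
    """Return the list of contract violations for one record."""
    problems = []
    for field, expected_type in CONTRACT.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        elif not isinstance(record[field], expected_type):
            problems.append(f"wrong type for {field}: {type(record[field]).__name__}")
    return problems

good = {"account_id": "A1", "event_date": "2024-01-31", "cost_usd": 12.5}
bad = {"account_id": "A1", "cost_usd": "12.5"}
print(violations(good))
print(violations(bad))
```

Rejecting or quarantining records that fail the contract at the landing boundary is the pattern the "at least one case" above would validate.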

These aren't deferred indefinitely - they're evaluated as the MVPs progress and real needs surface.


# Tool Selection Criteria

When evaluating tools (now or later), we weigh:

  1. Fit with target architecture - Tools must integrate with the architecture vision, especially with the data catalog as the central hub for governance and culture adoption. Isolated tools that don't contribute metadata or lineage are less valuable.

  2. Integration with existing stack - GCP-native or proven GCP compatibility reduces friction.

  3. Change management compatibility - Can assets be defined as code (YAML, SQL, config files)? Can changes be tracked in version control? Does the tool support a software development lifecycle - review, test, deploy, rollback? Tools that only operate through UIs limit auditability and repeatability.

  4. Team adoption curve - Tools the team can operate without deep specialization.

  5. Visibility of value - Preference for tools where users see the benefit directly (catalog, lineage) over backend-only improvements.

  6. Managed over self-hosted - Where cost allows, reduce operational burden.

  7. Exit path - Avoid lock-in that would make future changes painful.
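Criterion 3 - assets defined as code - in practice means asset definitions live in version-controlled configuration files that can be reviewed, diffed, and rolled back. A hypothetical fragment, deliberately not tied to any specific tool's schema:

```yaml
# Hypothetical asset-as-code definition (illustrative only):
# reviewable in a pull request, tracked in version control.
asset: governed.fct_costs
owner: finops-team
certified: true
checks:
  - freshness_max: "24h"
  - row_count_min: 1000
```

Whether a candidate tool can consume or emit something like this is a quick litmus test for the software-development-lifecycle requirement.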


# Expected Stack Summary

The diagram below represents the MVP scope - what we expect to validate in the first two verticals. For the full architecture vision, see Architecture and Tools.

┌─────────────────────────────────────────────────────┐
│                    Consumption                       │
│        Metabase Cloud / Looker Studio / Hex         │
└─────────────────────────────────────────────────────┘
                          ▲
┌─────────────────────────────────────────────────────┐
│                   Data Catalog                       │
│              DataHub / OpenMetadata                  │
│  (discovery, lineage, ownership, metadata authority)│
└─────────────────────────────────────────────────────┘
                          ▲
┌─────────────────────────────────────────────────────┐
│                 Transformations                      │
│                    dbt + BigQuery                    │
│          (governed views, tested, documented)       │
└─────────────────────────────────────────────────────┘
                          ▲
┌─────────────────────────────────────────────────────┐
│                  Orchestration                       │
│                  Cloud Composer                      │
│            (dependency-aware, lineage-linked)       │
└─────────────────────────────────────────────────────┘
                          ▲
┌─────────────────────────────────────────────────────┐
│                    Storage                           │
│              GCS (raw) + BigQuery (DW)              │
│      (with conventions, policies, boundaries)       │
└─────────────────────────────────────────────────────┘
                          ▲
┌─────────────────────────────────────────────────────┐
│                  Data Sources                        │
│     Total IP, Infobip, GA, IUGO, Milênio, etc.     │
└─────────────────────────────────────────────────────┘

This isn't the final architecture - it's what we expect to validate first. Adjustments will come from implementation experience, not theoretical planning.