# Tooling

The tools listed here are expectations, not commitments. Each choice gets validated in a real vertical before we consider it proven. The goal is to implement what's necessary to deliver value - not to build a complete governance framework upfront.

Some capabilities described in the Architecture and Tools section will wait. We prioritize what makes users perceive immediate improvement: correct data, clear lineage, discoverable assets. Deeper controls come later, once the foundation proves itself.


# MVP Tooling

These are the tools we expect to deploy for the first two verticals. They're chosen because they integrate with the target architecture, have manageable adoption curves, or address the most pressing pain points.

# Storage & Compute

| Tool | Status | MVP Role |
| --- | --- | --- |
| Google Cloud Storage | Already in use | Landing area for raw data, file-based integrations |
| BigQuery | Already in use | Primary data warehouse, query engine, source for dashboards |

The tools remain the same - the transformation is in how we use them. See Object Storage and Lake Engine for detailed practices. The MVPs introduce:

  - Naming and organization conventions - Consistent dataset structure, tiered areas (landing, staging, governed)
  - Partitioning and clustering standards - Applied to tables where query patterns justify it
  - Lifecycle policies - Clear rules for data retention and archival
  - Access boundaries - Datasets organized to enable meaningful permission scopes

These practices address the root causes of runaway costs and ungoverned access without changing the underlying platform.
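As an illustration, the partitioning, clustering, and lifecycle standards can all be expressed in a table's DDL. A hedged sketch - dataset, table, and column names here are hypothetical, and the expiration value is illustrative, not a decided policy:

```sql
-- Hypothetical governed-tier table: partitioned by event date,
-- clustered by the columns dashboards filter on most often.
CREATE TABLE IF NOT EXISTS governed_finops.cost_events
(
  event_date DATE NOT NULL,
  account_id STRING,
  service    STRING,
  cost_usd   NUMERIC
)
PARTITION BY event_date
CLUSTER BY account_id, service
OPTIONS (
  -- Lifecycle rule: drop partitions after roughly two years
  partition_expiration_days = 730
);
```

Because partitioning and clustering live in the DDL, the standards are enforced at creation time rather than by convention alone.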

# Transformations

| Tool | Status | Expected MVP Role |
| --- | --- | --- |
| dbt (Core or Cloud) | To validate | SQL-based transformations with built-in lineage and testing. Replaces ad-hoc scripts and complex Spark jobs for the reconciliation pipeline. |

dbt brings structure to transformations: version-controlled SQL, automated documentation, dependency-aware builds, and data tests. For FinOps, it can replace the opaque Spark pipeline with something maintainable. For Atendimento, it formalizes the views Léo has been building. See ETL for the full transformation strategy.
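A minimal sketch of what dbt's documentation and data tests look like in practice - the model and column names below are hypothetical, not the actual reconciliation pipeline:

```yaml
# models/governed/schema.yml (hypothetical model)
version: 2
models:
  - name: fct_reconciliation
    description: "Daily reconciliation results, one row per account per day."
    columns:
      - name: reconciliation_id
        description: "Surrogate key for the reconciliation row."
        tests:
          - not_null
          - unique
      - name: reconciled_amount
        description: "Amount matched against the source system."
        tests:
          - not_null
```

Each declared test runs on every build, so a failing `not_null` or `unique` check stops a bad model from reaching consumers - the kind of guarantee the ad-hoc scripts never provided.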

What we're validating: Whether the team can adopt dbt's workflow. Whether it integrates cleanly with existing BigQuery patterns.

# Orchestration

| Tool | Status | Expected MVP Role |
| --- | --- | --- |
| Cloud Composer (Airflow) | To validate | Scheduling dbt runs, data ingestion jobs, reconciliation workflows |

Cloud Composer is already available in UME's GCP environment. It provides dependency-aware scheduling and visibility into pipeline runs.

Simple cron-based scheduling is not under consideration - it doesn't provide lineage visibility or dependency management. The orchestrator must understand what runs before what and surface that information to the catalog.
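The difference from cron can be made concrete with a toy example: given declared dependencies, a dependency-aware orchestrator derives the run order itself. The pipeline names below are made up; this is an illustration of the concept, not Composer's API:

```python
from graphlib import TopologicalSorter  # stdlib since Python 3.9

# Hypothetical pipeline dependencies: each task maps to the set of
# tasks that must complete before it may run.
deps = {
    "staging.costs": {"landing.costs_raw"},
    "governed.reconciliation": {"staging.costs", "staging.invoices"},
    "dashboard.refresh": {"governed.reconciliation"},
}

# A dependency-aware scheduler computes a valid execution order;
# cron would force us to hard-code timings and hope they hold.
order = list(TopologicalSorter(deps).static_order())
print(order)
```

The same dependency graph is what gets surfaced to the catalog as lineage, which is why cron - which knows nothing about dependencies - is ruled out.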

What we're validating: Operational fit - whether Cloud Composer integrates well with dbt and the data catalog for end-to-end lineage.

# Data Catalog

| Tool | Status | Expected MVP Role |
| --- | --- | --- |
| DataHub or OpenMetadata | To validate | Central registry of data assets, lineage visualization, ownership tracking |

The catalog is where governance becomes visible - and where culture change takes root. Users discover what data exists, who owns it, whether it's certified, and how it flows from source to consumption.

For the MVPs, we need:

  - Asset registration - Tables, views, dashboards documented in one place
  - Lineage - "Where does this number come from?" answered visually
  - Ownership - Clear accountability for data quality
  - Basic quality indicators - Freshness, row counts, test pass/fail
  - Single source of truth for metadata - The catalog is authoritative; if it's not registered there, it doesn't officially exist
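Mechanically, the lineage requirement is an upstream walk over the metadata graph the catalog maintains. A toy sketch with made-up asset names - not DataHub's or OpenMetadata's actual API:

```python
from collections import deque

# Hypothetical lineage edges: each asset maps to its direct upstream sources.
upstream = {
    "dashboard.monthly_costs": ["governed.fct_costs"],
    "governed.fct_costs": ["staging.costs", "staging.exchange_rates"],
    "staging.costs": ["landing.costs_raw"],
    "staging.exchange_rates": [],
    "landing.costs_raw": [],
}

def trace(asset: str) -> set[str]:
    """Return every upstream asset that feeds the given one."""
    seen, queue = set(), deque([asset])
    while queue:
        for parent in upstream.get(queue.popleft(), []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen

print(sorted(trace("dashboard.monthly_costs")))
```

Answering "where does this number come from?" is exactly this traversal, rendered visually - which is why the catalog only works if every pipeline registers its edges.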

What we're validating: Whether the chosen catalog integrates well with BigQuery and dbt. Whether users actually consult it. Whether it can serve as the authoritative metadata source that drives adoption of governed practices. See Data Catalog for the full vision.

# Reporting

| Tool | Status | MVP Role |
| --- | --- | --- |
| Metabase Cloud | To evaluate | Managed Metabase with proper security controls; replaces the self-deployed instance |
| Looker Studio | Already in use | Continues for management dashboards |
| Hex | To evaluate | Alternative if notebook-style analytics and dashboards can converge |

The current self-deployed Metabase instance does not meet security and governance criteria. It lacks granular permissions, has no SLA, and creates operational burden. It is not part of the forward plan.

For the MVPs, we evaluate managed alternatives that provide:

  - Granular access controls (dataset, row, or column level where needed)
  - Integration with identity management
  - Reduced operational overhead

The value comes from the governed data underneath. Dashboard migration happens gradually, with priority dashboards moving to governed sources first.

What we're validating: Whether Metabase Cloud (Pro tier) meets security requirements. Whether Hex provides advantages that justify the change in tooling. See Reporting for the full reporting strategy.


# Additional Capabilities Under Consideration

Beyond the core MVP tooling, several capabilities may be tested depending on scope and priorities:

| Capability | Status | Notes |
| --- | --- | --- |
| Query acceleration (BI Engine) | Will test | At least one scenario to validate cost/performance tradeoff for high-frequency dashboards |
| Transactional database (AlloyDB) | May test | Not decided; depends on whether a clear use case emerges during MVPs |
| PII scanning | Under consideration | Important for compliance maturity; timing depends on MVP priorities |
| Fine-grained BigQuery policies | Possible | Row/column level security may or may not be required; depends on access patterns discovered during implementation |
| Schema enforcement | Will test | At least one case to validate the pattern for data contracts |
| Alerting and anomaly detection | Likely | Particularly for cost monitoring - we want professional visibility into resource consumption and spend |
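Schema enforcement for data contracts can start as simply as validating incoming records against a declared schema before they land. A minimal sketch - field names and rules are illustrative, not an agreed contract:

```python
# Hypothetical data contract: expected fields and their Python types.
CONTRACT = {
    "account_id": str,
    "event_date": str,   # ISO date, kept as a string for simplicity
    "cost_usd": float,
}

def violations(record: dict) -> list[str]:
    """Return the list of contract violations for one record."""
    problems = []
    for field, expected_type in CONTRACT.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        elif not isinstance(record[field], expected_type):
            problems.append(f"wrong type for {field}: {type(record[field]).__name__}")
    return problems

good = {"account_id": "A1", "event_date": "2024-01-31", "cost_usd": 12.5}
bad = {"account_id": "A1", "cost_usd": "12.5"}
print(violations(good))
print(violations(bad))
```

Rejecting or quarantining records that fail the contract at the landing boundary is the pattern the "at least one case" above would validate.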

These aren't deferred indefinitely - they're evaluated as the MVPs progress and real needs surface.


# Tool Selection Criteria

When evaluating tools (now or later), we weigh:

  1. Fit with target architecture - Tools must integrate with the architecture vision, especially with the data catalog as the central hub for governance and culture adoption. Isolated tools that don't contribute metadata or lineage are less valuable.

  2. Integration with existing stack - GCP-native or proven GCP compatibility reduces friction.

  3. Change management compatibility - Can assets be defined as code (YAML, SQL, config files)? Can changes be tracked in version control? Does the tool support a software development lifecycle - review, test, deploy, rollback? Tools that only operate through UIs limit auditability and repeatability.

  4. Team adoption curve - Tools the team can operate without deep specialization.

  5. Visibility of value - Preference for tools where users see the benefit directly (catalog, lineage) over backend-only improvements.

  6. Managed over self-hosted - Where cost allows, reduce operational burden.

  7. Exit path - Avoid lock-in that would make future changes painful.
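Criterion 3 - assets defined as code - in practice means asset definitions live in version-controlled configuration files that can be reviewed, diffed, and rolled back. A hypothetical fragment, deliberately not tied to any specific tool's schema:

```yaml
# Hypothetical asset-as-code definition (illustrative only):
# reviewable in a pull request, tracked in version control.
asset: governed.fct_costs
owner: finops-team
certified: true
checks:
  - freshness_max: "24h"
  - row_count_min: 1000
```

Whether a candidate tool can consume or emit something like this is a quick litmus test for the software-development-lifecycle requirement.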


# Expected Stack Summary

The diagram below represents the MVP scope - what we expect to validate in the first two verticals. For the full architecture vision, see Architecture and Tools.

┌─────────────────────────────────────────────────────┐
│                    Consumption                       │
│        Metabase Cloud / Looker Studio / Hex         │
└─────────────────────────────────────────────────────┘
                          ▲
┌─────────────────────────────────────────────────────┐
│                   Data Catalog                       │
│              DataHub / OpenMetadata                  │
│  (discovery, lineage, ownership, metadata authority)│
└─────────────────────────────────────────────────────┘
                          ▲
┌─────────────────────────────────────────────────────┐
│                 Transformations                      │
│                    dbt + BigQuery                    │
│          (governed views, tested, documented)       │
└─────────────────────────────────────────────────────┘
                          ▲
┌─────────────────────────────────────────────────────┐
│                  Orchestration                       │
│                  Cloud Composer                      │
│            (dependency-aware, lineage-linked)       │
└─────────────────────────────────────────────────────┘
                          ▲
┌─────────────────────────────────────────────────────┐
│                    Storage                           │
│              GCS (raw) + BigQuery (DW)              │
│      (with conventions, policies, boundaries)       │
└─────────────────────────────────────────────────────┘
                          ▲
┌─────────────────────────────────────────────────────┐
│                  Data Sources                        │
│     Total IP, Infobip, GA, IUGO, Milênio, etc.     │
└─────────────────────────────────────────────────────┘

This isn't the final architecture - it's what we expect to validate first. Adjustments will come from implementation experience, not theoretical planning.