# Tooling
The tools listed here are expectations, not commitments. Each choice gets validated in a real vertical before we consider it proven. The goal is to implement what's necessary to deliver value - not to build a complete governance framework upfront.
Some capabilities described in the Architecture and Tools section will wait. We prioritize what makes users perceive immediate improvement: correct data, clear lineage, discoverable assets. Deeper controls come later, once the foundation proves itself.
## MVP Tooling
These are the tools we expect to deploy for the first two verticals. They're chosen because they integrate with the target architecture, have manageable adoption curves, or address the most pressing pain points.
### Storage & Compute
The tools remain the same - the transformation is in how we use them. See Object Storage and Lake Engine for detailed practices. The MVPs introduce:
- Naming and organization conventions - Consistent dataset structure, tiered areas (landing, staging, governed)
- Partitioning and clustering standards - Applied to tables where query patterns justify it
- Lifecycle policies - Clear rules for data retention and archival
- Access boundaries - Datasets organized to enable meaningful permission scopes
These practices address the root causes of runaway costs and ungoverned access without changing the underlying platform.
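As an illustration of the partitioning, clustering, and lifecycle bullets above, a minimal BigQuery DDL sketch; the dataset, table, and column names are hypothetical, not existing assets:

```sql
-- Hypothetical governed-tier table: partitioned by event date,
-- clustered by the most common filter column, with automatic
-- expiry of old partitions as a lifecycle policy.
CREATE TABLE governed_finops.daily_costs
(
  event_date DATE,
  project_id STRING,
  cost_usd   NUMERIC
)
PARTITION BY event_date
CLUSTER BY project_id
OPTIONS (
  partition_expiration_days = 730  -- lifecycle: retain two years
);
```

Because the definition is plain DDL, it can live in version control alongside the conventions it implements.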
### Transformations
dbt brings structure to transformations: version-controlled SQL, automated documentation, dependency-aware builds, and data tests. For FinOps, it can replace the opaque Spark pipeline with something maintainable. For Atendimento, it formalizes the views Léo has been building. See ETL for the full transformation strategy.
What we're validating: Whether the team can adopt dbt's workflow. Whether it integrates cleanly with existing BigQuery patterns.
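A sketch of what a governed dbt model looks like in practice; the model and source names are hypothetical:

```sql
-- models/governed/fct_atendimento_daily.sql (hypothetical file)
-- Version-controlled SQL: reviewed, tested, and documented via dbt.
{{ config(materialized='view') }}

SELECT
    ticket_id,
    DATE(created_at) AS ticket_date,
    channel
FROM {{ ref('stg_atendimento_tickets') }}
WHERE ticket_id IS NOT NULL
```

A companion `schema.yml` would declare column descriptions and data tests (for example `not_null` and `unique` on `ticket_id`), which is what turns an ad-hoc view into a maintainable, certified asset.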
### Orchestration
Cloud Composer is already available in UME's GCP environment. It provides dependency-aware scheduling and visibility into pipeline runs.
Simple cron-based scheduling is not under consideration - it doesn't provide lineage visibility or dependency management. The orchestrator must understand what runs before what and surface that information to the catalog.
What we're validating: Operational fit - whether Cloud Composer integrates well with dbt and the data catalog for end-to-end lineage.
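The dependency-management requirement can be made concrete with a small sketch (task names are hypothetical). An Airflow DAG in Cloud Composer encodes exactly this kind of graph; the orchestrator derives a valid run order from declared dependencies rather than from cron times:

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: keys are tasks, values are the tasks
# each one depends on. The orchestrator must know what runs
# before what - this is the information cron cannot express.
pipeline = {
    "load_raw_to_gcs": set(),
    "stage_in_bigquery": {"load_raw_to_gcs"},
    "dbt_run_governed_models": {"stage_in_bigquery"},
    "refresh_catalog_lineage": {"dbt_run_governed_models"},
}

# A valid execution order that respects every dependency.
run_order = list(TopologicalSorter(pipeline).static_order())
print(run_order)
```

The same graph is what gets surfaced to the catalog as lineage, which is why scheduling and lineage visibility are one concern, not two.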
### Data Catalog
The catalog is where governance becomes visible - and where culture change takes root. Users discover what data exists, who owns it, whether it's certified, and how it flows from source to consumption.
For the MVPs, we need:
- Asset registration - Tables, views, dashboards documented in one place
- Lineage - "Where does this number come from?" answered visually
- Ownership - Clear accountability for data quality
- Basic quality indicators - Freshness, row counts, test pass/fail
- Single source of truth for metadata - The catalog is authoritative; if it's not registered there, it doesn't officially exist
What we're validating: Whether the chosen catalog integrates well with BigQuery and dbt. Whether users actually consult it. Whether it can serve as the authoritative metadata source that drives adoption of governed practices. See Data Catalog for the full vision.
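The catalog's core questions - what exists, who owns it, where a number comes from - reduce to a small data model. A minimal sketch with hypothetical asset names (a real catalog such as DataHub or OpenMetadata stores the same relationships at scale):

```python
# Each asset records an owner and its upstream dependencies.
catalog = {
    "dash_finops_costs":  {"owner": "finops-team",   "upstream": ["fct_daily_costs"]},
    "fct_daily_costs":    {"owner": "data-platform", "upstream": ["stg_billing_export"]},
    "stg_billing_export": {"owner": "data-platform", "upstream": []},
}

def lineage(asset: str) -> list[str]:
    """Answer 'where does this number come from?' by walking upstream."""
    sources = []
    for parent in catalog[asset]["upstream"]:
        sources.append(parent)
        sources.extend(lineage(parent))
    return sources

# Full upstream chain of a dashboard, from view back to raw export.
print(lineage("dash_finops_costs"))
```

The "single source of truth" bullet is the rule that this registry, not tribal knowledge, is where such questions get answered.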
### Reporting
The current self-deployed Metabase instance does not meet security and governance criteria. It lacks granular permissions, has no SLA, and creates operational burden. It is not part of the forward plan.
For the MVPs, we evaluate managed alternatives that provide:
- Granular access controls (dataset, row, or column level where needed)
- Integration with identity management
- Reduced operational overhead
The value comes from the governed data underneath. Dashboard migration happens gradually, with priority dashboards moving to governed sources first.
What we're validating: Whether Metabase Cloud (Pro tier) meets security requirements. Whether Hex provides advantages that justify the change in tooling. See Reporting for the full reporting strategy.
## Additional Capabilities Under Consideration
Beyond the core MVP tooling, several capabilities may be tested depending on scope and priorities. These aren't deferred indefinitely - they're evaluated as the MVPs progress and real needs surface.
## Tool Selection Criteria
When evaluating tools (now or later), we weigh:
- Fit with target architecture - Tools must integrate with the architecture vision, especially with the data catalog as the central hub for governance and culture adoption. Isolated tools that don't contribute metadata or lineage are less valuable.
- Integration with existing stack - GCP-native or proven GCP compatibility reduces friction.
- Change management compatibility - Can assets be defined as code (YAML, SQL, config files)? Can changes be tracked in version control? Does the tool support a software development lifecycle - review, test, deploy, rollback? Tools that only operate through UIs limit auditability and repeatability.
- Team adoption curve - Tools the team can operate without deep specialization.
- Visibility of value - Preference for tools where users see the benefit directly (catalog, lineage) over backend-only improvements.
- Managed over self-hosted - Where cost allows, reduce operational burden.
- Exit path - Avoid lock-in that would make future changes painful.
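To make the change-management criterion concrete, a hypothetical asset-as-code definition; the schema and field names below are an illustration, not an existing tool's format:

```yaml
# Hypothetical asset definition: reviewable, diffable, versioned.
# A change to ownership or tests goes through the same review,
# test, deploy, rollback cycle as application code.
asset: fct_daily_costs        # hypothetical table name
owner: data-platform
tier: governed
tags: [finops, certified]
tests:
  - not_null: cost_usd
  - unique: [event_date, project_id]
```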
## Expected Stack Summary
The diagram below represents the MVP scope - what we expect to validate in the first two verticals. For the full architecture vision, see Architecture and Tools.
```
┌─────────────────────────────────────────────────────┐
│                     Consumption                     │
│        Metabase Cloud / Looker Studio / Hex         │
└─────────────────────────────────────────────────────┘
                           ▲
┌─────────────────────────────────────────────────────┐
│                    Data Catalog                     │
│               DataHub / OpenMetadata                │
│ (discovery, lineage, ownership, metadata authority) │
└─────────────────────────────────────────────────────┘
                           ▲
┌─────────────────────────────────────────────────────┐
│                   Transformations                   │
│                   dbt + BigQuery                    │
│        (governed views, tested, documented)         │
└─────────────────────────────────────────────────────┘
                           ▲
┌─────────────────────────────────────────────────────┐
│                    Orchestration                    │
│                   Cloud Composer                    │
│         (dependency-aware, lineage-linked)          │
└─────────────────────────────────────────────────────┘
                           ▲
┌─────────────────────────────────────────────────────┐
│                       Storage                       │
│              GCS (raw) + BigQuery (DW)              │
│       (with conventions, policies, boundaries)      │
└─────────────────────────────────────────────────────┘
                           ▲
┌─────────────────────────────────────────────────────┐
│                    Data Sources                     │
│     Total IP, Infobip, GA, IUGO, Milênio, etc.      │
└─────────────────────────────────────────────────────┘
```
This isn't the final architecture - it's what we expect to validate first. Adjustments will come from implementation experience, not theoretical planning.