# Architecture Overview

The UME data platform is designed to provide a unified, governed, and cost-effective environment for data ingestion, transformation, storage, and consumption. The architecture prioritizes data trust, governance, and operational efficiency.

Figure: UME Data Platform Architecture

# Design Principles

  1. Governance First: Every layer incorporates access controls, lineage tracking, and audit capabilities.
  2. Trust in Data: Clear separation between raw, validated, and curated data through the medallion architecture.
  3. Cost Optimization: Thoughtful capacity planning with on-demand and provisioned options.
  4. Self-Service with Guardrails: Enable teams to explore and use data while preventing ungoverned sprawl.

# Architecture Layers

The platform is organized into distinct layers, each with specific responsibilities:

# Data Sources and Transactional Systems

The foundation layer captures data from various source systems including transactional databases, point-of-sale systems, partner integrations, mobile applications, and operational logs. Data flows into the platform through CDC (Change Data Capture) and specialized loaders.
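To make the CDC flow concrete, here is a minimal sketch of how a stream of change events could be replayed onto a keyed snapshot of a source table. The event shape (`op`, `key`, `row`) and the function name `apply_change_events` are assumptions made for illustration; the platform's actual CDC tooling and event format are not described in this overview.

```python
from typing import Any


def apply_change_events(
    state: dict[str, dict[str, Any]],
    events: list[dict[str, Any]],
) -> dict[str, dict[str, Any]]:
    """Replay insert/update/delete events, in order, onto a keyed snapshot.

    The event format is hypothetical: {"op": ..., "key": ..., "row": {...}}.
    The input snapshot is left unmodified; a new snapshot is returned.
    """
    result = {key: dict(row) for key, row in state.items()}
    for event in events:
        op, key = event["op"], event["key"]
        if op in ("insert", "update"):
            # Upsert: merge the changed columns over any existing row.
            result[key] = {**result.get(key, {}), **event["row"]}
        elif op == "delete":
            # Deleting a key that is already absent is treated as a no-op.
            result.pop(key, None)
        else:
            raise ValueError(f"unknown operation: {op!r}")
    return result
```

Because events are applied strictly in order, the final snapshot depends on the sequence in which changes arrive, which is why CDC pipelines usually preserve source ordering per key.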

Learn more about Data Sources

# Object Storage (GCS)

Google Cloud Storage serves as the persistent storage layer, organized using the medallion architecture:

  • Bronze: Raw ingested data preserved as-is
  • Silver: Cleaned, validated, and conformed data
  • Gold: Business-ready, aggregated, and curated datasets

Storage security includes managed KMS keys, fine-grained access policies, and comprehensive audit trails.
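One way to picture the tier organization is as a naming convention for object paths. The sketch below is illustrative only: the bucket naming scheme (`ume-<tier>`), the `domain/dataset/dt=` layout, and the helper `object_path` are assumptions, not the platform's documented layout.

```python
from datetime import date

# The three medallion tiers named in this section.
TIERS = ("bronze", "silver", "gold")


def object_path(tier: str, domain: str, dataset: str, day: date, filename: str) -> str:
    """Build a GCS URI for one object in a given tier.

    Bucket names and the date-partition layout are hypothetical.
    """
    if tier not in TIERS:
        raise ValueError(f"unknown tier: {tier!r}")
    return f"gs://ume-{tier}/{domain}/{dataset}/dt={day.isoformat()}/{filename}"
```

Keeping each tier in its own bucket or prefix makes it straightforward to attach different access policies and encryption keys per tier, though a single bucket with tier prefixes would work equally well under this scheme.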

Learn more about Object Storage

# Lake Engine

The Lake Engine is the query and federation layer. It provides unified access to data across storage tiers and sources, enforces access policies at query time, and offers both on-demand and provisioned capacity options for cost optimization.
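Query-time policy enforcement can be sketched as a check that runs before any rows are returned. The `Policy` class, its fields, and `run_query` below are hypothetical names used for illustration; they do not describe the engine's real API, and a production engine would apply such rules inside the query planner rather than over in-memory rows.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class Policy:
    """Hypothetical access policy for one caller."""
    allowed_columns: frozenset
    tenant: Optional[str] = None  # None means no row-level restriction


def run_query(
    rows: list[dict[str, Any]],
    columns: list[str],
    policy: Policy,
) -> list[dict[str, Any]]:
    """Project `columns` from `rows`, enforcing the policy first."""
    # Column-level check: reject the whole query if any column is off-limits.
    denied = [c for c in columns if c not in policy.allowed_columns]
    if denied:
        raise PermissionError(f"columns not permitted: {denied}")
    # Row-level check: keep only rows that belong to the caller's tenant.
    visible = (
        r for r in rows
        if policy.tenant is None or r.get("tenant") == policy.tenant
    )
    return [{c: r[c] for c in columns} for r in visible]
```

Enforcing the policy at query time, rather than by copying filtered data per consumer, means a policy change takes effect on the next query without rewriting any stored data.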

Learn more about the Lake Engine

# ETL

The transformation layer handles data movement and processing with:

  • Orchestration and scheduling
  • Reusable blueprints for common patterns
  • Data testing and validation
  • Alerting and operational playbooks
  • Multi-tenant controls
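As an illustration of the data testing and validation item above, the following sketch shows a simple gate that a pipeline might run before promoting a batch from one tier to the next. The function `validate_rows` and its two rules (required columns, unique key) are assumptions chosen for brevity, not the platform's actual test framework.

```python
from typing import Any


def validate_rows(
    rows: list[dict[str, Any]],
    required: list[str],
    unique_key: str,
) -> list[str]:
    """Return human-readable failures; an empty list means the batch passed."""
    failures: list[str] = []
    seen: set = set()
    for i, row in enumerate(rows):
        # Rule 1: every required column must be present and non-null.
        for col in required:
            if row.get(col) is None:
                failures.append(f"row {i}: missing {col}")
        # Rule 2: the key column must not repeat within the batch.
        key = row.get(unique_key)
        if key is not None:
            if key in seen:
                failures.append(f"row {i}: duplicate {unique_key}={key!r}")
            seen.add(key)
    return failures
```

Returning the full list of failures, instead of raising on the first one, gives the alerting step enough detail to report every problem in a batch at once.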

Learn more about ETL

# Data Catalog

The central governance hub providing:

  • Data discovery and documentation
  • Lineage tracking across all tiers
  • Data stewardship and ownership
  • Compliance monitoring and PII identification
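Lineage tracking across tiers can be modeled as a directed graph from each dataset to its direct inputs. The sketch below assumes a plain dictionary representation and a hypothetical helper `upstream`; the dataset names in the usage example are invented, and a real catalog would expose lineage through its own API.

```python
from collections import deque


def upstream(lineage: dict[str, list[str]], dataset: str) -> set[str]:
    """Return every dataset that `dataset` depends on, directly or transitively.

    `lineage` maps a dataset name to the list of its direct inputs.
    """
    found: set[str] = set()
    queue = deque(lineage.get(dataset, []))
    while queue:
        current = queue.popleft()
        if current in found:
            continue  # Already visited; also guards against cycles.
        found.add(current)
        queue.extend(lineage.get(current, []))
    return found


# Usage with invented dataset names spanning source, Bronze, Silver, and Gold.
example_lineage = {
    "gold.daily_sales": ["silver.orders"],
    "silver.orders": ["bronze.orders_raw"],
    "bronze.orders_raw": ["pos.orders"],
}
inputs = upstream(example_lineage, "gold.daily_sales")
```

A traversal like this is what lets a steward answer impact questions such as which source systems feed a given Gold dataset.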

Learn more about the Data Catalog

# Reporting and Analytics

This layer provides business intelligence capabilities, including ad-hoc exploration, governed dashboards, and KPI monitoring, all subject to access controls and multi-tenancy support.

Learn more about Reporting

# Data Science

This layer provides machine learning platform capabilities, including notebook environments, experiment tracking, model lifecycle management, and integration with version control.

Learn more about Data Science

# Cross-Cutting Concerns

Several capabilities span all layers:

  • Identity Federation: Unified authentication and authorization across the platform
  • Multi-Tenancy: Isolation and access controls for different business units and external tenants
  • Data Lineage: End-to-end tracking from source systems to consumption

Learn more about Cross-Cutting Concerns

# Navigation

| Section | Description |
| --- | --- |
| Data Sources | Source systems and onboarding blueprints |
| Object Storage | GCS organization and medallion architecture |
| Lake Engine | Query federation and capacity models |
| ETL | Pipelines, orchestration, and testing |
| Data Catalog | Governance, discovery, and compliance |
| Reporting | Business intelligence and dashboards |
| Data Science | ML platform and model lifecycle |
| Cross-Cutting | Identity, multi-tenancy, and lineage |