#
Architecture Overview
Warning
The content of the child pages draws on detailed input from meetings and internal notes, but it is mostly AI-generated. Although it offers useful insights, it may not accurately reflect what will be implemented.
The UME data platform is designed to provide a unified, governed, and cost-effective environment for data ingestion, transformation, storage, and consumption. The architecture prioritizes data trust, governance, and operational efficiency.
#
Design Principles
- Governance First: Every layer incorporates access controls, lineage tracking, and audit capabilities.
- Trust in Data: Clear separation between raw, validated, and curated data through the medallion architecture.
- Cost Optimization: Thoughtful capacity planning with on-demand and provisioned options.
- Self-Service with Guardrails: Enable teams to explore and use data while preventing ungoverned sprawl.
#
Architecture Layers
The platform is organized into distinct layers, each with specific responsibilities:
#
Data Sources and Transactional Systems
The foundation layer captures data from various source systems including transactional databases, point-of-sale systems, partner integrations, mobile applications, and operational logs. Data flows into the platform through CDC (Change Data Capture) and specialized loaders.
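As a rough illustration of what CDC delivers to the platform, the sketch below replays a stream of change events onto a keyed snapshot. The `ChangeEvent` shape, the operation names, and `apply_changes` are hypothetical and are not taken from any specific CDC tool or from the UME design.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable, Optional


@dataclass(frozen=True)
class ChangeEvent:
    """One captured change. The field names are assumptions for this sketch."""
    op: str                               # "insert", "update", or "delete"
    key: str                              # primary key of the affected row
    row: Optional[Dict[str, Any]] = None  # new row image; None for deletes


def apply_changes(
    snapshot: Dict[str, Dict[str, Any]],
    events: Iterable[ChangeEvent],
) -> Dict[str, Dict[str, Any]]:
    """Return a new snapshot with the events applied in order."""
    state = dict(snapshot)  # copy so the caller's snapshot is not mutated
    for event in events:
        if event.op in ("insert", "update"):
            if event.row is None:
                raise ValueError(f"{event.op} event for {event.key!r} has no row")
            state[event.key] = event.row
        elif event.op == "delete":
            state.pop(event.key, None)  # deleting a missing key is a no-op
        else:
            raise ValueError(f"unknown operation: {event.op!r}")
    return state
```

In a medallion layout, the raw events themselves would typically be kept in Bronze, with a replay like this producing the current-state tables further downstream.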
#
Object Storage (GCS)
Google Cloud Storage serves as the persistent storage layer, organized using the medallion architecture:
- Bronze: Raw ingested data preserved as-is
- Silver: Cleaned, validated, and conformed data
- Gold: Business-ready, aggregated, and curated datasets
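One way to make the tiers concrete is a path convention that encodes the tier in every object name. The sketch below is only an assumption about how such a layout could look: the bucket name, the `dt=` partition prefix, and the `object_path` helper are hypothetical and not part of the documented design.

```python
from datetime import date
from enum import Enum


class Tier(str, Enum):
    """Medallion tiers as described above."""
    BRONZE = "bronze"
    SILVER = "silver"
    GOLD = "gold"


def object_path(bucket: str, tier: Tier, dataset: str,
                partition_date: date, filename: str) -> str:
    """Build a gs:// URI of the form bucket/tier/dataset/dt=YYYY-MM-DD/file."""
    return (
        f"gs://{bucket}/{tier.value}/{dataset}/"
        f"dt={partition_date.isoformat()}/{filename}"
    )


# Hypothetical bucket and dataset names, for illustration only.
print(object_path("ume-lake", Tier.SILVER, "orders",
                  date(2024, 1, 31), "part-0000.parquet"))
```

Keeping the tier as the first path segment makes it straightforward to attach different access policies and retention rules per tier, although separate buckets per tier would be an equally plausible choice.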
Storage security includes managed KMS keys, fine-grained access policies, and comprehensive audit trails.
Learn more about Object Storage
#
Lake Engine
The Lake Engine is the query and federation layer. It provides unified access to data across storage tiers and sources, enforces access policies at query time, and offers both on-demand and provisioned capacity options for cost optimization.
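Query-time enforcement can be pictured as a check that runs before any data is read. The following is a minimal sketch under assumed names: `Policy`, `AccessDenied`, and `authorize_query` are hypothetical, and a real engine would evaluate policies from the catalog rather than from an in-memory dictionary.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Sequence, Tuple


@dataclass(frozen=True)
class Policy:
    """Columns a role may read from one table (hypothetical shape)."""
    allowed_columns: FrozenSet[str]


class AccessDenied(Exception):
    """Raised when a query requests data the role may not read."""


def authorize_query(
    policies: Dict[Tuple[str, str], Policy],
    role: str,
    table: str,
    columns: Sequence[str],
) -> List[str]:
    """Return the requested columns if all are permitted, else raise."""
    policy = policies.get((role, table))
    if policy is None:
        # Deny by default: no policy means no access.
        raise AccessDenied(f"role {role!r} has no policy for {table!r}")
    denied = [c for c in columns if c not in policy.allowed_columns]
    if denied:
        raise AccessDenied(f"role {role!r} may not read {denied} from {table!r}")
    return list(columns)
```

The deny-by-default branch is a design choice in this sketch; it matches the "Governance First" principle but is not confirmed by the source.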
Learn more about the Lake Engine
#
ETL
The transformation layer handles data movement and processing with:
- Orchestration and scheduling
- Reusable blueprints for common patterns
- Data testing and validation
- Alerting and operational playbooks
- Multi-tenant controls
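To illustrate the data testing and validation capability, the sketch below checks a batch of rows for missing required values and duplicate keys before it would be promoted to the next tier. The function name, its parameters, and the error message format are assumptions made for this example.

```python
from typing import Any, Dict, List, Sequence


def validate_rows(
    rows: Sequence[Dict[str, Any]],
    required: Sequence[str],
    unique_key: str,
) -> List[str]:
    """Return human-readable validation errors; an empty list means pass."""
    errors: List[str] = []
    seen = set()
    for index, row in enumerate(rows):
        # Required columns must be present and non-null.
        for column in required:
            if row.get(column) is None:
                errors.append(f"row {index}: missing {column}")
        # The key column must not repeat within the batch.
        key = row.get(unique_key)
        if key is not None:
            if key in seen:
                errors.append(f"row {index}: duplicate {unique_key}={key!r}")
            seen.add(key)
    return errors


batch = [
    {"id": 1, "amount": 10},
    {"id": 1, "amount": None},
]
for problem in validate_rows(batch, required=["id", "amount"], unique_key="id"):
    print(problem)
```

An orchestrator could treat a non-empty result as a failed task and route it to alerting, which ties the validation step to the alerting and playbook items in the list above.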
#
Data Catalog
The Data Catalog is the central governance hub, providing:
- Data discovery and documentation
- Lineage tracking across all tiers
- Data stewardship and ownership
- Compliance monitoring and PII identification
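Lineage tracking across tiers can be modeled as a directed graph of datasets. The sketch below is a simplified, in-memory illustration; the `LineageGraph` class and the dataset names are hypothetical and do not represent the catalog's actual data model.

```python
from collections import defaultdict
from typing import Dict, Set


class LineageGraph:
    """Directed graph in which an edge points from a source to a derived dataset."""

    def __init__(self) -> None:
        self._parents: Dict[str, Set[str]] = defaultdict(set)

    def add_edge(self, source: str, target: str) -> None:
        """Record that `target` is derived from `source`."""
        self._parents[target].add(source)

    def upstream(self, dataset: str) -> Set[str]:
        """Return every dataset that `dataset` depends on, directly or not."""
        seen: Set[str] = set()
        stack = [dataset]
        while stack:
            current = stack.pop()
            for parent in self._parents.get(current, ()):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen


graph = LineageGraph()
graph.add_edge("bronze.orders", "silver.orders")
graph.add_edge("silver.orders", "gold.daily_revenue")
print(sorted(graph.upstream("gold.daily_revenue")))
```

An upstream traversal like this is what makes impact analysis possible, for example finding every Gold dataset affected by a PII column discovered in a Bronze source.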
Learn more about the Data Catalog
#
Reporting and Analytics
This layer provides business intelligence capabilities, including ad-hoc exploration, governed dashboards, and KPI monitoring, with access controls and multi-tenancy support.
#
Data Science
This layer provides machine learning platform capabilities, including notebook environments, experiment tracking, model lifecycle management, and integration with version control.
#
Cross-Cutting Concerns
Several capabilities span all layers:
- Identity Federation: Unified authentication and authorization across the platform
- Multi-Tenancy: Isolation and access controls for different business units and external tenants
- Data Lineage: End-to-end tracking from source systems to consumption
Learn more about Cross-Cutting Concerns