# Cross-Cutting Concerns

Several capabilities span all layers of the data platform. These cross-cutting concerns ensure consistent governance, security, and visibility throughout the system.

## Identity Federation

Unified identity management across all platform components.

### Single Sign-On (SSO)

All platform tools authenticate through a central identity provider.

Benefits:

  • One login for all data tools
  • Consistent user experience
  • Centralized access management
  • Simplified offboarding

Coverage:

  • Storage and warehouse access (GCS, BigQuery)
  • Reporting tools
  • Data catalog
  • ETL platforms
  • Notebook environments

### Unified Identity

Each user has a single identity across the platform.

Capabilities:

  • Consistent user identifier
  • Group membership propagation
  • Role-based access
  • Attribute-based policies

Enforcement:

  • No shared service accounts for user access
  • Individual identity for audit trails
  • Federated identity for external users
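A minimal sketch of how propagated group membership (role-based access) and a user attribute (attribute-based policy) might combine into one access decision. The group names, roles, and the `tenant` attribute are hypothetical; in practice they would come from the identity provider and the platform's policy engine.

```python
from dataclasses import dataclass, field

# Hypothetical mapping from identity-provider groups to platform roles.
GROUP_ROLES = {
    "data-analysts": {"reader"},
    "data-engineers": {"reader", "writer"},
}

# Hypothetical mapping from roles to permitted actions.
ROLE_ACTIONS = {
    "reader": {"query"},
    "writer": {"query", "write"},
}


@dataclass
class User:
    user_id: str                                     # consistent identifier from the IdP
    groups: set = field(default_factory=set)         # group membership propagated from the IdP
    attributes: dict = field(default_factory=dict)   # e.g. {"tenant": "acme"}


def is_allowed(user: User, action: str, resource: dict) -> bool:
    """Combine a role-based check and an attribute-based check into one decision."""
    # RBAC: derive roles from group membership, then check the requested action.
    roles = set().union(*(GROUP_ROLES.get(g, set()) for g in user.groups))
    if not any(action in ROLE_ACTIONS.get(r, set()) for r in roles):
        return False
    # ABAC: the user's tenant attribute must be present and match the resource's tenant.
    tenant = user.attributes.get("tenant")
    return tenant is not None and tenant == resource.get("tenant")
```

Keeping the individual `user_id` on every decision, rather than resolving to a shared account, is what allows each access to be attributed to a person in the audit trail.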

### Audit Trails

User actions are tracked across the platform.

Logged events:

  • Authentication events (login, logout)
  • Authorization decisions (access granted/denied)
  • Data access (queries, downloads)
  • Configuration changes

Per-user attribution:

  • All actions tied to individual identity
  • No anonymous or shared access
  • Complete audit history

Retention:

  • Logs retained per company policy
  • Available for security investigations
  • Compliance reporting
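The sketch below shows one possible shape for a structured audit record that enforces per-user attribution. The event-type strings, field names, and the set of rejected identifiers are assumptions for illustration, not an existing schema.

```python
import json
from datetime import datetime, timezone
from typing import Optional

# Event categories mirroring the logged-event list; the string values are illustrative.
EVENT_TYPES = {"authentication", "authorization", "data_access", "configuration_change"}

# Hypothetical identifiers that must never appear as the actor in an audit record.
NON_INDIVIDUAL_IDS = {"", "anonymous", "shared"}


def audit_event(user_id: str, event_type: str, resource: str,
                decision: Optional[str] = None) -> str:
    """Serialize one audit record, refusing anything not tied to an individual."""
    if user_id.strip().lower() in NON_INDIVIDUAL_IDS:
        raise ValueError("audit events must be attributed to an individual identity")
    if event_type not in EVENT_TYPES:
        raise ValueError(f"unknown event type: {event_type}")
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "user_id": user_id,
        "event_type": event_type,
        "resource": resource,
    }
    if decision is not None:
        record["decision"] = decision  # "granted" or "denied" for authorization events
    return json.dumps(record, sort_keys=True)
```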

## Multi-Tenancy

Support for multiple business units and external partners within a single platform.

### Business Context

As UME evolves from a BNPL (buy now, pay later) operator into a multi-issuer technology provider, the platform must support:

  • Internal business units with different data needs
  • External partners (retailers, issuers) with isolated access
  • White-label deployments with tenant-specific data

### Tenant Isolation at Storage Level

Tenant data can be separated in object storage and databases in one of three ways:

| Approach | Description | Use Case |
|---|---|---|
| Separate projects | Complete GCP project isolation | Strict isolation requirements |
| Separate buckets | Same project, different buckets | Moderate isolation |
| Partitioned data | Same bucket, partitioned by tenant | Cost-efficient, RLS-enforced |

Implementation considerations:

  • Cost vs. isolation trade-offs
  • Cross-tenant analytics requirements
  • Compliance and regulatory needs
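To make the three approaches concrete, here is a sketch that resolves where a tenant's data would live under each one. The `dp-*` project and bucket names and the `tenant=` object prefix are a hypothetical naming convention.

```python
import re
from enum import Enum


class Isolation(Enum):
    SEPARATE_PROJECT = "separate_project"
    SEPARATE_BUCKET = "separate_bucket"
    PARTITIONED = "partitioned"


# Hypothetical rule: tenant IDs use only characters that are valid in bucket names.
_TENANT_ID = re.compile(r"[a-z0-9](?:[a-z0-9-]{0,28}[a-z0-9])?")


def storage_location(tenant: str, isolation: Isolation) -> tuple:
    """Return (project, bucket, object prefix) for a tenant's data."""
    if not _TENANT_ID.fullmatch(tenant):
        raise ValueError(f"invalid tenant id: {tenant!r}")
    if isolation is Isolation.SEPARATE_PROJECT:
        # Own project: IAM and cost reporting are separated at the project boundary.
        return (f"dp-{tenant}", f"dp-{tenant}-lake", "")
    if isolation is Isolation.SEPARATE_BUCKET:
        # Shared project, one bucket per tenant: bucket-level IAM separates tenants.
        return ("dp-shared", f"dp-shared-{tenant}-lake", "")
    # Shared bucket partitioned by tenant: access must be enforced at query time (e.g. RLS).
    return ("dp-shared", "dp-shared-lake", f"tenant={tenant}/")
```

Only the partitioned option leaves isolation to query-time controls, which is why the table pairs it with row-level security.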

### Row-Level Security in Queries

Data is filtered at query time based on the user's tenant.

How it works:

  1. User identity includes tenant attribute
  2. Query engine applies automatic filter
  3. User sees only their tenant's data
  4. The same filter applies in every query tool

Benefits:

  • Single query layer for all tenants
  • Reduced data duplication
  • Consistent access enforcement
  • Simplified administration
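Row-level security is still at the policy-design stage, so the enforcement mechanism is not fixed. Assuming BigQuery row access policies are chosen, each tenant's group (standing in for the tenant attribute from step 1) would get a policy that filters on a hypothetical `tenant_id` column. This sketch only generates the DDL text; the table, group, and column names are illustrative.

```python
import re

_SIMPLE = re.compile(r"[A-Za-z0-9_]+")
_TABLE = re.compile(r"[A-Za-z0-9_-]+\.[A-Za-z0-9_]+\.[A-Za-z0-9_]+")  # project.dataset.table
_EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+")


def row_access_policy_ddl(table: str, tenant_id: str, group_email: str,
                          column: str = "tenant_id") -> str:
    """Build a BigQuery row access policy that limits one group to one tenant's rows."""
    # Validate every input, because each one is interpolated into SQL text.
    if not (_TABLE.fullmatch(table) and _SIMPLE.fullmatch(tenant_id)
            and _SIMPLE.fullmatch(column) and _EMAIL.fullmatch(group_email)):
        raise ValueError("invalid table, tenant, column, or group")
    return (
        f"CREATE OR REPLACE ROW ACCESS POLICY tenant_{tenant_id}_filter\n"
        f"ON `{table}`\n"
        f'GRANT TO ("group:{group_email}")\n'
        f'FILTER USING ({column} = "{tenant_id}");'
    )
```

Generating policies from a tenant registry, rather than writing them by hand, keeps them consistent as tenants are added.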

### Tenant-Aware ETL

Pipelines that process multi-tenant data follow one of three patterns:

| Pattern | Description |
|---|---|
| Shared pipeline | One pipeline processes all tenants with partitioning |
| Tenant pipelines | Separate pipeline instance per tenant |
| Hybrid | Shared ingestion, tenant-specific transformations |

Considerations:

  • Compute isolation requirements
  • Cost attribution
  • Scheduling and priority
  • Failure isolation
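A sketch of the shared-pipeline pattern with per-tenant failure isolation. The record layout (a `tenant` key on each record) and the `transform` callback are assumptions. A hybrid setup could keep this shared ingestion loop and pass a `transform` that dispatches to tenant-specific logic.

```python
from collections import defaultdict
from typing import Callable


def run_shared_pipeline(records: list, transform: Callable[[str, dict], dict]):
    """One run for all tenants: partition by tenant, isolate failures per tenant."""
    by_tenant = defaultdict(list)
    for rec in records:
        by_tenant[rec["tenant"]].append(rec)

    outputs, failures = {}, {}
    for tenant in sorted(by_tenant):
        try:
            outputs[tenant] = [transform(tenant, row) for row in by_tenant[tenant]]
        except Exception as exc:  # a bad batch for one tenant does not block the others
            failures[tenant] = repr(exc)
    return outputs, failures
```

The per-tenant `try` block is what gives failure isolation without running a separate pipeline instance per tenant.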

### Cost Attribution

Track and allocate costs by tenant:

  • Storage costs per tenant
  • Query costs per tenant
  • Compute costs for tenant-specific jobs
  • Reporting and chargeback
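A minimal chargeback sketch for query costs, assuming each query job is tagged with a `tenant` label (BigQuery supports labels on jobs). The job-record field names and the per-TiB rate are illustrative placeholders, not actual pricing.

```python
from collections import defaultdict

TIB = 2 ** 40
# Illustrative price per TiB scanned; replace with the platform's actual rate.
RATE_PER_TIB = 6.25


def query_cost_by_tenant(jobs: list, rate_per_tib: float = RATE_PER_TIB) -> dict:
    """Allocate query cost to tenants using a 'tenant' label on each job record."""
    totals = defaultdict(float)
    for job in jobs:
        # Unlabeled jobs stay visible as "unattributed" rather than being dropped.
        tenant = job.get("labels", {}).get("tenant", "unattributed")
        totals[tenant] += job["bytes_billed"] / TIB * rate_per_tib
    return dict(totals)
```

Reporting an explicit "unattributed" bucket makes gaps in labeling visible, which matters once costs are charged back.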

## Data Lineage

End-to-end tracking of data flow through the platform.

### Scope of Lineage

What lineage captures:

Source Systems → Ingestion → Bronze → Silver → Gold → Reports/Models

Tracked elements:

  • Data sources and systems
  • Transformations and logic
  • Storage locations
  • Downstream consumers

### End-to-End Tracking

Lineage gives complete visibility from source to consumption.

Source tracking:

  • Which source system produced the data
  • When it was extracted
  • What extraction method was used

Transformation tracking:

  • What logic was applied
  • Which pipeline processed it
  • When it was transformed

Consumption tracking:

  • Which reports use this data
  • Which models depend on it
  • Who is querying it

### Impact Analysis

Understand the effects of changes:

Forward impact: "If I change this table, what breaks?"

  • Downstream pipelines
  • Dependent reports
  • Consuming models

Backward impact: "Where does this data come from?"

  • Source systems
  • Upstream transformations
  • Origins of data quality issues
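Forward and backward impact are both traversals over the lineage edges. This minimal in-memory sketch uses breadth-first search; the table names are hypothetical, and a real deployment would read edges from the lineage service rather than build them by hand.

```python
from collections import defaultdict, deque


class LineageGraph:
    """Directed lineage graph with edges indexed in both directions."""

    def __init__(self):
        self.downstream = defaultdict(set)
        self.upstream = defaultdict(set)

    def add_edge(self, source: str, target: str) -> None:
        self.downstream[source].add(target)
        self.upstream[target].add(source)

    def _walk(self, start: str, edges: dict) -> set:
        # Breadth-first search; `seen` also protects against cycles.
        seen, queue = set(), deque([start])
        while queue:
            node = queue.popleft()
            for nxt in edges.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return seen

    def forward_impact(self, node: str) -> set:
        """Everything downstream: 'If I change this table, what breaks?'"""
        return self._walk(node, self.downstream)

    def backward_impact(self, node: str) -> set:
        """Everything upstream: 'Where does this data come from?'"""
        return self._walk(node, self.upstream)


g = LineageGraph()
for src, dst in [
    ("crm.customers", "bronze.customers"),
    ("bronze.customers", "silver.customers"),
    ("silver.customers", "gold.customer_ltv"),
    ("gold.customer_ltv", "report.ltv_dashboard"),
    ("gold.customer_ltv", "model.churn"),
]:
    g.add_edge(src, dst)

print(sorted(g.forward_impact("silver.customers")))
# ['gold.customer_ltv', 'model.churn', 'report.ltv_dashboard']
```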

### Dependency Visualization

Lineage is presented as a graph-based view of data relationships.

Capabilities:

  • Interactive exploration
  • Filter by domain or owner
  • Highlight critical paths
  • Export for documentation

Use cases:

  • Change planning
  • Incident investigation
  • Onboarding new team members
  • Audit and compliance
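For the export-for-documentation case, one option is to emit Graphviz DOT text, which can be rendered with `dot -Tsvg`. This sketch assumes lineage edges are available as (source, target) pairs and that an owner map exists; both are hypothetical inputs. It covers owner filtering and critical-path highlighting.

```python
def _q(name: str) -> str:
    # Quote a node name for DOT, escaping embedded double quotes.
    return '"' + name.replace('"', '\\"') + '"'


def to_dot(edges, owners=None, owner=None, critical=frozenset()) -> str:
    """Render lineage edges as Graphviz DOT, optionally filtered to one owner."""
    owners = owners or {}
    # Keep an edge if no filter is set or either endpoint belongs to the owner.
    keep = [
        (src, dst) for src, dst in sorted(edges)
        if owner is None or owner in (owners.get(src), owners.get(dst))
    ]
    lines = ["digraph lineage {", "  rankdir=LR;"]
    nodes = {n for edge in keep for n in edge}
    for node in sorted(nodes & set(critical)):
        lines.append(f"  {_q(node)} [color=red, penwidth=2];")  # highlight critical-path nodes
    for src, dst in keep:
        lines.append(f"  {_q(src)} -> {_q(dst)};")
    lines.append("}")
    return "\n".join(lines)
```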

### Lineage Metadata

What is captured:

| Metadata | Description |
|---|---|
| Source | Origin table/system |
| Transformation | SQL, code, or tool used |
| Target | Destination table |
| Timestamp | When transformation ran |
| Job ID | Pipeline run identifier |
| Owner | Responsible team/person |
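A sketch of one lineage record carrying exactly the fields in the table above. The class name and example values are hypothetical; a lineage service such as Dataplex defines its own schema, so this only illustrates what a pipeline would need to emit per run.

```python
from dataclasses import dataclass, asdict
from datetime import datetime, timezone


@dataclass(frozen=True)
class LineageRecord:
    source: str          # origin table/system
    transformation: str  # SQL, code reference, or tool used
    target: str          # destination table
    timestamp: datetime  # when the transformation ran
    job_id: str          # pipeline run identifier
    owner: str           # responsible team/person

    def __post_init__(self):
        # Require timezone-aware timestamps so runs from different systems compare correctly.
        if self.timestamp.tzinfo is None:
            raise ValueError("timestamp must be timezone-aware")

    def to_dict(self) -> dict:
        """Serialize for a lineage store, with the timestamp as ISO 8601 text."""
        data = asdict(self)
        data["timestamp"] = self.timestamp.isoformat()
        return data
```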

## Governance Integration

This section describes how the cross-cutting concerns support governance.

### Access Control Chain

Consistent enforcement across layers:

  1. Identity: Who is making the request
  2. Authentication: Verify identity via SSO
  3. Authorization: Check permissions against the access policy
  4. Audit: Log the access decision
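A sketch of the four steps as a single request handler. `verify_sso` stands in for whatever token validation the identity provider offers, and `policy` is a hypothetical in-memory permission map; both are injected so the sketch stays self-contained.

```python
import logging
from typing import Callable, Optional

audit_log = logging.getLogger("platform.audit")


def handle_request(
    token: str,
    action: str,
    resource: str,
    verify_sso: Callable[[str], Optional[str]],
    policy: dict,
) -> bool:
    """Apply identity, authentication, authorization, and audit in order."""
    # 1-2. Identity and authentication: the SSO verifier resolves the token
    #      to a user ID, or returns None if the token is invalid.
    user = verify_sso(token)
    if user is None:
        audit_log.warning("authn_failed action=%s resource=%s", action, resource)
        return False
    # 3. Authorization: the action must be granted to this user on this resource.
    allowed = action in policy.get((user, resource), set())
    # 4. Audit: record every decision, granted or denied.
    audit_log.info(
        "authz user=%s action=%s resource=%s decision=%s",
        user, action, resource, "granted" if allowed else "denied",
    )
    return allowed
```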

### Compliance Support

Cross-cutting capabilities enable compliance:

  • PII tracking: Lineage shows where sensitive data flows
  • Access audit: Identity federation provides complete logs
  • Tenant isolation: Multi-tenancy ensures data separation
  • Data retention: Lineage helps identify data for archival

### Operational Visibility

Understand platform behavior:

  • Who is using what data
  • How data flows through the system
  • Where bottlenecks occur
  • What the cost drivers are

## Implementation Status

| Capability | Status | Notes |
|---|---|---|
| SSO | Partial | Some tools integrated |
| User audit trails | Partial | BigQuery; needs expansion |
| Tenant isolation | Planned | Architecture defined |
| Row-level security | Planned | Policy design in progress |
| Data lineage | Early | Dataplex integration started |

## Related Sections