# Cross-Cutting Concerns

Several capabilities span all layers of the data platform. These cross-cutting concerns ensure consistent governance, security, and visibility throughout the system.

# Identity Federation

Unified identity management across all platform components.

# Single Sign-On (SSO)

All platform tools authenticate through a central identity provider:

Benefits:

- One login for all data tools
- Consistent user experience
- Centralized access management
- Simplified offboarding

Coverage:

- Storage access (GCS, BigQuery)
- Reporting tools
- Data catalog
- ETL platforms
- Notebook environments

# Unified Identity

Users maintain a single identity across the platform:

Capabilities:

- Consistent user identifier
- Group membership propagation
- Role-based access
- Attribute-based policies

Enforcement:

- No shared service accounts for user access
- Individual identity for audit trails
- Federated identity for external users

# Audit Trails

Track user actions across the platform:

Logged events:

- Authentication events (login, logout)
- Authorization decisions (access granted/denied)
- Data access (queries, downloads)
- Configuration changes

Per-user attribution:

- All actions tied to an individual identity
- No anonymous or shared access
- Complete audit history

Retention:

- Logs retained per company policy
- Available for security investigations
- Available for compliance reporting
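
As an illustration of the event categories and per-user attribution above, the following is a minimal sketch of a structured audit record. The field names, the `make_audit_event` helper, and the example resource string are assumptions made for this example and are not part of the platform's actual logging schema.

```python
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone

# Event categories taken from the "Logged events" list (names are illustrative).
ALLOWED_EVENT_TYPES = {"authentication", "authorization", "data_access", "config_change"}


@dataclass(frozen=True)
class AuditEvent:
    user_id: str     # individual identity, never a shared account
    event_type: str  # one of ALLOWED_EVENT_TYPES
    action: str      # e.g. "login", "query", "download"
    resource: str    # what was accessed or changed
    outcome: str     # e.g. "granted" or "denied"
    timestamp: str   # ISO 8601, UTC


def make_audit_event(user_id, event_type, action, resource, outcome, now=None):
    """Build an audit event, rejecting anonymous or unknown event types."""
    if not user_id:
        raise ValueError("audit events must be attributed to an individual user")
    if event_type not in ALLOWED_EVENT_TYPES:
        raise ValueError(f"unknown event type: {event_type}")
    ts = (now or datetime.now(timezone.utc)).isoformat()
    return AuditEvent(user_id, event_type, action, resource, outcome, ts)


event = make_audit_event(
    "alice@example.com", "data_access", "query", "bigquery:sales.orders", "granted",
    now=datetime(2024, 1, 1, tzinfo=timezone.utc),
)
# One JSON object per line is a common shape for log sinks.
line = json.dumps(asdict(event), sort_keys=True)
print(line)
```

Rejecting events without a `user_id` at creation time is one way to make the "no anonymous or shared access" rule hard to bypass by accident.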

# Multi-Tenancy

Support for multiple business units and external partners within a single platform.

# Business Context

As UME evolves from a BNPL operator to a multi-issuer technology provider, the platform must support:

- Internal business units with different data needs
- External partners (retailers, issuers) with isolated access
- White-label deployments with tenant-specific data

# Tenant Isolation at Storage Level

Data separation in object storage and databases:

Approaches:

| Approach | Description | Use Case |
| --- | --- | --- |
| Separate projects | Complete GCP project isolation | Strict isolation requirements |
| Separate buckets | Same project, different buckets | Moderate isolation |
| Partitioned data | Same bucket, partitioned by tenant | Cost-efficient, RLS-enforced |

Implementation considerations:

- Cost vs. isolation trade-offs
- Cross-tenant analytics requirements
- Compliance and regulatory needs
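
To make the three approaches concrete, here is a minimal sketch of how a tenant and dataset could be mapped to a storage location under each one. The project names, bucket names, and `tenant=` prefix layout are hypothetical naming conventions chosen for the example, not the platform's actual layout.

```python
from dataclasses import dataclass
from enum import Enum


class IsolationLevel(Enum):
    SEPARATE_PROJECTS = "separate_projects"
    SEPARATE_BUCKETS = "separate_buckets"
    PARTITIONED = "partitioned"


@dataclass(frozen=True)
class StorageLocation:
    project: str  # owning GCP project (hypothetical name)
    bucket: str   # bucket name (hypothetical name)
    prefix: str   # object prefix inside the bucket

    @property
    def uri(self) -> str:
        return f"gs://{self.bucket}/{self.prefix}"


def resolve_location(tenant: str, dataset: str, level: IsolationLevel) -> StorageLocation:
    """Map (tenant, dataset) to a location for the chosen isolation approach."""
    if level is IsolationLevel.SEPARATE_PROJECTS:
        # Each tenant gets its own project and its own bucket.
        return StorageLocation(f"data-{tenant}", f"data-{tenant}-lake", f"{dataset}/")
    if level is IsolationLevel.SEPARATE_BUCKETS:
        # One shared project, one bucket per tenant.
        return StorageLocation("data-shared", f"data-shared-{tenant}", f"{dataset}/")
    # Shared project and bucket; tenants are separated by a partition prefix.
    return StorageLocation("data-shared", "data-shared-lake", f"{dataset}/tenant={tenant}/")


for level in IsolationLevel:
    loc = resolve_location("acme", "orders", level)
    print(level.value, loc.project, loc.uri)
```

Keeping the mapping in one function means a tenant can be moved to a stricter isolation level without changing the callers.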

# Row-Level Security in Queries

Filter data at query time based on user tenant:

How it works:

- User identity includes tenant attribute
- Query engine applies automatic filter
- User sees only their tenant's data
- Works across all query tools

Benefits:

- Single query layer for all tenants
- Reduced data duplication
- Consistent access enforcement
- Simplified administration
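
The query-time filtering described in this section can be sketched in a few lines. This is only a conceptual model: in practice the filter would be enforced by the query engine itself rather than by application code, and the `User` type, the `tenant_id` column name, and the sample rows are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    user_id: str
    tenant: str  # tenant attribute carried by the user's identity


def apply_row_level_security(rows, user, tenant_column="tenant_id"):
    """Return only the rows that belong to the user's tenant.

    Rows without a tenant value are excluded, so missing tags fail closed.
    """
    return [row for row in rows if row.get(tenant_column) == user.tenant]


rows = [
    {"order_id": 1, "tenant_id": "acme", "amount": 120},
    {"order_id": 2, "tenant_id": "globex", "amount": 75},
    {"order_id": 3, "tenant_id": "acme", "amount": 40},
    {"order_id": 4, "amount": 10},  # untagged row, visible to nobody
]

alice = User(user_id="alice@example.com", tenant="acme")
visible = apply_row_level_security(rows, alice)
print([row["order_id"] for row in visible])
```

Because the filter depends only on the identity's tenant attribute, the same rule applies regardless of which query tool issued the request.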

# Tenant-Aware ETL

Pipelines that handle multi-tenant data:

Patterns:

| Pattern | Description |
| --- | --- |
| Shared pipeline | One pipeline processes all tenants with partitioning |
| Tenant pipelines | Separate pipeline instance per tenant |
| Hybrid | Shared ingestion, tenant-specific transformations |

Considerations:

- Compute isolation requirements
- Cost attribution
- Scheduling and priority
- Failure isolation
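
The shared-pipeline pattern, combined with the failure-isolation consideration, can be sketched as follows. The function names, the `tenant_id` key, and the sample transformation are hypothetical; a real pipeline would run on the platform's ETL tooling rather than in a single process.

```python
from collections import defaultdict


def partition_by_tenant(records, tenant_key="tenant_id"):
    """Group incoming records by tenant so each batch is processed separately."""
    partitions = defaultdict(list)
    for record in records:
        partitions[record[tenant_key]].append(record)
    return dict(partitions)


def run_shared_pipeline(records, transform):
    """Run one pipeline over all tenants, isolating failures per tenant.

    Returns (results, failures): a failing tenant batch is reported in
    `failures` and does not block the other tenants.
    """
    results, failures = {}, {}
    for tenant, batch in partition_by_tenant(records).items():
        try:
            results[tenant] = [transform(record) for record in batch]
        except Exception as exc:  # broad on purpose: any failure stays with its tenant
            failures[tenant] = f"{type(exc).__name__}: {exc}"
    return results, failures


def to_cents(record):
    # Raises TypeError when amount is None, simulating bad tenant data.
    return {**record, "amount_cents": record["amount"] * 100}


records = [
    {"tenant_id": "acme", "amount": 10},
    {"tenant_id": "globex", "amount": None},
    {"tenant_id": "acme", "amount": 5},
]
results, failures = run_shared_pipeline(records, to_cents)
print(sorted(results), sorted(failures))
```

The hybrid pattern from the table would keep `partition_by_tenant` shared and pass a different `transform` per tenant.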

# Cost Attribution

Track and allocate costs by tenant:

- Storage costs per tenant
- Query costs per tenant
- Compute costs for tenant-specific jobs
- Reporting and chargeback
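
A minimal sketch of the roll-up behind a chargeback report is shown below. The line-item shape (`tenant`, `category`, `cost`) and the figures are invented for illustration; real inputs would come from billing exports tagged with a tenant label.

```python
from collections import defaultdict


def attribute_costs(line_items):
    """Sum costs per tenant and category, adding a per-tenant total."""
    totals = defaultdict(lambda: defaultdict(float))
    for item in line_items:
        totals[item["tenant"]][item["category"]] += item["cost"]
    return {
        tenant: {**categories, "total": sum(categories.values())}
        for tenant, categories in totals.items()
    }


# Illustrative line items only; not real billing data.
line_items = [
    {"tenant": "acme", "category": "storage", "cost": 10.0},
    {"tenant": "acme", "category": "query", "cost": 2.5},
    {"tenant": "acme", "category": "query", "cost": 2.5},
    {"tenant": "globex", "category": "compute", "cost": 7.25},
]

report = attribute_costs(line_items)
for tenant, costs in sorted(report.items()):
    print(tenant, costs)
```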

# Data Lineage

End-to-end tracking of data flow through the platform.

# Scope of Lineage

What lineage captures:

Source Systems → Ingestion → Bronze → Silver → Gold → Reports/Models

Tracked elements:

- Data sources and systems
- Transformations and logic
- Storage locations
- Downstream consumers

# End-to-End Tracking

Complete visibility from source to consumption:

Source tracking:

- Which source system produced the data
- When it was extracted
- Which extraction method was used

Transformation tracking:

- What logic was applied
- Which pipeline processed it
- When it was transformed

Consumption tracking:

- Which reports use the data
- Which models depend on it
- Who queries it

# Impact Analysis

Understand the effects of changes:

Forward impact: "If I change this table, what breaks?"

- Downstream pipelines
- Dependent reports
- Consuming models

Backward impact: "Where does this data come from?"

- Source systems
- Upstream transformations
- Upstream data quality issues
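
Both questions reduce to reachability on a directed lineage graph: forward impact follows edges downstream, backward impact follows them upstream. The sketch below uses a breadth-first traversal over a small, made-up edge list; the table and report names are hypothetical and only mirror the source-to-consumption flow described earlier.

```python
from collections import deque

# Hypothetical lineage edges: (upstream, downstream).
LINEAGE_EDGES = [
    ("crm.accounts", "bronze.accounts"),
    ("bronze.accounts", "silver.accounts"),
    ("silver.accounts", "gold.customer_360"),
    ("gold.customer_360", "report.churn_dashboard"),
    ("gold.customer_360", "model.churn_predictor"),
]


def _adjacency(edges, reverse=False):
    """Build an adjacency map, optionally with every edge flipped."""
    graph = {}
    for upstream, downstream in edges:
        src, dst = (downstream, upstream) if reverse else (upstream, downstream)
        graph.setdefault(src, set()).add(dst)
    return graph


def _reachable(edges, start, reverse=False):
    """Breadth-first search returning every node reachable from `start`."""
    graph = _adjacency(edges, reverse)
    seen, queue = set(), deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen


def forward_impact(edges, table):
    """Everything downstream: what could break if `table` changes."""
    return _reachable(edges, table)


def backward_impact(edges, table):
    """Everything upstream: where the data in `table` comes from."""
    return _reachable(edges, table, reverse=True)


print(sorted(forward_impact(LINEAGE_EDGES, "silver.accounts")))
print(sorted(backward_impact(LINEAGE_EDGES, "silver.accounts")))
```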

# Dependency Visualization

Graph-based view of data relationships:

Capabilities:

- Interactive exploration
- Filter by domain or owner
- Highlight critical paths
- Export for documentation

Use cases:

- Change planning
- Incident investigation
- Onboarding new team members
- Audit and compliance

# Lineage Metadata

What is captured:

| Metadata | Description |
| --- | --- |
| Source | Origin table/system |
| Transformation | SQL, code, or tool used |
| Target | Destination table |
| Timestamp | When transformation ran |
| Job ID | Pipeline run identifier |
| Owner | Responsible team/person |
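
The metadata fields above map naturally onto a small record type. The following is a sketch only: the `LineageRecord` class, its field names, and the sample values are assumptions for illustration and do not reflect the schema of any particular catalog tool.

```python
from dataclasses import dataclass, asdict
from datetime import datetime, timezone


@dataclass(frozen=True)
class LineageRecord:
    source: str          # origin table or system
    transformation: str  # SQL, code, or tool used
    target: str          # destination table
    timestamp: datetime  # when the transformation ran
    job_id: str          # pipeline run identifier
    owner: str           # responsible team or person


# Sample values are invented for the example.
record = LineageRecord(
    source="bronze.accounts",
    transformation="sql:dedupe_accounts",
    target="silver.accounts",
    timestamp=datetime(2024, 1, 1, 2, 30, tzinfo=timezone.utc),
    job_id="run-0001",
    owner="data-platform-team",
)

# asdict() gives a plain dictionary that is easy to serialize or load into a catalog.
print(asdict(record))
```

One record per pipeline run, keyed by `job_id`, yields the edge list (`source` to `target`) that impact analysis and dependency visualization can be built on.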

# Governance Integration

How cross-cutting concerns support governance.

# Access Control Chain

Consistent enforcement across layers:

1. Identity: Who is making the request
2. Authentication: Verify identity via SSO
3. Authorization: Check permissions in policy
4. Audit: Log the access decision
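
The four steps of the chain can be sketched as a single function. Everything here is a simplified stand-in: the in-memory `SESSIONS` map takes the place of the SSO provider, `POLICY` is a hypothetical set of (role, resource, action) grants, and the audit log is a plain list.

```python
# Hypothetical session store standing in for the identity provider.
SESSIONS = {
    "token-123": {"user_id": "alice@example.com", "roles": ["analyst"]},
}

# Hypothetical policy: allowed (role, resource, action) combinations.
POLICY = {
    ("analyst", "gold.sales", "read"),
}


def check_access(token, resource, action, sessions, policy, audit_log):
    """Identify, authenticate, authorize, then audit a single request."""
    # Steps 1-2: resolve the token to an authenticated identity.
    user = sessions.get(token)
    if user is None:
        decision = "denied:unauthenticated"
    # Step 3: check whether any of the user's roles grants the action.
    elif any((role, resource, action) in policy for role in user["roles"]):
        decision = "granted"
    else:
        decision = "denied:unauthorized"
    # Step 4: record the decision, whether access was granted or denied.
    audit_log.append({
        "user_id": user["user_id"] if user else None,
        "resource": resource,
        "action": action,
        "decision": decision,
    })
    return decision == "granted"


audit_log = []
print(check_access("token-123", "gold.sales", "read", SESSIONS, POLICY, audit_log))
print(check_access("token-123", "gold.sales", "write", SESSIONS, POLICY, audit_log))
print(check_access("bad-token", "gold.sales", "read", SESSIONS, POLICY, audit_log))
```

Logging inside the same function as the decision means denied requests are audited too, which matches the "access granted/denied" events listed under audit trails.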

# Compliance Support

Cross-cutting capabilities enable compliance:

- PII tracking: Lineage shows where sensitive data flows
- Access audit: Identity federation provides complete logs
- Tenant isolation: Multi-tenancy ensures data separation
- Data retention: Lineage helps identify data for archival

# Operational Visibility

Understand platform behavior:

- Who is using what data
- How data flows through the system
- Where bottlenecks occur
- What the cost drivers are

# Implementation Status

| Capability | Status | Notes |
| --- | --- | --- |
| SSO | Partial | Some tools integrated |
| User audit trails | Partial | BigQuery, needs expansion |
| Tenant isolation | Planned | Architecture defined |
| Row-level security | Planned | Policy design in progress |
| Data lineage | Early | Dataplex integration started |

- Lake Engine - Row-level security implementation
- Data Catalog - Lineage visualization
- Object Storage - Tenant storage organization
- Reporting - Multi-tenant dashboards