# Reporting and Analytics
Recommendation:
- Run the first two MVPs on the new architecture using Looker Studio as the primary reporting tool.
- In parallel, evaluate Metabase Pro for security, sharing, row-level security, and integration with the data catalog. The existing user base and report inventory represent significant value if the governance gaps can be closed.
- Conduct a small PoC with Hex to assess the same dimensions - security, sharing, catalog integration - and compare the experience against Metabase Pro.
- Defer final tooling decisions until evaluations are complete and governance infrastructure (catalog, lineage, cost attribution) is in place.
Reporting and analytics represent the primary consumption layer of the data platform. This is where business users, analysts, and leadership interact with data - through dashboards, ad-hoc queries, and exploratory analysis. Because this layer is the most visible and most frequently accessed, decisions made here have outsized impact on both cost and trust.
The goal is not to restrict access, but to channel it. Users should feel empowered to explore data while the platform ensures that exploration happens on top of governed, validated, and cost-optimized datasets.
This section covers both operational reporting (dashboards, KPIs) and exploratory analytics (ad-hoc analysis, self-service). Data Science workloads are addressed separately in Data Science.
## Current Challenges
Before discussing tooling and practices, it helps to acknowledge the problems we are solving:
- Shift-right complexity - Reports that contain all the business logic needed to calculate a KPI. Instead of consuming pre-aggregated, validated data, dashboards perform heavy joins, filters, and calculations at query time. This increases cost and introduces inconsistency.
- Duplicate indicators - The same KPI (e.g., FPD) exists in dozens or hundreds of variations across dashboards. Different users created their own versions, often slightly different, leading to conflicting numbers and eroded trust.
- Ungoverned access - Users with broad access to raw data create ad-hoc artifacts outside the governed pipeline. These artifacts become dependencies for other users, but are not maintained or documented.
- Cost unpredictability - A single poorly written dashboard can consume disproportionate resources. Without visibility into which artifacts drive cost, optimization is reactive rather than proactive.
The recommendations in this section aim to address these problems systematically.
## Shift-Left Complexity
The most impactful principle for reporting is to push complexity into the ETL layer. Dashboards should be thin consumers of pre-computed, validated datasets - not engines that rebuild business logic on every refresh.
What this means in practice:
- KPIs and metrics are calculated in dbt models, not in dashboard queries
- Aggregations happen during transformation, not at visualization time
- Business rules (e.g., "FPD is defined as...") live in versioned SQL, not scattered across report definitions
- Dashboards reference Gold-layer tables that are already partitioned, clustered, and optimized for the query patterns they serve
This approach delivers consistency (one definition, many consumers), cost efficiency (computation happens once during ETL, not repeatedly at query time), and maintainability (changes to business logic are made in one place).
See ETL - Design Principles for implementation patterns that support this model.
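To make the shift concrete, a Gold-layer dbt model might define a KPI like FPD once, in versioned SQL. This is only a sketch - the model name, source model, and column names (`installment_number`, `days_past_due`, `due_date`) are hypothetical, and the FPD rule shown is illustrative, not the official definition:

```sql
-- models/gold/fct_loan_kpis.sql (hypothetical model and column names)
-- The FPD flag is computed once during ETL; dashboards read the result.
select
    loan_id,
    date_trunc(min(due_date), month)    as cohort_month,
    -- Illustrative FPD rule: the first installment went past due
    countif(installment_number = 1
            and days_past_due > 0) > 0  as is_fpd
from {{ ref('silver_installments') }}
group by loan_id
```

A dashboard then filters and counts `is_fpd` rather than re-deriving the rule at query time, so every consumer agrees on the number.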
## Tooling Strategy
### Primary: Looker Studio
For the first MVPs and near-term reporting needs, we favor Looker Studio (formerly Google Data Studio).
Why Looker Studio:
- Native integration with BigQuery and GCP identity
- No additional licensing cost for basic use
- Familiar to existing users, particularly leadership and management
- Supports sharing and embedding with Google Workspace integration
Limitations to acknowledge:
- Less intuitive for complex dashboard construction compared to alternatives
- Limited row-level security capabilities in the free tier
- Query optimization relies entirely on underlying data modeling - Looker Studio does not cache aggressively
Looker Studio is not the final answer, but it provides a stable baseline while we evaluate alternatives and establish governance foundations.
### Evaluation: Metabase Pro
We have a large base of existing reports in Metabase and significant user familiarity. Rather than force a disruptive migration, we should evaluate whether Metabase Pro (or Enterprise) addresses the security and governance gaps that currently prevent its continued use.
What to evaluate:
- Granular permissions - Can we restrict access at database, schema, table, and column level?
- Row-level security - Can we implement tenant or role-based filtering natively?
- Identity federation - Can we integrate with Google Workspace SSO and maintain per-user audit trails?
- Sharing controls - Can we safely share dashboards with external parties (retailers, assessors) without exposing sensitive data?
Additional consideration:
Metabase offers Metabase Documents, a lightweight notebook-like feature that allows analysts to combine queries, visualizations, and narrative text. This could serve as an intermediate layer between raw exploration and formal dashboards - a place to tell a story with data, add context, and share insights with comments. Worth evaluating for its potential to improve knowledge sharing without the overhead of full notebook environments.
Recommendation: Conduct a focused evaluation of a Metabase Pro subscription. If it resolves the security concerns and supports the governance model we are building, the continuity benefit (existing reports, user knowledge, satisfaction) may outweigh the cost of migrating to another tool.
### Evaluation: Hex
Hex is a modern analytics platform that combines notebooks, dashboards, and collaboration in a single environment. It has gained traction in data teams that want to bridge the gap between exploration and production reporting.
What to evaluate in a small PoC:
- Security model and identity integration with GCP
- Sharing and publishing capabilities
- Cost structure (per-editor pricing)
- AI-assisted features for query building and exploration
- Integration with BigQuery and ability to enforce access policies
Hex is particularly interesting if we want to provide a unified environment where analysts can explore data, build visualizations, and publish findings - without requiring separate tools for each step.
Recommendation: Conduct a small proof-of-concept with Hex to assess the same dimensions evaluated for Metabase Pro: security, sharing, governance integration. Compare the experience and cost against Metabase Pro before making a final tooling decision.
## Cost Management for Reporting
Reporting workloads are often the largest cost driver in BigQuery because they involve frequent, repeated queries across potentially large datasets. Managing this requires visibility and prevention.
### Cost Attribution
Every dashboard and report should be traceable to:
- The tables it queries
- The cost it generates (bytes scanned, slots consumed)
- The users who access it
This traceability is not automatic - it requires intentional design:
- Query tagging - Configure reporting tools to include job labels (e.g., dashboard ID, user identity) in BigQuery queries. BigQuery's `INFORMATION_SCHEMA.JOBS` view exposes these labels, enabling cost attribution.
- Lineage capture - Lineage to downstream dashboards should flow from two sources. First, dbt models should explicitly reference known downstream consumers (see ETL - New project checklist) - this embeds dashboard dependencies directly in the transformation code. Second, the Data Catalog should ingest metadata from reporting tools to capture dashboards that are not explicitly documented in dbt. Tools like DataHub support connectors for Looker, Metabase, and others that extract this lineage automatically. Together, these provide a complete picture: dbt captures what we know and govern, the catalog captures what exists in the wild.
- Cost reporting - The Lake Engine is the primary source for cost data. BigQuery's `INFORMATION_SCHEMA.JOBS` view exposes query costs, and with proper query tagging (point 1 above), these costs can be attributed to specific dashboards. With lineage in place (point 2 above), we can trace high-cost queries back through transformations to their source tables and forward to the dashboards that triggered them. The cost reporting dashboard surfaces the top cost drivers - which reports, which tables, which users - and makes this visible to data stewards and dashboard owners.
When a dashboard appears in the top-10 cost list, the owner should be notified. The catalog provides the context (what tables, what lineage), and the cost report provides the signal.
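As a sketch of what the top-10 cost list could look like, the following query aggregates `INFORMATION_SCHEMA.JOBS` by a job label. It assumes the reporting tools attach a `dashboard_id` label (a naming convention we would have to define) and that jobs run in the US region:

```sql
-- Top-10 dashboards by bytes billed over the last 7 days.
-- Assumes reporting tools attach a `dashboard_id` job label.
select
    (select value from unnest(labels) where key = 'dashboard_id') as dashboard_id,
    count(*)                                                      as query_count,
    round(sum(total_bytes_billed) / pow(1024, 4), 2)              as tib_billed
from `region-us`.INFORMATION_SCHEMA.JOBS
where job_type = 'QUERY'
  and creation_time >= timestamp_sub(current_timestamp(), interval 7 day)
group by dashboard_id
order by tib_billed desc
limit 10
```

Rows with a null `dashboard_id` are untagged workloads - themselves a useful signal of where tagging coverage is incomplete.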
### Query Optimization
Beyond visibility, the platform should encourage - and eventually enforce - good query patterns:
- Target Gold-layer tables - Dashboards should not query Bronze or Silver directly. Gold tables are modeled, partitioned, and clustered for reporting patterns.
- Use materialized views - For expensive aggregations that are queried frequently, consider materialized views or pre-aggregated tables.
- Leverage BI Engine - For high-frequency dashboards with predictable query patterns, BigQuery BI Engine provides in-memory acceleration with stable cost. See Lake Engine - Provisioned Capacity for guidance on when this applies.
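For example, an expensive daily rollup could be captured as a materialized view so that dashboards scan a small pre-aggregate instead of the full fact table. The dataset, table, and column names here are hypothetical:

```sql
-- Pre-aggregated daily rollup; BigQuery refreshes it incrementally,
-- so dashboards scan the small rollup instead of the full fact table.
create materialized view analytics_gold.mv_daily_collections as
select
    collection_date,
    region,
    count(*)              as payments,
    sum(amount_collected) as total_collected
from analytics_gold.fct_collections
group by collection_date, region;
```

Note that BigQuery materialized views support a limited set of aggregate functions, which is another reason to keep heavy logic in dbt and reserve materialized views for simple, hot aggregations.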
### Prevention Mechanisms
Catching cost problems after they occur is necessary but insufficient. The platform should also prevent the creation of expensive artifacts:
- Schema enforcement - If dashboards can only access datasets in the governed Gold layer, the worst-case query cost is bounded by the design of that layer.
- Query quotas - BigQuery supports per-user and per-project quotas. Consider applying quotas to reporting service accounts to cap runaway costs.
- Review gates - For dashboards that will be widely shared or scheduled, require review by a data steward before publication.
## Access Control and Security
Security in reporting is not just about who can see a dashboard - it is about who can see which rows and columns within the data that dashboard exposes.
### Row-Level Security
Different users need different views of the same data. For example:
- A collections agent needs to see phone numbers; a credit analyst does not.
- A retailer should see only their own customers; UME leadership sees all.
Row-level security (RLS) should be implemented at the data layer (BigQuery, AlloyDB) rather than in the reporting tool. This ensures that regardless of which tool queries the data, the same policies apply.
BigQuery supports RLS natively through row access policies, with authorized views as an alternative; column-level data masking policies complement these for sensitive fields. When combined with identity federation, queries inherit the user's identity and the appropriate filters are applied automatically.
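A sketch of the retailer example above using BigQuery row access policies - the table, column (`retailer_email`), and group names are hypothetical:

```sql
-- Retailers see only their own rows; leadership sees everything.
create row access policy retailer_isolation
on analytics_gold.fct_customers
grant to ("group:retailers@example.com")
filter using (retailer_email = session_user());

create row access policy leadership_all_rows
on analytics_gold.fct_customers
grant to ("group:ume-leadership@example.com")
filter using (true);
```

Because `session_user()` resolves to the querying user's email, this only works when the reporting tool propagates end-user identity - which is exactly why identity federation matters.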
### Identity Federation
Reporting tools should authenticate users through Google Workspace SSO and propagate that identity to BigQuery. This enables:
- Per-user audit trails (who queried what, when)
- Per-user cost attribution
- Policy enforcement based on user groups
Avoid shared service accounts for reporting. When a service account is used, all queries appear as the same identity, destroying audit granularity and enabling policy circumvention.
### When Identity Federation Is Not Possible
Some tools - Metabase in particular - do not support propagating end-user identity to BigQuery. In these cases, compensating controls are required:
- Tiered service accounts - Create separate service accounts with different permission levels (e.g., `metabase-general@`, `metabase-sensitive@`, `metabase-pii@`). Each account has access only to datasets appropriate for its tier. Configure multiple database connections in Metabase, one per service account.
- Pre-filtered views in BigQuery - For datasets that require row-level filtering (e.g., tenant isolation), create dedicated views in BigQuery that apply the appropriate filters. The service account accesses only these views, not the underlying tables. This shifts RLS enforcement from query-time identity to data modeling.
- Gold-layer isolation - Design Gold-layer tables with access boundaries in mind. If a user needs to join data across sensitivity levels, that join should happen in the ETL layer (dbt), producing a purpose-built dataset - not at query time in Metabase. This keeps sensitive data out of the reporting tool entirely.
- Metabase-level permissions - Use Metabase's native collection and dashboard permissions to control who can access which artifacts. This does not replace data-layer security but adds a UI-level gate.
- Audit at the tool level - Enable and monitor Metabase's internal audit logs. While BigQuery sees only the service account, Metabase records which user ran which query. Export these logs to a central location for correlation.
- Query tagging for attribution - Even without per-user identity, configure Metabase to tag queries with artifact identifiers (dashboard ID, question ID). This enables cost and usage attribution at the artifact level, which is often sufficient for optimization work.
The key principle: when identity cannot flow through to the data layer, push security decisions earlier - into the data model (filtered views, purpose-built Gold tables) rather than relying on runtime policy enforcement.
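A minimal sketch of the pre-filtered-view pattern, assuming one Metabase connection per tenant (dataset, table, and tenant identifier are hypothetical):

```sql
-- The tenant-scoped service account is granted access to this view only,
-- never to analytics_gold.fct_customers itself.
create view reporting_views.v_retailer_a_customers as
select customer_id, retailer_id, first_due_date, outstanding_balance
from analytics_gold.fct_customers
where retailer_id = 'RETAILER_A';
```

The filter is baked into the data model, so even a badly configured Metabase connection cannot see another tenant's rows.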
### PII Considerations
Dashboards that expose personally identifiable information require additional controls:
- Tag PII columns in the Data Catalog
- Apply column-level masking in BigQuery for users who do not need raw PII
- Audit access to PII-containing dashboards and alert on anomalies
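For instance, a masked view can substitute a hash for raw PII for users who only need to join or count on it - a sketch with hypothetical names:

```sql
-- Analysts without raw-PII access query this view; the hash still
-- supports joins and distinct counts without exposing the number itself.
create view reporting_views.v_customers_masked as
select
    customer_id,
    to_hex(sha256(phone_number)) as phone_number_hash,
    city,
    signup_date
from analytics_gold.dim_customers;
```

Policy-tag-based column masking in BigQuery achieves the same end without maintaining parallel views, at the cost of setting up a Data Catalog taxonomy first.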
## Ownership and Lineage
Every dashboard and report should have a clear owner - a person accountable for its accuracy, relevance, and cost. Ownership is not about blame; it is about having someone to ask when questions arise.
### Capturing Ownership
- Define ownership at creation time. When a dashboard is published, require the creator to designate an owner (which may be themselves or a team).
- Store ownership metadata in the Data Catalog. DataHub and similar tools support custom properties on assets, including ownership fields.
- Display ownership prominently. Users viewing a dashboard should see who owns it and how to contact them.
### Lineage Visibility
The data catalog should provide end-to-end lineage: from source systems, through transformations, to the reports that consume the final data. This enables:
- Impact analysis - Before changing a dbt model, understand which dashboards depend on it.
- Root cause analysis - When a dashboard shows unexpected values, trace back to the source to identify where the issue originated.
- Deprecation planning - When sunsetting a data source, identify all downstream consumers that need migration.
To achieve this, the catalog must ingest lineage from multiple sources:
- ETL lineage - dbt emits manifest files with model dependencies. Ingest these into the catalog.
- Reporting lineage - Use connectors to extract which tables each dashboard queries. DataHub provides connectors for Looker, Metabase, and can be extended for others.
- Manual enrichment - For cases where automated lineage is incomplete, allow data stewards to manually link assets.
The result is a unified graph where a user can click on a dashboard and see everything upstream (sources, transformations) and everything downstream (other reports, reverse ETL destinations).
## Self-Service Analytics with Guardrails
Self-service is valuable, but unconstrained self-service leads to sprawl. The goal is to enable exploration while channeling it toward governed assets.
### The Catalog as Starting Point
When a user wants to answer a question with data, their first stop should be the Data Catalog. The catalog shows:
- What data exists and what it means
- Who owns it and how fresh it is
- Whether it is certified for production use
- How to access it
If users discover data through the catalog, they are more likely to use governed datasets. If they discover data by poking around in BigQuery or copying a colleague's query, they are more likely to create ungoverned dependencies.
### Deprecation and Cleanup
Self-service inevitably creates artifacts that outlive their usefulness. The platform should support graceful deprecation:
- Usage tracking - Identify dashboards and saved queries that have not been accessed in 6 or 12 months.
- Deprecation workflow - Notify owners of stale artifacts. If no action is taken, move to a legacy area. If still unused, delete.
- Legacy visibility - Make it clear in the catalog which assets are deprecated. Users can still access them if needed, but they are warned.
This is not about restricting users - it is about maintaining a clean, trustworthy inventory of data assets.
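Table-level usage can be derived from the same job metadata used for cost attribution (dashboard-level usage comes from the reporting tools' own audit logs). A sketch, with the region and dataset names assumed, that lists Gold tables untouched for 180 days:

```sql
-- Gold tables with no queries in the last 180 days: deprecation candidates.
select t.table_name
from analytics_gold.INFORMATION_SCHEMA.TABLES as t
left join (
    select distinct rt.table_id as table_name
    from `region-us`.INFORMATION_SCHEMA.JOBS as j,
         unnest(j.referenced_tables) as rt
    where rt.dataset_id = 'analytics_gold'
      and j.creation_time >= timestamp_sub(current_timestamp(), interval 180 day)
) as used using (table_name)
where used.table_name is null;
```

Note that `INFORMATION_SCHEMA.JOBS` retains roughly six months of history, so longer lookback windows require exporting job metadata to a persistent table.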
## KPI Lifecycle and Certification
Not all data is created equal. A Gold-layer table that has been reviewed, tested, and documented should be treated differently than an ad-hoc query someone wrote last week.
### Certification Levels
Consider a simple certification model - for example, a small set of tiers such as experimental, reviewed, and certified, each with clear criteria for promotion.
Certification status should be visible in the catalog and, ideally, in the reporting tool. Users should know whether the data they are looking at is trusted.
### KPI Versioning
Business definitions change. The way FPD is calculated in 2026 may differ from 2024. The platform should support:
- Versioned KPI definitions - Historical queries can use historical definitions
- Point-in-time snapshots - Data for a given period is frozen, enabling reproducibility
- Audit trail - Changes to KPI definitions are logged and attributable
This ties back to Lake Engine - Governance Recommendations on modeling data and transformations to support KPI versioning alongside data.
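One way to support this, sketched with hypothetical dataset and column names, is a definitions table that both dbt and the catalog can reference:

```sql
-- Each KPI change appends a row; queries for a historical period join on
-- the version whose validity window covers that period.
create table governance.kpi_definitions (
    kpi_name       string not null,
    version        int64  not null,
    valid_from     date   not null,
    valid_to       date,            -- null = currently active
    definition_sql string,          -- or a reference to the dbt model/tag
    changed_by     string,
    changed_at     timestamp
);
```

Keeping the table append-only gives the audit trail for free, and `valid_from`/`valid_to` make point-in-time lookups a simple range join.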
## Governance Recommendations
- Shift complexity left - KPIs belong in dbt models, not dashboard queries. Dashboards should be thin consumers.
- Establish ownership - Every dashboard has an owner. Ownership is visible in the catalog and enforced at publication.
- Capture lineage end-to-end - Ingest lineage from ETL tools and reporting tools into the catalog. Enable impact and root cause analysis.
- Attribute cost to artifacts - Tag queries with dashboard identifiers. Build cost reports. Notify owners of high-cost artifacts.
- Implement RLS at the data layer - Row-level security belongs in BigQuery, not in individual reporting tools.
- Require identity federation - No shared service accounts for reporting. Per-user identity enables audit and policy enforcement.
- Deprecate proactively - Track usage. Notify owners of stale artifacts. Clean up what is no longer needed.
- Certify what matters - Distinguish experimental data from production-ready. Make certification visible to users.
- Favor Looker Studio for MVPs - Leverage existing familiarity while governance foundations are established.
- Evaluate Metabase Pro and Hex - Conduct focused PoCs to determine whether these tools meet security and governance requirements before committing to a migration path.
## Tasks
- Configure query tagging in Looker Studio to enable BigQuery cost attribution.
- Build cost attribution dashboard showing top-10 reports by cost, with links to catalog entries.
- Define Metabase Pro evaluation criteria and conduct PoC.
- Define Hex evaluation criteria and conduct PoC.
- Establish RLS patterns for multi-tenant and PII scenarios in BigQuery.
- Document certification workflow and integrate certification status into catalog.
- Implement usage tracking for dashboards and define deprecation policy.
- Create onboarding guide for analysts: how to discover data in the catalog, how to build compliant dashboards.