# Pending meeting notes

Business

  • UME is evolving from BNPL operator to multi-issuer technology provider.
  • The vision is to become "Visa 2.0 or Mastercard 2.0" - a payment network.
  • Building-blocks approach: transform products into capability boxes that retailers can customize.
  • Wallet strategy: position UME as a wallet for end users, centralizing all credit cycles from different issuers.
  • Benefits ecosystem: plan to offer rewards, discounts, and membership packages to incentivize UME app usage over white labels.

Infra details

  • CDC (Change Data Capture) runs every 15 minutes from transactional databases to BigQuery (freshness-check sketch after this list)
  • 19 transactional DBs from OLTP systems
  • 4-5 separate BigQuery projects
  • Streamlit is used by Data Science and Credit teams for dashboards (mostly local, one deployed on Cloud Run)
  • Dataplex is being used by Wagner to accelerate cataloging, using LLMs to catalog all data - early stage
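
A quick freshness check can confirm the 15-minute CDC cadence is actually holding before anyone debugs "wrong" numbers downstream. A minimal sketch; the table path and the `_ingested_at` timestamp column are hypothetical placeholders:

```python
# CDC freshness check: alert if a BigQuery landing table has received no
# rows within the expected window. Table path and the `_ingested_at`
# column are placeholders; wire the print into real alerting.
from datetime import datetime, timedelta, timezone

from google.cloud import bigquery

EXPECTED_LAG = timedelta(minutes=30)  # two CDC cycles of slack

def check_freshness(table: str, ts_column: str = "_ingested_at") -> None:
    client = bigquery.Client()
    row = next(iter(client.query(
        f"SELECT MAX({ts_column}) AS last_ingest FROM `{table}`"
    ).result()))
    lag = datetime.now(timezone.utc) - row.last_ingest
    status = "STALE" if lag > EXPECTED_LAG else "OK"
    print(f"{status}: {table} lag is {lag} (threshold {EXPECTED_LAG})")

if __name__ == "__main__":
    check_freshness("my-project.cdc_landing.contracts")
```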

Data quality

  • Data consistency is more important than speed or recency in credit business - users need to understand this concept
  • Small errors (like two decimal places) can have big effects on credit policy decisions
  • Golden Truth concept: need to define standard sources and deprecate old ones
  • Trust in numbers is the biggest challenge - not cataloging or organizing (see the reconciliation sketch after this list)
    • e.g.: R$ 5 million divergence found between collections and credit teams
  • 700 tables need to be mapped to determine which are reliable
  • "Books of variables" concept -- creating a source of truth for contracts, borrowers, retailers, etc.
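
To catch divergences like the R$ 5 million one mechanically, the two teams' tables can be reconciled on a schedule. A minimal sketch, assuming both sides expose a FLOAT64 `outstanding_brl` per `contract_id`; all names and the tolerance are hypothetical:

```python
# Reconciliation sketch: full-outer-join two teams' "golden" tables and
# surface contracts whose values diverge. All table/column names are
# placeholders; assumes FLOAT64 values.
import polars as pl
from google.cloud import bigquery

TOLERANCE_BRL = 1.00  # flag anything diverging by more than R$ 1

def reconcile(credit_table: str, collections_table: str) -> pl.DataFrame:
    client = bigquery.Client()
    sql = f"""
        SELECT contract_id,
               c.outstanding_brl AS credit_value,
               k.outstanding_brl AS collections_value
        FROM `{credit_table}` c
        FULL OUTER JOIN `{collections_table}` k USING (contract_id)
    """
    df = pl.from_pandas(client.query(sql).to_dataframe())
    # A row missing on either side shows up as value-vs-zero, so it is
    # flagged as a divergence rather than silently dropped.
    df = df.with_columns(pl.col(pl.Float64).fill_null(0.0))
    return df.filter(
        (pl.col("credit_value") - pl.col("collections_value")).abs() > TOLERANCE_BRL
    )
```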

Cost management

  • BigQuery costs are increasing exponentially due to bad queries and unoptimized dashboards
    • e.g.: one dashboard costing R$ 15,000 was reduced to R$ 700 after optimization.
  • Top 10 cost items analysis needed (Leo working on this; jobs-metadata sketch after this list)
  • Need prevention mechanisms to avoid future cost increases
  • Cost projection needed for next year (best and worst case scenarios?)
  • BigQuery is being misused for transactional operations: ~2 seconds per request, expensive, and the data lags (refreshed only every 15 minutes by CDC)
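
BigQuery's INFORMATION_SCHEMA.JOBS_BY_PROJECT view already records bytes billed per job, so the top-10 list can come straight from job metadata; grouping by query prefix approximates repeated dashboard queries. A sketch; the region and the on-demand price per TiB are assumptions to verify against the actual billing account:

```python
# Rank the most expensive query shapes of the last 30 days from BigQuery
# job metadata. Region and USD_PER_TIB are assumptions; adjust both.
from google.cloud import bigquery

USD_PER_TIB = 6.25  # on-demand list price; verify for this account

def top_cost_queries(project: str, region: str = "us", limit: int = 10) -> None:
    client = bigquery.Client(project=project)
    sql = f"""
        SELECT LEFT(query, 80) AS query_head,
               ANY_VALUE(user_email) AS user_email,
               COUNT(*) AS runs,
               SUM(total_bytes_billed) / POW(1024, 4) AS tib_billed
        FROM `region-{region}`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
        WHERE creation_time > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY)
          AND job_type = 'QUERY'
        GROUP BY query_head
        ORDER BY tib_billed DESC
        LIMIT {limit}
    """
    for row in client.query(sql).result():
        print(f"~${row.tib_billed * USD_PER_TIB:,.2f} | {row.runs} runs | "
              f"{row.user_email} | {row.query_head!r}")
```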

Tool evaluations and discussions

  • Tectile: infrastructure simplification tool; versioning, backtests, ETL jobs, good AI integration, low-code + code mix.
  • hex.tech: Jupyter-like notebook with dashboards, alerts, AI integration.
    • Could replace Looker/Metabase for dashboards and serve as a DS analysis platform. Open questions: does it do well on management? Does it integrate with governance?
  • Briefer: national competitor to Hex/Rex.
  • Databricks: excellent usability, but costs escalate: DBUs (processing units) plus cloud costs (double billing). Based on Spark architecture.
  • Maybe use PostgreSQL: simpler and cheaper for most aggregated-data cases, since the dataset is not that large
    • Management overhead? Optimization? Pet DB?
  • Reverse ETL: need to move data out to non-analytical (operational) stores; BigQuery is expensive for individual record lookups (see the sketch below).
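
A reverse-ETL sketch for the lookup case: materialize an aggregated BigQuery result into PostgreSQL, where single-record reads are cheap and fast. Table names and the connection string are placeholders; a full refresh is shown, with incremental upserts as the obvious next step:

```python
# Reverse ETL sketch: copy an aggregated BigQuery table into PostgreSQL
# so operational systems do point lookups there instead of hitting
# BigQuery. All names and the DSN are placeholders.
import sqlalchemy
from google.cloud import bigquery

def sync_to_postgres(bq_table: str, pg_table: str, pg_dsn: str) -> None:
    client = bigquery.Client()
    df = client.query(f"SELECT * FROM `{bq_table}`").to_dataframe()
    engine = sqlalchemy.create_engine(pg_dsn)
    # Full refresh for simplicity; swap for an upsert once tables grow.
    df.to_sql(pg_table, engine, if_exists="replace", index=False)

if __name__ == "__main__":
    sync_to_postgres(
        "my-project.gold.contract_summary",
        "contract_summary",
        "postgresql+psycopg2://user:pass@host:5432/ops",
    )
```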

Metabase specific issues

  • Old image that's hard to update (stuck in time) - security risk
  • Users have almost total power; can do SQL injection [?]
  • Queries exposed in the URL (base64 encoded; decoding sketch after this list)
    • How does that even fit in the header? Possible config override
  • Initial policy was full access for everyone
  • Users prefer Metabase because they can access everything, even when they don't need it (e.g.: phone numbers).
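
The URL exposure is easy to verify: an ad-hoc Metabase question serializes its definition into the URL fragment as base64-encoded JSON, so it travels in the fragment rather than an HTTP header. A sketch for inspecting one; the encoding has been observed on some Metabase versions and should be treated as an assumption:

```python
# Decode the base64 fragment of an ad-hoc Metabase question URL to see
# what query it exposes. Encoding observed on some Metabase versions;
# verify against the deployed one.
import base64
import json
from urllib.parse import urlparse

def decode_metabase_fragment(url: str) -> dict:
    fragment = urlparse(url).fragment
    padded = fragment + "=" * (-len(fragment) % 4)  # restore padding
    return json.loads(base64.b64decode(padded))

if __name__ == "__main__":
    url = "https://metabase.example.com/question#eyJkYXRhc2V0X3F1ZXJ5Ijp7fX0="
    print(json.dumps(decode_metabase_fragment(url), indent=2))
```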

Access and governance

  • Today users have access to the bronze, silver, and gold layers, plus everything outside the pipeline
    • Does all data belong to these tiers?
  • This leads users to create new things outside the transformed data pipeline
    • We need mechanisms to prevent or lock this down
  • Need row-level security for PII protection (row access policy sketch after this list)
    • e.g.: the Credit team doesn't need phone numbers, but Collections does.
  • Need to segregate AI consumption by user for monitoring.
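
BigQuery covers the row side natively with row access policies (the phone-number case is column-level and would additionally need policy tags). A sketch; dataset, table, and group names are placeholders:

```python
# Sketch: a BigQuery row access policy so only the Collections group
# sees rows of a PII-bearing table. Once any row access policy exists
# on a table, users matched by no policy see zero rows. Names are
# placeholders; masking individual columns (e.g. phone numbers) would
# need policy tags instead.
from google.cloud import bigquery

DDL = """
CREATE OR REPLACE ROW ACCESS POLICY collections_only
ON `my-project.gold.borrowers_pii`
GRANT TO ("group:collections@example.com")
FILTER USING (TRUE)
"""

def apply_policy() -> None:
    client = bigquery.Client()
    client.query(DDL).result()  # DDL runs like any other query job
```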

AI and automation

  • Wagner is directing 70% of his workflow to LLMs (Claude Code, ChatGPT API)
  • Using LLMs for templates, boilerplate, and validation tasks
  • Vertex AI infra provides Anthropic Sonnet and Gemini 3.
  • Need cost segregation for AI usage per user.
  • LLMs used to compare SQL code with Polars/Python code for validation between teams (see the data-test sketch after this list)
    • Source of divergence
    • Need data testing to catch edge cases
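
The SQL-vs-Polars comparison can be a mechanical data test rather than an LLM-only review: run both, join on the grouping key, and flag divergences beyond a tolerance so edge cases (nulls, rounding, missing keys) surface by themselves. A minimal sketch; all names are illustrative, and the SQL is expected to return the key column plus a `total`:

```python
# Data-test sketch: compare an aggregate computed by one team's BigQuery
# SQL against the same aggregate computed in Polars. Names illustrative.
import polars as pl
from google.cloud import bigquery

def divergences(sql: str, df: pl.DataFrame, key: str, value: str,
                tol: float = 0.005) -> pl.DataFrame:
    client = bigquery.Client()
    sql_side = pl.from_pandas(client.query(sql).to_dataframe())
    py_side = df.group_by(key).agg(pl.col(value).sum().alias("total"))
    # Polars >= 1.0 spells the outer join "full"; older versions use "outer".
    joined = sql_side.join(py_side, on=key, how="full", suffix="_polars")
    # A null on either side means a missing key: always flag it.
    return joined.filter(
        (pl.col("total") - pl.col("total_polars")).abs()
        .fill_null(float("inf")) > tol
    )
```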

Data scale

  • 7 million active contracts in the active portfolio
  • Average of 4-5 installments per contract - small
  • Some dashboards break down by contract and installment - around 10 million records.

Process and methodology

  • Stack talks (gittalks?): a ceremony to teach the team about new data patterns and governance
  • Need to discontinue bad old patterns
  • Idea: small vertical-scope approach - start with one area or KPI (like FPD) to validate changes
  • Automate the transition to legacy status
    • e.g.: establish policies for moving data to archival storage
  • Use Metabase analytics to identify what is accessed daily vs. not used in 12 months (usage-audit sketch after this list)
    • Derive policies from usage
    • Sweep Metabase for queries and contextualize them
      • dashboard usage > queries > BigQuery tables
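
Metabase's application database already logs executions, so the sweep can be a query against it. A sketch, assuming read access to a Postgres app DB; internal tables like report_card and query_execution exist today but may change between Metabase versions:

```python
# Usage-audit sketch against the Metabase application database: saved
# questions with no execution in 12 months become archival candidates.
# Assumes a Postgres app DB; table names may vary by Metabase version.
import psycopg2

SQL = """
SELECT c.id, c.name, MAX(qe.started_at) AS last_run
FROM report_card c
LEFT JOIN query_execution qe ON qe.card_id = c.id
WHERE NOT c.archived
GROUP BY c.id, c.name
HAVING COALESCE(MAX(qe.started_at), TIMESTAMP 'epoch')
       < NOW() - INTERVAL '12 months'
ORDER BY last_run NULLS FIRST
"""

def stale_cards(dsn: str) -> list:
    with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
        cur.execute(SQL)
        return cur.fetchall()

if __name__ == "__main__":
    for card_id, name, last_run in stale_cards("dbname=metabase user=readonly"):
        print(card_id, name, last_run)
```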

Engineering preferences

  • Prefer native GCP tools over open source or "strange things".
  • focus engineering time on core business (platform, financial product customization), not infrastructure work
  • Use third-party services for non-core things (e.g.: authentication, load balancing).
  • Avoid wasting time on areas that don't generate business leverage