Principles forged by the demands of real-time, physical-world systems.
Multi-sided Marketplace
Real-time forecasting, dynamic pricing & matching across earners, consumers and merchants — all in milliseconds.
Hyper-local Geospatial
H3 hex grids, sub-second ETAs, real-time routing and location intelligence powering every trip on the planet.
Platform First
Modular building blocks that let us launch new verticals — rides, eats, freight, autonomous — on shared infrastructure.
Global Adaptability
70+ countries, thousands of regulatory frameworks, dozens of languages — one platform that adapts everywhere.
Resiliency & Scalability
Fault-tolerant systems serving millions of concurrent users. Five 9s isn't aspirational; it's mandatory.
Engineered for the Real World
Stories of building the systems that move the world in real time.
BACKEND
From Static Rate-Limiting to Intelligent Load Management
Static quotas couldn't protect Uber's databases — serving 170M+ users at tens of millions of RPS — from real-world overload. Three generations of load management evolved from CoDel-based queueing to Cinnamon's tier-aware shedding to a unified "Bring Your Own Signal" framework. The result: 80% more throughput under overload, 70% lower P99 latency, and 93% fewer goroutines.
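A minimal sketch of the tier-aware idea in Python (the real systems are Go services; the target delay, window, and class names here are assumptions): when queueing delay stays above a CoDel-style target, shed the lowest-priority tiers first, and re-admit them once delay recovers.

```python
import time
from collections import deque

TARGET_DELAY_S = 0.005   # CoDel-style target queueing delay (assumed value)
WINDOW_S = 0.100         # how long delay must stay high before shedding more

class TierAwareShedder:
    def __init__(self, num_tiers: int = 5):
        self.queues = [deque() for _ in range(num_tiers)]  # tier 0 = most critical
        self.delay_high_since = None
        self.shed_below_tier = num_tiers  # tiers >= this value are rejected

    def offer(self, request, tier: int) -> bool:
        if tier >= self.shed_below_tier:
            return False  # shed: caller fails fast instead of queueing
        self.queues[tier].append((time.monotonic(), request))
        return True

    def poll(self):
        for q in self.queues:  # serve the most critical tier first
            if q:
                enqueued_at, request = q.popleft()
                self._observe_delay(time.monotonic() - enqueued_at)
                return request
        return None

    def _observe_delay(self, delay: float):
        now = time.monotonic()
        if delay <= TARGET_DELAY_S:
            self.delay_high_since = None
            self.shed_below_tier = len(self.queues)  # recovered: accept all tiers
        elif self.delay_high_since is None:
            self.delay_high_since = now
        elif now - self.delay_high_since > WINDOW_S and self.shed_below_tier > 1:
            self.shed_below_tier -= 1  # sustained overload: shed one more tier
            self.delay_high_since = now
```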
How Uber Executed a JUnit Migration at Massive Scale
600,000+ JUnit 4 tests across 15M lines of Java with no path forward. Uber automated the JUnit 5 migration using OpenRewrite's Lossless Semantic Trees driven by a custom Bazel aspect — parsing source, applying recipes, and generating CI-validated patches. In 4 months: 75,000+ test classes migrated, 1.25M lines modified, 5,000+ automated diffs.
Building High Throughput Payment Account Processing
A 3–4 ops/sec ceiling on payment accounts left fleet operators waiting 21–24 hours for a single day's transactions. A three-service batching architecture — a Redis-windowed Batch Creator, an in-memory Batch Process with a single atomic write, and an async Post-Processor for audit logs — pushed throughput 10x to 30+ ops/sec, collapsing processing from hours to minutes.
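A sketch of the batching shape in Python with redis-py (the key name, window length, and downstream helpers are illustrative, not Uber's actual services): operations buffer in a Redis list, and each window is claimed atomically via a MULTI/EXEC pipeline.

```python
import json
import time

import redis  # assumes the redis-py client

WINDOW_S = 0.200                    # hypothetical batching window
QUEUE_KEY = "payments:account_ops"  # illustrative key name

r = redis.Redis()

def enqueue(op: dict) -> None:
    """Batch Creator: buffer an incoming account operation in Redis."""
    r.rpush(QUEUE_KEY, json.dumps(op))

def drain_window(max_batch: int = 100) -> list[dict]:
    """Atomically claim everything buffered during the window (MULTI/EXEC)."""
    with r.pipeline() as pipe:
        pipe.lrange(QUEUE_KEY, 0, max_batch - 1)
        pipe.ltrim(QUEUE_KEY, max_batch, -1)
        raw, _ = pipe.execute()
    return [json.loads(item) for item in raw]

def apply_as_single_write(batch: list[dict]) -> None:
    """Stand-in for the Batch Process step: one atomic DB write per batch."""

def publish_for_audit_logs(batch: list[dict]) -> None:
    """Stand-in for the async Post-Processor hand-off (audit logging)."""

def run_batch_processor() -> None:
    while True:
        time.sleep(WINDOW_S)        # let the window fill
        if batch := drain_window():
            apply_as_single_write(batch)
            publish_for_audit_logs(batch)
```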
80M requests per second across 1,100+ services, each running its own fragmented Redis-based limiter. Uber unified everything into a single in-mesh implementation using probabilistic token dropping — eliminating coordination overhead and delivering a 90% reduction in P99 tail latency for overloaded services.
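One way to realize probabilistic dropping without cross-host coordination, sketched in Python (the class name, EWMA smoothing, and per-instance limit are assumptions; the post's implementation lives in the service mesh): each instance estimates its local arrival rate and admits requests with probability limit/rate, so admitted traffic converges on the limit with no shared token state.

```python
import random
import time

class ProbabilisticLimiter:
    """Coordination-free rate limiting by probabilistic token dropping:
    no shared bucket, just a local arrival-rate estimate per instance."""

    def __init__(self, local_limit_rps: float, alpha: float = 0.2):
        self.limit = local_limit_rps
        self.alpha = alpha            # EWMA smoothing factor (assumed)
        self.rate = 0.0               # estimated arrival rate, RPS
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        dt = max(now - self.last, 1e-9)
        self.last = now
        inst_rate = 1.0 / dt          # instantaneous arrival rate
        self.rate = self.alpha * inst_rate + (1 - self.alpha) * self.rate
        if self.rate <= self.limit:
            return True               # under the limit: always admit
        return random.random() < self.limit / self.rate  # admit proportionally
```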
Kafka's consumer model forces every service to own partition management, offsets, and rebalances. uForwarder flips it: a consumer proxy fetches messages and delivers them to your gRPC endpoint, letting services forget Kafka exists. Now powering 1,000+ consumer services with centralized observability and backpressure control.
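The proxy's core loop, sketched in Python with confluent-kafka (topic, group, and the delivery stub are illustrative; uForwarder itself is a standalone service with retries and dead-letter queues): the proxy owns partitions and offsets, and only commits after the downstream service acknowledges delivery, which is what gives it backpressure control.

```python
from confluent_kafka import Consumer  # assumes confluent-kafka-python

consumer = Consumer({
    "bootstrap.servers": "kafka:9092",   # illustrative
    "group.id": "uforwarder-demo",
    "enable.auto.commit": False,         # the proxy owns offset progression
})
consumer.subscribe(["orders"])

def deliver_to_service(payload: bytes) -> bool:
    """Stand-in for the gRPC push into the consumer service's endpoint."""
    print(f"delivered {len(payload)} bytes")
    return True

while True:
    msg = consumer.poll(timeout=1.0)
    if msg is None or msg.error():
        continue
    if deliver_to_service(msg.value()):
        consumer.commit(message=msg)     # advance the offset only on ack
    # on failure: retry or dead-letter per policy; the offset stays put
```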
Evolution and Scale of Uber's Delivery Search Platform
Lexical search broke on Uber Eats — failing on synonyms, typos, and multilingual queries like "pan" (bread or cookware?). The new semantic stack: a two-tower deep network with a Qwen LLM backbone, trained via DeepSpeed ZeRO-3. Matryoshka embeddings cut storage 50% with under 0.3% quality loss; scalar quantization halves latency.
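A sketch of what Matryoshka truncation and scalar quantization look like mechanically, in NumPy (dimensions and the int8 scheme are illustrative): models trained with Matryoshka losses front-load information into the leading dimensions, so a prefix of the vector is itself a usable embedding.

```python
import numpy as np

def matryoshka_truncate(emb: np.ndarray, dims: int) -> np.ndarray:
    """Keep the leading dimensions, then re-normalize for cosine similarity."""
    prefix = emb[..., :dims]
    return prefix / np.linalg.norm(prefix, axis=-1, keepdims=True)

full = np.random.randn(4, 1024).astype(np.float32)  # pretend tower outputs
half = matryoshka_truncate(full, 512)               # 50% of the storage

# Scalar quantization sketch: int8 per dimension cuts bytes again (4x vs
# float32), at the cost of a small, bounded accuracy loss.
scale = np.abs(half).max() / 127.0
quantized = np.clip(np.round(half / scale), -127, 127).astype(np.int8)
```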
Transforming Ads Personalization with Sequential Modeling and Hetero-MMoE
Uber's ads system flattened rich behavioral sequences into summary stats and used MLP-only experts that missed cross-feature interactions. A target-aware transformer with Multi-Head Latent Attention captures sequences at O(N×L), and a Hetero-MMoE blends MLP, DCN, and CIN experts. Production gains: +0.93% pCTR AUC and +0.66% pCTO AUC.
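A toy PyTorch sketch of the heterogeneous-experts idea (layer sizes, the DCN-style cross layer, and the CIN stand-in are assumptions; the target-aware transformer and MLA are omitted): different expert architectures sit behind per-task softmax gates, one head per objective.

```python
import torch
import torch.nn as nn

class CrossNet(nn.Module):
    """Tiny DCN-style explicit cross layer (stand-in for the DCN expert)."""
    def __init__(self, dim: int):
        super().__init__()
        self.w = nn.Linear(dim, 1, bias=False)
        self.b = nn.Parameter(torch.zeros(dim))
    def forward(self, x):
        return x * self.w(x) + self.b + x  # x0 * (w . x) + b + x

class HeteroMMoE(nn.Module):
    def __init__(self, dim: int, num_tasks: int = 2):
        super().__init__()
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim)),  # MLP
            CrossNet(dim),                                                       # DCN-like
            nn.Sequential(nn.Linear(dim, dim), nn.Tanh()),                       # CIN stand-in
        ])
        self.gates = nn.ModuleList([nn.Linear(dim, len(self.experts))
                                    for _ in range(num_tasks)])
        self.heads = nn.ModuleList([nn.Linear(dim, 1) for _ in range(num_tasks)])

    def forward(self, x):
        expert_out = torch.stack([e(x) for e in self.experts], dim=1)  # [B, E, D]
        outs = []
        for gate, head in zip(self.gates, self.heads):
            w = torch.softmax(gate(x), dim=-1).unsqueeze(-1)           # [B, E, 1]
            mixed = (w * expert_out).sum(dim=1)                        # [B, D]
            outs.append(torch.sigmoid(head(mixed)))                    # per-task score
        return outs

model = HeteroMMoE(dim=64)
pctr, pcto = model(torch.randn(8, 64))  # one prediction per task
```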
Apache Hudi: Trillion-Record-Scale Data Lake Operations
Born at Uber to solve mutating immutable data lake files at scale, Apache Hudi now powers 19,500 datasets processing 6 trillion daily rows and 10 PB of ingestion — the storage engine making near-real-time lakehouse operations possible without sacrificing object storage's cost advantages.
16,000 datasets in one monolithic Hive metastore meant shared-fate blast radius — one bad operation could affect every team. Uber decomposed it into domain-specific databases via pointer-level metadata manipulation, achieving zero-downtime migration. Result: over 1 PB saved and the organizational independence teams needed to own their data contracts.
Semantic search across 1.5 billion items needed ANN algorithms fast enough for production yet accurate enough to drive recommendations. Uber tuned HNSW parameters, optimized segment merging, and added quantization — cutting ingestion from 12 hours to 2.5 hours and slashing P99 query latency from 250ms to under 120ms.
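Uber's production stack is segment-based, but hnswlib makes the HNSW tuning knobs concrete; the parameter values below are illustrative, not Uber's. M and ef_construction trade index build cost for recall, while ef trades query latency for recall at search time.

```python
import hnswlib
import numpy as np

dim, n = 128, 100_000
data = np.random.randn(n, dim).astype(np.float32)

index = hnswlib.Index(space="cosine", dim=dim)
index.init_index(max_elements=n, M=32, ef_construction=200)  # build-time knobs
index.add_items(data, np.arange(n))

index.set_ef(128)  # higher ef -> better recall, higher P99 latency
labels, distances = index.knn_query(data[:10], k=5)
```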
From Batch to Streaming: Accelerating Data Freshness
Batch ingestion meant analytical decisions lagged reality by hours across Delivery, Mobility, Finance, and Marketing. Re-architecting around Apache Flink with row-group-level Parquet merging brought freshness down to minutes while cutting compute 25% — streaming's incremental approach avoids the full-file rewrites that made batch so expensive at petabyte scale.
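The row-group idea in miniature, with pyarrow (illustrative of the mechanism, not Uber's Flink operator): compaction streams existing row groups into the output instead of decoding and rewriting whole files, which is what keeps incremental merging cheap.

```python
import pyarrow.parquet as pq

def merge_row_groups(small_files: list[str], out_path: str) -> None:
    """Compact many small Parquet files, copying one row group at a time.
    Assumes all inputs share a schema."""
    first = pq.ParquetFile(small_files[0])
    with pq.ParquetWriter(out_path, first.schema_arrow) as writer:
        for path in small_files:
            pf = pq.ParquetFile(path)
            for rg in range(pf.metadata.num_row_groups):
                writer.write_table(pf.read_row_group(rg))  # per-row-group copy
```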
Live Activities run sandboxed with no network access — yet Uber needed real-time driver location on the lock screen. App Groups share on-disk state between the main app and Live Activity, a lightweight DSL syncs content logic across iOS Live Activities and Android push, and an OOA backend debounces updates. Result: 2.26% fewer driver and 2.13% fewer rider cancellations at pickup.
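A sketch of the server-side debouncing step in asyncio (class name and interval are assumptions; the post only says the OOA backend debounces): bursts of driver-location updates coalesce into a single push carrying the freshest payload.

```python
import asyncio

class Debouncer:
    """Collapse a burst of updates into one send of the latest payload."""
    def __init__(self, interval_s: float, send):
        self.interval = interval_s
        self.send = send            # e.g. an async push to the Live Activity
        self.latest = None
        self.task = None

    def update(self, payload) -> None:
        self.latest = payload       # newer payloads overwrite older ones
        if self.task is None or self.task.done():
            self.task = asyncio.ensure_future(self._flush())

    async def _flush(self) -> None:
        await asyncio.sleep(self.interval)  # wait out the burst
        await self.send(self.latest)        # ship only the newest state
```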
Every Uber line of business had built payments independently — duplicated logic, inconsistent UX, Apple Pay missing from half the flows. EU Strong Customer Authentication forced a reckoning. Uber built a centralized checkout orchestrator with modular components each LOB plugs into. Holdout results: 3% higher conversion, 4.5% better session recovery, hundreds of millions in incremental gross bookings.
Standardized Mobile Analytics for Cross-Platform Insights
Over 40% of Uber's mobile events were ad-hoc custom logs, breaking cross-platform analysis and impression accuracy. The team standardized to three universal event types — tap, impression, scroll — using AnalyticsBuilder classes that capture metadata at the platform layer. Result: 30% less transient-impression noise and reliable iOS/Android parity.
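A Python sketch of the builder pattern behind the three event types (class and field names are assumptions, not Uber's schema): every tap, impression, and scroll funnels through one shape, with metadata captured centrally rather than per call site.

```python
import time
from dataclasses import dataclass, field

@dataclass
class AnalyticsEvent:
    event_type: str                      # "tap" | "impression" | "scroll"
    surface: str
    component_id: str
    metadata: dict = field(default_factory=dict)
    timestamp_ms: int = field(default_factory=lambda: int(time.time() * 1000))

class AnalyticsBuilder:
    def __init__(self, event_type: str):
        self._event_type = event_type
        self._fields: dict = {}

    def surface(self, name: str) -> "AnalyticsBuilder":
        self._fields["surface"] = name
        return self

    def component(self, cid: str) -> "AnalyticsBuilder":
        self._fields["component_id"] = cid
        return self

    def meta(self, **kv) -> "AnalyticsBuilder":
        self._fields.setdefault("metadata", {}).update(kv)
        return self

    def build(self) -> AnalyticsEvent:
        return AnalyticsEvent(event_type=self._event_type, **self._fields)

event = (AnalyticsBuilder("impression")
         .surface("eats_feed").component("store_card").meta(position=3).build())
```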
Uber's Rider app launches features across hundreds of screens and thousands of feature flags — making manual design system audits impossible. Design System Observability adds a deterministic component scanner that flags non-Base elements, plus a daily screenshot pipeline that auto-files Jira tickets for violations. Teams using Base report 3x faster development and 50% less code.
Letting executive assistants book rides for executives meant rethinking trip ownership, identity, billing, and notifications across 30+ backend services and 5 client platforms. Uber introduced a "participant model" extending every booking, tracking, and billing touchpoint to support multiple user profiles per trip — with full audit trails for both EA and executive.
How Uber Built an Agentic System to Automate Design Specs in Minutes
Manual design specs across UIKit, SwiftUI, Android XML, Compose, Web React, Go, and SDUI were a bottleneck causing constant documentation drift. uSpec combines AI agents with a Figma Console MCP bridge that reads real tokens and variants directly from Figma — running locally via Cursor over WebSocket. Screen reader specs across all 3 platforms now generate in under 2 minutes.
AI Prototyping Is Changing How We Build Products at Uber
Cross-functional alignment used to take weeks of meetings and PRDs. AI prototyping tools — Lovable, Figma Make, Claude Code, Cursor — compressed a merchant team's four-week discussion into two hours, and let a PM explore six concepts in 20 minutes. Nearly 40% of Uber's global hackathon submissions now incorporate these tools.
Open Source and In-House: How Uber Optimizes LLM Training
Training LLMs for Eats, support, and code gen at Uber means squeezing every GPU cycle. The stack: PyTorch, Ray, DeepSpeed ZeRO-3 CPU Offload (34% memory reduction, 2–7× larger batches), and Flash Attention (50% memory savings). On H100s, Mixtral 8x7B achieves 3× A100 throughput, scaling linearly to batch size 64.
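A sketch of what a ZeRO-3 CPU-offload configuration looks like in DeepSpeed's config format (values illustrative; Uber's exact settings aren't in the post): stage 3 partitions parameters, gradients, and optimizer state across workers, and offloading both params and optimizer to CPU frees the GPU memory that enables the larger batches.

```python
ds_config = {
    "train_micro_batch_size_per_gpu": 8,
    "bf16": {"enabled": True},
    "gradient_accumulation_steps": 4,
    "zero_optimization": {
        "stage": 3,                                            # full partitioning
        "offload_param": {"device": "cpu", "pin_memory": True},
        "offload_optimizer": {"device": "cpu", "pin_memory": True},
    },
}

# Typical wiring (model defined elsewhere):
# import deepspeed
# engine, optimizer, _, _ = deepspeed.initialize(model=model, config=ds_config)
```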
Innovative Recommendation Applications Using Two Tower Embeddings
Uber's restaurant retrieval ran thousands of city-specific Spark jobs weekly — a model that couldn't scale globally. Two-Tower Embeddings collapse it into one global model: query and item towers, Bag-of-Words history shrinking the model 20x, and LogQ correction pushing recall@500 from 89% to 93%. Now serves hundreds of millions of eaters at ~100ms latency.
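The LogQ correction itself is compact; a PyTorch sketch under the usual in-batch sampled-softmax setup (tensor names assumed): in-batch negatives oversample popular items, so each item's log sampling probability is subtracted from the logits before the softmax.

```python
import torch
import torch.nn.functional as F

def logq_corrected_loss(query_emb: torch.Tensor,
                        item_emb: torch.Tensor,
                        sampling_logq: torch.Tensor) -> torch.Tensor:
    """query_emb, item_emb: [B, D] L2-normalized tower outputs.
    sampling_logq: [B] log probability each item appears as an in-batch negative."""
    logits = query_emb @ item_emb.T               # [B, B] similarity matrix
    logits = logits - sampling_logq.unsqueeze(0)  # logQ correction per item
    labels = torch.arange(logits.size(0))         # diagonal = positive pairs
    return F.cross_entropy(logits, labels)
```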
As AI proliferates across Uber, knowing which model does what — and who owns it when something breaks — becomes mission-critical. Uber built a five-pillar program: a Model Catalog with standardized Model Cards, SHAP/PFI/integrated gradients in Michelangelo, compliance checks embedded in design workflows from day one, plus structured Education and Adoption.
Accelerating Deep Learning: Uber Optimized Petastorm for GPU Training
Uber's ML teams trained on tens of TBs with GPUs idle 85–90% of the time, waiting on serialized data loading. Two Petastorm fixes — pushing PyArrow-to-NumPy conversions into a parallel worker pool, plus a FanoutCache on local disk — lifted GPU utilization from 10–15% to 60%+, cut training time from 22h to 3h (7.3×), and cut compute costs ~80%.
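A sketch of the first fix's shape, not Petastorm's actual internals: decode Parquet row groups to NumPy in a worker pool so the training loop overlaps decoding with GPU work instead of blocking on one serialized loader.

```python
from concurrent.futures import ThreadPoolExecutor

import pyarrow.parquet as pq

def decode_row_group(path: str, rg: int) -> dict:
    """The CPU-bound Arrow -> NumPy conversion, moved off the critical path."""
    table = pq.ParquetFile(path).read_row_group(rg)
    return {name: table[name].to_numpy() for name in table.column_names}

def batches(path: str, workers: int = 8):
    num_rgs = pq.ParquetFile(path).metadata.num_row_groups
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(decode_row_group, path, rg)
                   for rg in range(num_rgs)]
        for fut in futures:   # preserve row-group order; decoding overlaps
            yield fut.result()
```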
~65,000 weekly code changes are too many for humans to fully review — but noisy AI tools are worse than none. uReview uses multi-stage GenAI with aggressive confidence scoring to analyze 90% of changes at a 75% usefulness rate, surfacing only comments that matter. Saves an estimated 1,500 developer hours per week.
Superuser Gateway: Guardrails for Privileged Command Execution
A misplaced flag in a privileged `rm -r` could silently delete a production dataset with no audit trail. Superuser Gateway removes superuser credentials from engineers' machines entirely, routing privileged commands through a Git-backed PR workflow with CLI submission, CI validation, peer approval, and controlled remote execution. Now standard for all data platform admins.
150,000 secrets across 25 fragmented vaults — no centralized detection, rotation, or attribution. Uber consolidated into 6 managed vaults, deployed real-time scanning across git/Slack/CI, and built a Cadence-orchestrated Secret Lifecycle Manager. A team of 10 now drives 20,000 automated monthly rotations with 90% fewer secrets exposed in pipelines.
An accidental IAM change on a critical gateway once stopped Uber Eats customers from modifying orders — and ~10% of monthly policy changes involve risky privilege removal. The Policy Simulator pulls 30–90 days of access logs from Apache Pinot and replays them through current vs. proposed policies. Cadence-orchestrated, sub-minute impact analysis before anything ships.
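The replay loop reduces to a small diff over historical decisions; a Python sketch with a toy policy model (the real evaluator, log schema, and Cadence workflow are far richer): evaluate each logged event against both policies and surface the traffic the proposed change would break.

```python
class Policy:
    """Toy stand-in: a set of allowed (principal, action, resource) triples."""
    def __init__(self, rules: set):
        self.rules = rules

    def allows(self, principal: str, action: str, resource: str) -> bool:
        return (principal, action, resource) in self.rules

def simulate(events, current: Policy, proposed: Policy) -> list:
    """Replay logged access events; return real traffic the change would deny."""
    regressions = []
    for e in events:  # e.g. rows pulled from the Pinot-backed access logs
        key = (e["principal"], e["action"], e["resource"])
        if current.allows(*key) and not proposed.allows(*key):
            regressions.append(e)
    return regressions
```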
Rotating 100,000+ Kerberos keytabs is risky: rotation invalidates the previous key immediately, leaving applications without valid credentials. Uber's solution generates keytabs with both old and new versions during transitions, drops fetch intervals to 30s, and integrates with the Secret Management Platform. Now rotates 30,000+ keytabs monthly with zero disruptions.
Security for Hadoop Data Lake on Google Cloud Storage
Migrating 160+ PB from HDFS to GCS meant bridging Kerberos delegation tokens and GCP OAuth 2.0 — without changing any of thousands of analytical jobs. The Storage Access Service intercepts FileSystem calls, exchanges Hadoop tokens for time-bound GCP credentials, and caches across three layers. Handles 500,000+ RPS at 0.026ms average latency.
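A single-layer sketch of the exchange-and-cache hot path (TTL, names, and the STS stub are assumptions; the real service layers three caches): the service trades a Hadoop delegation token for a time-bound GCP credential, and caching keeps the common case off the network entirely.

```python
import time

def exchange_for_gcp_credential(token: str) -> str:
    """Placeholder for the STS call that mints a time-bound OAuth credential."""
    return f"gcp-cred-for-{token[:8]}"

class TokenCache:
    def __init__(self, ttl_s: float = 300.0):
        self.ttl = ttl_s
        self.store: dict = {}  # delegation token -> (expiry, credential)

    def get(self, delegation_token: str) -> str:
        hit = self.store.get(delegation_token)
        if hit and hit[0] > time.monotonic():
            return hit[1]      # cache hit: no round trip on the hot path
        cred = exchange_for_gcp_credential(delegation_token)
        self.store[delegation_token] = (time.monotonic() + self.ttl, cred)
        return cred
```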
Manually tagging sensitive data columns at exabyte scale was impractical — yet classification is the foundation of privacy controls and encryption. DataK9 uses a hybrid approach: experts manually classify under 1% of datasets as golden examples, then rule-based Bloom filters and ML-trained Linear SVMs auto-tag the remaining 400,000+. Required to exceed 90% accuracy and 85% F2-score before activation.
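A toy sketch of the hybrid approach (features, thresholds, and training data are invented; DataK9's real rules and models are not public): a Bloom filter gives cheap, precise hits on expert-curated sensitive names, and a Linear SVM trained on golden examples handles everything else.

```python
import hashlib

import numpy as np
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.svm import LinearSVC

class BloomFilter:
    def __init__(self, size: int = 1 << 20, hashes: int = 4):
        self.size, self.hashes = size, hashes
        self.bits = np.zeros(size, dtype=bool)

    def _idx(self, value: str):
        for i in range(self.hashes):
            h = hashlib.sha256(f"{i}:{value}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.size

    def add(self, value: str) -> None:
        for i in self._idx(value):
            self.bits[i] = True

    def __contains__(self, value: str) -> bool:
        return all(self.bits[i] for i in self._idx(value))

known_sensitive = BloomFilter()
known_sensitive.add("ssn")  # seeded from expert-curated rules

# Train the ML fallback on a handful of expert-labeled "golden" column names.
vec = HashingVectorizer(n_features=2**12, analyzer="char_wb", ngram_range=(2, 4))
svm = LinearSVC().fit(
    vec.transform(["user_ssn", "tax_id", "city", "eta_min"]),
    [1, 1, 0, 0],  # 1 = sensitive
)

def classify_column(name: str) -> bool:
    if name in known_sensitive:  # rule hit: cheap and precise
        return True
    return bool(svm.predict(vec.transform([name]))[0])  # ML fallback
```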