Driving innovation at scale

Powering 41 million daily connections between the digital and physical worlds through a real-time marketplace built for global reliability.


How We Build

Principles forged by the demands of real-time, physical-world systems.

Multi-sided Marketplace

Real-time forecasting, dynamic pricing & matching across earners, consumers and merchants — all in milliseconds.

Hyper-local Geospatial

H3 hex grids, sub-second ETAs, real-time routing and location intelligence powering every trip on the planet.

Platform First

Modular building blocks that let us launch new verticals — rides, eats, freight, autonomous — on shared infrastructure.

Global Adaptability

70+ countries, thousands of regulatory frameworks, dozens of languages — one platform that adapts everywhere.

Resiliency & Scalability

Fault-tolerant systems serving millions of concurrent users. Five 9s is not aspirational; it is mandatory.

Engineered for the Real World

Stories of building the systems that move the world in real time.

Zero-Growth Stack in Go
BACKEND

Zero-Growth Stack, Real Gains: How Stack Allocation Can Save 10% CPU in Go

Go's default 2KB stack allocation triggered wasteful expansion cycles consuming 10% CPU in Uber's high-throughput services. The team built a profile-guided automation system reading CPU profiles and binary symbols to compute optimal per-service stack sizes. Result: CPU overhead dropped below 1%, enabling 16% efficiency gains while keeping memory overhead under 2% across 2M cores.

Read more →
Cloud Commitment Layers
BACKEND

The 5 Layers Every Cloud Commitment Depends On

Uber's decade-long cloud commitment — a distributed systems problem spanning 10-figure spend — required mastering five physical layers. The challenge: regional topology constraints, inelastic power limits, and egress economics compound silently into cost variances. The solution: systematic qualification of compute, SKU selection, and network topology upfront. Impact: eliminating technical debt before it scales; enabling portability across zones and regions in minutes.

Read more →
Ansible Network Automation
BACKEND

How Ansible Automation Powers the Uber Corporate Network at a Global Scale

Manual management of 5,000 network devices across six continents created unstable configurations and constant downtime. Uber implemented Ansible-driven "Daily Nightly Enforcement" — automating backups, golden config generation via Jinja templates, and standardized pushes across multi-vendor hardware. Unified network state across regions, eliminated untracked manual changes overnight, and transformed a chaotic distributed system into reliably predictable infrastructure.

Read more →
Hybrid Core Allocation
BACKEND

Hybrid Core Allocation: From Overallocation to Reliable Sharing

Odin's vertical CPU scaler struggled with bursty workload patterns under pure dedicated core allocation. Uber engineers introduced a hybrid model combining dedicated cores for baseline performance with shared cores for elastic bursts, managed via Linux cpusets and cpu.shares. Result: reduced overprovisioning for variable workloads, increased fleet-wide CPU utilization, and maintained service-level guarantees without cross-host workload migration.

Read more →
Intelligent Load Management
BACKEND

From Static Rate-Limiting to Intelligent Load Management

Static quotas couldn't protect Uber's databases — serving 170M+ users at tens of millions of RPS — from real-world overload. Three generations evolved from CoDel queues to Cinnamon's tier-aware shedding to a unified "Bring Your Own Signal" framework. The result: 80% more throughput under overload, 70% lower P99 latency, and 93% fewer goroutines.
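
The tier-aware shedding idea can be sketched in a few lines: under overload, admit up to capacity and evict whichever admitted request belongs to the lowest tier, which may be the newcomer itself. This is a toy model under assumed names (`TierAwareShedder` is hypothetical); Cinnamon's real controller also adapts capacity from live latency signals.

```python
import heapq

class TierAwareShedder:
    """Toy admission controller: under overload, shed the least
    important request first. Illustrative sketch, not Uber's code."""

    def __init__(self, capacity):
        self.capacity = capacity   # max requests admitted per window
        self.admitted = []         # min-heap; top = least important

    def offer(self, request_id, tier):
        # Lower tier number = more important. Negating the tier makes
        # the heap's minimum the least important admitted request.
        entry = (-tier, request_id)
        if len(self.admitted) < self.capacity:
            heapq.heappush(self.admitted, entry)
            return None            # nothing shed
        # At capacity: push the newcomer, then shed the least
        # important request overall (possibly the newcomer itself).
        shed = heapq.heappushpop(self.admitted, entry)
        return shed[1]             # id of the shed request

shedder = TierAwareShedder(capacity=2)
shedder.offer("a", tier=1)
shedder.offer("b", tier=3)
dropped = shedder.offer("c", tier=2)   # the tier-3 request is shed
```

The key property is that shedding decisions compare the incoming request against everything already admitted, so a burst of low-tier traffic can never crowd out high-tier work.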

Read more →
gRPC in OpenSearch
BACKEND

Accelerating Search and Ingestion with High-Performance gRPC in OpenSearch

OpenSearch's REST/JSON-only interface forced translation layers in Uber's gRPC-native stack. Uber contributed native gRPC endpoints for Search and Bulk operations with automated Protobuf-JSON schema sync — eliminating adapter services. The impact: 60% p99 write latency reduction for M3, 53% lower p50 search latency for Delivery, and up to 47% faster throughput.

Read more →
Bounding Box Annotation Validation
DATA

Validating Bounding Box Annotations

Human annotators introduce tracking errors — ID swaps, position jumps, freeze errors, scale distortions — when labeling video bounding boxes. Uber deployed an XGBoost classifier analyzing 283 visual, motion, and coordinate features across an 11-frame window to catch these failures automatically. The system flags errors in real-time before submission, eliminating costly sequential review workflows while preserving data integrity.
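
One of the motion signals is easy to picture: a hand-labeled box that suddenly teleports across the frame. A minimal sketch of such a feature, flagging per-frame center displacements that dwarf the sequence's typical motion; the real system feeds 283 such features into an XGBoost classifier, and this function is an illustrative stand-in, not Uber's feature code.

```python
import statistics

def flag_position_jumps(centers, ratio=5.0):
    """Flag frames whose box-center displacement is far larger than
    the median per-frame motion. Toy version of one tracking-error
    feature (position jumps)."""
    # Per-frame L1 displacement of the box center
    steps = [abs(centers[i][0] - centers[i - 1][0]) +
             abs(centers[i][1] - centers[i - 1][1])
             for i in range(1, len(centers))]
    med = statistics.median(steps) or 1e-9
    # A jump is a step several times the typical motion
    return [i for i, s in enumerate(steps, start=1) if s > ratio * med]

centers = [(float(x), 0.0) for x in range(10)]
centers[5] = (50.0, 0.0)            # annotator's box jumps across the frame
flagged = flag_position_jumps(centers)   # frames 5 and 6 bracket the jump
```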

Read more →
Delivery Search Platform
DATA

Evolution and Scale of Uber's Delivery Search Platform

Lexical search broke on Uber Eats — failing on synonyms, typos, and multilingual queries like "pan" (bread or cookware?). The new semantic stack: a two-tower deep network with a Qwen LLM backbone, trained via DeepSpeed ZeRO-3. Matryoshka embeddings cut storage 50% with under 0.3% quality loss; scalar quantization halves latency.
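
Matryoshka embeddings make the 50% storage cut mechanical: the first k coordinates of a trained vector are themselves a usable embedding, so you truncate and re-normalize. A generic sketch of the technique (toy vectors, not Uber's trained model):

```python
import math

def truncate_and_renorm(vec, dims):
    """Matryoshka-style truncation: keep the first `dims` coordinates
    and re-normalize so the prefix is a valid unit embedding."""
    head = vec[:dims]
    norm = math.sqrt(sum(x * x for x in head)) or 1.0
    return [x / norm for x in head]

def cosine(a, b):
    # Inputs are unit vectors, so the dot product is the cosine.
    return sum(x * y for x, y in zip(a, b))

query = [0.4, 0.3, 0.1, 0.05]
doc   = [0.39, 0.31, 0.12, 0.04]

full_sim = cosine(truncate_and_renorm(query, 4), truncate_and_renorm(doc, 4))
half_sim = cosine(truncate_and_renorm(query, 2), truncate_and_renorm(doc, 2))
# Halving the dimensions barely moves the similarity on vectors whose
# leading coordinates carry most of the signal.
```

The training objective is what makes this safe: the model is optimized so that prefixes of the embedding rank well on their own, which is why truncation costs under 0.3% quality rather than collapsing recall.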

Read more →
Ads Personalization
DATA

Transforming Ads Personalization with Sequential Modeling and Hetero-MMoE

Uber's ads system flattened rich behavioral sequences into summary stats and used MLP-only experts that missed cross-feature interactions. A target-aware transformer with Multi-Head Latent Attention captures sequences at O(N×L), and a Hetero-MMoE blends MLP, DCN, and CIN experts. Production gains: +0.93% pCTR AUC and +0.66% pCTO AUC.

Read more →
Database Federation
DATA

Database Federation: Decentralized Hive Databases

16,000 datasets in one monolithic Hive metastore meant a shared-fate blast radius — one bad operation could affect every team. Uber decomposed it into domain-specific databases via pointer-level metadata manipulation, achieving zero-downtime migration. Result: over 1 PB saved and the organizational independence teams needed to own their data contracts.

Read more →
Scaled Data Replication
DATA

How Uber Scaled Data Replication to Move Petabytes Every Day

Petabyte-scale data replication across on-premise and cloud infrastructure created bottlenecks: HDFS client contention blocked job submissions, and sequential file merging delayed commits by 30+ minutes. Uber shifted Copy Listing to distributed Application Masters, parallelized namenode calls, and enabled Uber jobs for small transfers. Result: 90% faster submissions, 97% lower merge latency, and 5× incremental replication capacity within a year.
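
The copy-listing change is a fan-out pattern: instead of walking the source tree sequentially, each level of directories is listed concurrently by a worker pool. A minimal sketch with an in-memory tree standing in for namenode listing calls (`list_dir`, `parallel_copy_listing`, and the paths are hypothetical names, not Uber's replication code):

```python
from concurrent.futures import ThreadPoolExecutor

def list_dir(path, tree):
    """Stand-in for one namenode listing RPC (here: a dict lookup)."""
    return tree.get(path, [])

def parallel_copy_listing(roots, tree, workers=8):
    """Build the copy listing by listing each whole level of
    directories concurrently instead of walking sequentially."""
    files, pending = [], list(roots)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while pending:
            results = pool.map(lambda p: list_dir(p, tree), pending)
            pending = []
            for entries in results:
                for name, is_dir in entries:
                    (pending if is_dir else files).append(name)
    return sorted(files)

tree = {
    "/warehouse": [("/warehouse/db1", True), ("/warehouse/db2", True)],
    "/warehouse/db1": [("/warehouse/db1/part-0", False)],
    "/warehouse/db2": [("/warehouse/db2/part-0", False),
                       ("/warehouse/db2/part-1", False)],
}
listing = parallel_copy_listing(["/warehouse"], tree)
```

With deep, wide source trees, latency stops being the sum of every listing call and becomes roughly the depth of the tree times one round trip.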

Read more →
Apache Hudi
DATA

Apache Hudi: Trillion-Record-Scale Data Lake Operations

Born at Uber to solve the problem of mutating immutable data lake files at scale, Apache Hudi now powers 19,500 datasets processing 6 trillion daily rows and 10 PB of ingestion — the storage engine making near-real-time lakehouse operations possible without sacrificing object storage's cost advantages.

Read more →
Design Specs Automation
MOBILE

How Uber Built an Agentic System to Automate Design Specs in Minutes

Manual design specs across UIKit, SwiftUI, Android XML, Compose, Web React, Go, and SDUI were a bottleneck causing constant documentation drift. uSpec combines AI agents with a Figma Console MCP bridge that reads real tokens and variants directly from Figma — running locally via Cursor over WebSocket. Screen reader specs across all 3 platforms now generate in under 2 minutes.

Read more →
Mobile Analytics
MOBILE

Standardized Mobile Analytics for Cross-Platform Insights

Over 40% of Uber's mobile events were ad-hoc custom logs, breaking cross-platform analysis and impression accuracy. The team standardized to three universal event types — tap, impression, scroll — using AnalyticsBuilder classes that capture metadata at the platform layer. Result: 30% less transient-impression noise and reliable iOS/Android parity.
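
The enforcement trick is structural: events that are not one of the three standard types simply cannot be constructed. A hedged sketch of the shape of the idea (`AnalyticsEvent` and its fields are hypothetical names; the post describes platform-layer AnalyticsBuilder classes, not this exact API):

```python
from dataclasses import dataclass, field
import time

# Only three universal event types exist; everything else is rejected
# at construction time rather than discovered downstream.
STANDARD_TYPES = {"tap", "impression", "scroll"}

@dataclass
class AnalyticsEvent:
    event_type: str
    surface: str
    component_id: str
    metadata: dict = field(default_factory=dict)
    ts: float = field(default_factory=time.time)

    def __post_init__(self):
        if self.event_type not in STANDARD_TYPES:
            raise ValueError(f"non-standard event type: {self.event_type}")

evt = AnalyticsEvent("tap", surface="checkout", component_id="pay_button")
```

Because both platforms emit the same typed events with metadata captured at the platform layer, iOS and Android streams become directly comparable instead of requiring per-feature reconciliation.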

Read more →
Unified Checkout
MOBILE

Unified Checkout: Streamlining Uber's Payment Ecosystem

Every Uber line of business had built payments independently — duplicated logic, inconsistent UX, Apple Pay missing from half the flows. EU Strong Customer Authentication forced a reckoning. Uber built a centralized checkout orchestrator with modular components each LOB plugs into. Holdout results: 3% higher conversion, 4.5% better session recovery, hundreds of millions in incremental gross bookings.

Read more →
Design System at Scale
MOBILE

Measuring Design System Adoption at Scale

Uber's Rider app launches features across hundreds of screens and thousands of feature flags — making manual design system audits impossible. Design System Observability adds a deterministic component scanner that flags non-Base elements, plus a daily screenshot pipeline that auto-files Jira tickets for violations. Teams using Base report 3x faster development and 50% less code.

Read more →
Delegate Booking
MOBILE

Transforming Executive Travel: Delegate Booking

Letting executive assistants book rides for executives meant rethinking trip ownership, identity, billing, and notifications across 30+ backend services and 5 client platforms. Uber introduced a "participant model" extending every booking, tracking, and billing touchpoint to support multiple user profiles per trip — with full audit trails for both EA and executive.

Read more →
Uber Live Activity on iOS
MOBILE

Uber's Live Activity on iOS

Live Activities run sandboxed with no network access — yet Uber needed real-time driver location on the lock screen. App Groups share on-disk state between the main app and Live Activity, a lightweight DSL syncs content logic across iOS Live Activities and Android push, and an OOA backend debounces updates. Result: 2.26% fewer driver and 2.13% fewer rider cancellations at pickup.

Read more →
AI PRD Reviewer
AI / ML

Lessons from Building a First-Pass AI PRD Reviewer at Uber

Product review processes surfaced critical gaps too late, forcing teams into costly rework on unsupported assumptions and blind dependencies. Uber built an AI-powered PRD Evaluator that assembles contextual knowledge across linked documents, prior experiments, and company-specific frameworks to audit proposals before high-level review. Early adoption by dozens of PMs revealed stronger artifact quality entering checkpoints and sharper, faster review discussions.

Read more →
AI Prototyping
AI / ML

AI Prototyping Is Changing How We Build Products at Uber

Cross-functional alignment used to take weeks of meetings and PRDs. AI prototyping tools — Lovable, Figma Make, Claude Code, Cursor — compressed a merchant team's four-week discussion into two hours, and let a PM explore six concepts in 20 minutes. Nearly 40% of Uber's global hackathon submissions now incorporate these tools.

Read more →
LLM Training
AI / ML

Open Source and In-House: How Uber Optimizes LLM Training

Training LLMs for Eats, support, and code gen at Uber means squeezing every GPU cycle. The stack: PyTorch, Ray, DeepSpeed ZeRO-3 CPU Offload (34% memory reduction, 2-7x batch sizes), and Flash Attention (50% memory savings). On H100, Mixtral 8x7b achieves 3x A100 throughput, scaling linearly to batch 64.
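
The ZeRO-3 CPU Offload piece is driven by a DeepSpeed config. A hedged sketch of what such a config looks like, with illustrative values, not Uber's production settings: stage 3 partitions parameters, gradients, and optimizer state across workers, and the offload blocks push parameters and optimizer state to pinned CPU memory.

```python
# Illustrative DeepSpeed ZeRO stage-3 config with CPU offload; pass a
# dict like this to deepspeed.initialize(). Values are examples only.
ds_config = {
    "train_micro_batch_size_per_gpu": 8,
    "bf16": {"enabled": True},
    "zero_optimization": {
        "stage": 3,   # partition params, grads, and optimizer state
        "offload_param": {"device": "cpu", "pin_memory": True},
        "offload_optimizer": {"device": "cpu", "pin_memory": True},
        "overlap_comm": True,
        "stage3_prefetch_bucket_size": 5e7,
    },
    "gradient_clipping": 1.0,
}
```

The memory freed by offloading is what buys the 2-7x larger batch sizes: GPU memory holds only the working set of parameters being prefetched for the current layers.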

Read more →
Two Tower Embeddings
AI / ML

Innovative Recommendation Applications Using Two Tower Embeddings

Uber's restaurant retrieval ran thousands of city-specific Spark jobs weekly — a model that couldn't scale globally. Two-Tower Embeddings collapse it into one global model: query and item towers, Bag-of-Words history shrinking the model 20x, and LogQ correction pushing recall@500 from 89% to 93%. Now serves hundreds of millions of eaters at ~100ms latency.
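
LogQ correction is a standard fix for in-batch negative sampling: popular items appear as negatives far more often than rare ones, so the raw softmax over-penalizes them. Subtracting the log of each item's sampling probability from its logit undoes that bias. A generic sketch of the technique (toy numbers, not Uber's model):

```python
import math

def logq_corrected_logits(logits, sample_probs):
    """Correct sampled-softmax logits for non-uniform negative
    sampling: logit' = logit - log(P(item sampled into batch))."""
    return [l - math.log(p) for l, p in zip(logits, sample_probs)]

# Two items with identical raw logits, one popular (sampled 10% of
# the time) and one rare (1%). The correction boosts the rare item
# less penalized, restoring a fair comparison.
corrected = logq_corrected_logits([2.0, 2.0], [0.10, 0.01])
```

Intuitively, the popular item already "paid" for its frequency by being a negative in many batches; the correction refunds exactly that amount, which is what moved recall@500 from 89% to 93%.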

Read more →
Scaling Responsible AI
AI / ML

Under the Hood: Scaling Responsible AI at Uber

As AI proliferates across Uber, knowing which model does what — and who owns it when something breaks — becomes mission-critical. Uber built a five-pillar program: a Model Catalog with standardized Model Cards, SHAP/PFI/integrated gradients in Michelangelo, compliance checks embedded in design workflows from day one, plus structured Education and Adoption.

Read more →
Petastorm Deep Learning
AI / ML

Accelerating Deep Learning: Uber Optimized Petastorm for GPU Training

Uber's ML teams trained on tens of TBs with GPUs idle 85–90% of the time, waiting on serialized data loading. Two Petastorm fixes — pushing PyArrow-to-NumPy conversions into a parallel worker pool, plus FanoutCache for local disk — pushed GPU utilization from 10–15% to 60%+, training time from 22h to 3h (7.3x), and cut compute costs ~80%.
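
The first fix is a classic producer-side pattern: move per-chunk format conversion off the critical path and into a worker pool so the GPU-feeding loop never stalls. A minimal sketch with a sleep standing in for the PyArrow-to-NumPy conversion cost (`convert_chunk` and the chunk shapes are illustrative, not Petastorm's API):

```python
from concurrent.futures import ThreadPoolExecutor
import time

def convert_chunk(chunk):
    """Stand-in for converting one Parquet row group to NumPy."""
    time.sleep(0.01)                 # simulated conversion cost
    return [x * 2 for x in chunk]

def load_serial(chunks):
    # Baseline: the consumer waits for every conversion in turn.
    return [convert_chunk(c) for c in chunks]

def load_parallel(chunks, workers=8):
    # The fix in spirit: conversions run concurrently in a pool,
    # so results arrive roughly `workers` times faster.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(convert_chunk, chunks))

chunks = [[i, i + 1] for i in range(16)]
t0 = time.time(); serial = load_serial(chunks);   t_serial = time.time() - t0
t0 = time.time(); parallel = load_parallel(chunks); t_parallel = time.time() - t0
```

The same outputs arrive in a fraction of the wall-clock time; in the real pipeline that reclaimed idle GPU cycles, which is where the 10-15% to 60%+ utilization jump comes from.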

Read more →
Superuser Gateway
SECURITY

Superuser Gateway: Guardrails for Privileged Command Execution

A misplaced flag in a privileged `rm -r` could silently delete a production dataset with no audit trail. Superuser Gateway removes superuser credentials from engineers' machines entirely, routing privileged commands through a Git-backed PR workflow with CLI submission, CI validation, peer approval, and controlled remote execution. Now standard for all data platform admins.

Read more →
IAM Policy Safety
SECURITY

Determinism and Safety in IAM Policy Changes

An accidental IAM change on a critical gateway once stopped Uber Eats customers from modifying orders — and ~10% of monthly policy changes involve risky privilege removal. The Policy Simulator pulls 30–90 days of access logs from Apache Pinot and replays them through current vs. proposed policies. Cadence-orchestrated, sub-minute impact analysis before anything ships.
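
The simulator's core check is a replay: run every historical access through the current policy and the proposed one, and report anything that flips from allowed to denied. A toy model in which policies are just sets of permitted actions (real IAM policies are far richer; `replay_impact` and the names are hypothetical):

```python
def replay_impact(access_log, current_policy, proposed_policy):
    """Return (principal, action) pairs that historical traffic shows
    would newly be denied under the proposed policy."""
    would_break = []
    for principal, action in access_log:
        allowed_now = action in current_policy.get(principal, set())
        allowed_after = action in proposed_policy.get(principal, set())
        if allowed_now and not allowed_after:
            would_break.append((principal, action))
    return would_break

log = [("eats-gateway", "orders.modify"), ("eats-gateway", "orders.read")]
current = {"eats-gateway": {"orders.modify", "orders.read"}}
proposed = {"eats-gateway": {"orders.read"}}     # drops modify access
impact = replay_impact(log, current, proposed)   # the regression, surfaced
```

Replaying real traffic rather than reasoning about policy text is what makes the result deterministic: the answer is grounded in what callers actually did over the last 30-90 days.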

Read more →
Kerberos Keytab Rotation
SECURITY

Automating Kerberos Keytab Rotation at Uber

Rotating 100,000+ Kerberos keytabs is risky: rotation invalidates the previous key immediately, leaving applications without valid credentials. Uber's solution generates keytabs with both old and new versions during transitions, drops fetch intervals to 30s, and integrates with the Secret Management Platform. Now rotates 30,000+ keytabs monthly with zero disruptions.
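
The safety trick is that a keytab can hold multiple key versions (KVNOs) at once, so during a transition tickets encrypted under either the old or the new key still decrypt. A toy model of that lifecycle (the `Keytab` class and method names are illustrative, not Uber's or MIT Kerberos's API):

```python
class Keytab:
    """Toy keytab holding key material per key version number (KVNO)."""

    def __init__(self, principal, kvno, key):
        self.principal = principal
        self.keys = {kvno: key}        # kvno -> key material

    def rotate(self, new_kvno, new_key):
        # Add the new key but KEEP the old one: tickets already issued
        # under the old KVNO must remain decryptable mid-rotation.
        self.keys[new_kvno] = new_key

    def finalize(self, active_kvno):
        # Once all consumers have refetched, retire the old key.
        self.keys = {active_kvno: self.keys[active_kvno]}

    def can_decrypt(self, kvno):
        return kvno in self.keys

kt = Keytab("svc/host@EXAMPLE.COM", kvno=7, key=b"old")
kt.rotate(new_kvno=8, new_key=b"new")
ok_old, ok_new = kt.can_decrypt(7), kt.can_decrypt(8)  # both valid mid-rotation
kt.finalize(active_kvno=8)
```

Shrinking the fetch interval to 30s bounds how long the dual-key window must stay open before the old version can be safely retired.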

Read more →
Secrets Management
SECURITY

Multi-Cloud Secrets Management Platform

150,000 secrets across 25 fragmented vaults — no centralized detection, rotation, or attribution. Uber consolidated into 6 managed vaults, deployed real-time scanning across git/Slack/CI, and built a Cadence-orchestrated Secret Lifecycle Manager. A team of 10 now drives 20,000 automated monthly rotations with 90% fewer secrets exposed in pipelines.

Read more →
Hadoop GCS Security
SECURITY

Security for Hadoop Data Lake on Google Cloud Storage

Migrating 160+ PB from HDFS to GCS meant bridging Kerberos delegation tokens and GCP OAuth 2.0 — without changing any of thousands of analytical jobs. The Storage Access Service intercepts FileSystem calls, exchanges Hadoop tokens for time-bound GCP credentials, and caches across three layers. Handles 500,000+ RPS at 0.026ms average latency.

Read more →
DataK9
SECURITY

DataK9: Auto-Categorizing an Exabyte of Data

Manually tagging sensitive data columns at exabyte scale was impractical — yet classification is the foundation of privacy controls and encryption. DataK9 uses a hybrid approach: experts manually classify under 1% of datasets as golden examples, then rule-based Bloom filters and ML-trained Linear SVMs auto-tag the remaining 400,000+. Classifiers must exceed 90% accuracy and an 85% F2-score before activation.
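
The hybrid layering can be sketched simply: a fast membership check against values seeded from the golden examples tags obvious columns (the role the Bloom filters play), and anything it misses falls through to a learned scorer. Everything below is a hypothetical stand-in; a plain set replaces the Bloom filter and a keyword check replaces the trained Linear SVM.

```python
# Values seeded from expert-classified golden examples (illustrative).
KNOWN_EMAIL_VALUES = {"a@x.com", "b@y.org"}

def rule_layer(sample_values):
    """Fast path: exact-membership check, the role a Bloom filter
    plays at scale (cheap, no false negatives for seeded values)."""
    hits = sum(v in KNOWN_EMAIL_VALUES for v in sample_values)
    return "email" if hits / len(sample_values) > 0.5 else None

def model_layer(column_name):
    """Fallback: keyword stand-in for the trained Linear SVM."""
    return "email" if "mail" in column_name.lower() else "unclassified"

def classify_column(column_name, sample_values):
    # Rules first; the model only runs when rules are inconclusive.
    return rule_layer(sample_values) or model_layer(column_name)

tag_by_values = classify_column("contact", ["a@x.com", "b@y.org", "c@z.net"])
tag_by_name = classify_column("user_email", ["u1", "u2"])
```

Keeping the expensive model off the hot path is what makes auto-tagging 400,000+ datasets tractable: most columns are resolved by the cheap rule layer.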

Read more →

News

The latest from Uber Engineering and beyond.

Research

Browse research and articles authored by our team members.

Open Source

Infrastructure for connecting the digital and physical worlds — open to everyone.

Engineering Domains

Where the digital-physical bridge is built, one domain at a time.