Powering 41 million daily connections between the digital and physical worlds through a real-time marketplace built for global reliability.
Principles forged by the demands of real-time, physical-world systems.
Real-time forecasting, dynamic pricing & matching across earners, consumers and merchants — all in milliseconds.
H3 hex grids, sub-second ETAs, real-time routing and location intelligence powering every trip on the planet.
Modular building blocks that let us launch new verticals — rides, eats, freight, autonomous — on shared infrastructure.
70+ countries, thousands of regulatory frameworks, dozens of languages — one platform that adapts everywhere.
Fault-tolerant systems serving millions of concurrent users. Five 9s is not aspirational; it is mandatory.
Stories of building the systems that move the world in real time.
Go's default 2KB stack allocation triggered wasteful expansion cycles consuming 10% CPU in Uber's high-throughput services. The team built a profile-guided automation system reading CPU profiles and binary symbols to compute optimal per-service stack sizes. Result: CPU overhead dropped below 1%, enabling 16% efficiency gains while keeping memory overhead under 2% across 2M cores.
Uber's decade-long cloud commitment — a distributed systems problem spanning 10-figure spend — required mastering five physical layers. The challenge: regional topology constraints, inelastic power limits, and egress economics compound silently into cost variances. The solution: systematic qualification of compute, SKU selection, and network topology upfront. Impact: eliminating technical debt before it scales; enabling portability across zones and regions in minutes.
Manual management of 5,000 network devices across six continents created unstable configurations and constant downtime. Uber implemented Ansible-driven "Daily Nightly Enforcement" — automating backups, golden config generation via Jinja templates, and standardized pushes across multi-vendor hardware. Unified network state across regions, eliminated untracked manual changes overnight, and transformed a chaotic distributed system into reliably predictable infrastructure.
Odin's vertical CPU scaler struggled with bursty workload patterns under pure dedicated core allocation. Uber engineers introduced a hybrid model combining dedicated cores for baseline performance with shared cores for elastic bursts, managed via Linux cpusets and cpu.shares. Result: reduced overprovisioning for variable workloads, increased fleet-wide CPU utilization, and maintained service-level guarantees without cross-host workload migration.
Static quotas couldn't protect Uber's databases — serving 170M+ users at tens of millions of RPS — from real-world overload. Three generations evolved from CoDel queues to Cinnamon's tier-aware shedding to a unified "Bring Your Own Signal" framework. The result: 80% more throughput under overload, 70% lower P99 latency, and 93% fewer goroutines.
OpenSearch's REST/JSON-only interface forced translation layers in Uber's gRPC-native stack. Uber contributed native gRPC endpoints for Search and Bulk operations with automated Protobuf-JSON schema sync — eliminating adapter services. The impact: 60% p99 write latency reduction for M3, 53% lower p50 search latency for Delivery, and up to 47% faster throughput.
Human annotators introduce tracking errors — ID swaps, position jumps, freeze errors, scale distortions — when labeling video bounding boxes. Uber deployed an XGBoost classifier analyzing 283 visual, motion, and coordinate features across an 11-frame window to catch these failures automatically. The system flags errors in real-time before submission, eliminating costly sequential review workflows while preserving data integrity.
Lexical search broke on Uber Eats — failing on synonyms, typos, and multilingual queries like "pan" (bread or cookware?). The new semantic stack: a two-tower deep network with a Qwen LLM backbone, trained via DeepSpeed ZeRO-3. Matryoshka embeddings cut storage 50% with under 0.3% quality loss; scalar quantization halves latency.
Uber's ads system flattened rich behavioral sequences into summary stats and used MLP-only experts that missed cross-feature interactions. A target-aware transformer with Multi-Head Latent Attention captures sequences at O(N×L), and a Hetero-MMoE blends MLP, DCN, and CIN experts. Production gains: +0.93% pCTR AUC and +0.66% pCTO AUC.
16,000 datasets in one monolithic Hive metastore meant shared-fate blast radius — one bad operation could affect every team. Uber decomposed it into domain-specific databases via pointer-level metadata manipulation, achieving zero-downtime migration. Result: over 1 PB saved and the organizational independence teams needed to own their data contracts.
Petabyte-scale data replication across on-premise and cloud infrastructure created bottlenecks: HDFS client contention blocked job submissions, and sequential file merging delayed commits by 30+ minutes. Uber shifted Copy Listing to distributed Application Masters, parallelized namenode calls, and enabled Hadoop uber-mode jobs for small transfers. Result: 90% faster submissions, 97% lower merge latency, and 5× incremental replication capacity within a year.
Born at Uber to solve mutating immutable data lake files at scale, Apache Hudi now powers 19,500 datasets processing 6 trillion daily rows and 10 PB of ingestion — the storage engine making near-real-time lakehouse operations possible without sacrificing object storage's cost advantages.
Manual design specs across UIKit, SwiftUI, Android XML, Compose, Web React, Go, and SDUI were a bottleneck causing constant documentation drift. uSpec combines AI agents with a Figma Console MCP bridge that reads real tokens and variants directly from Figma — running locally via Cursor over WebSocket. Screen reader specs across all 3 platforms now generate in under 2 minutes.
Over 40% of Uber's mobile events were ad-hoc custom logs, breaking cross-platform analysis and impression accuracy. The team standardized to three universal event types — tap, impression, scroll — using AnalyticsBuilder classes that capture metadata at the platform layer. Result: 30% less transient-impression noise and reliable iOS/Android parity.
Every Uber line of business had built payments independently — duplicated logic, inconsistent UX, Apple Pay missing from half the flows. EU Strong Customer Authentication forced a reckoning. Uber built a centralized checkout orchestrator with modular components each LOB plugs into. Holdout results: 3% higher conversion, 4.5% better session recovery, hundreds of millions in incremental gross bookings.
Uber's Rider app launches features across hundreds of screens and thousands of feature flags — making manual design system audits impossible. Design System Observability adds a deterministic component scanner that flags non-Base elements, plus a daily screenshot pipeline that auto-files Jira tickets for violations. Teams using Base report 3x faster development and 50% less code.
Letting executive assistants book rides for executives meant rethinking trip ownership, identity, billing, and notifications across 30+ backend services and 5 client platforms. Uber introduced a "participant model" extending every booking, tracking, and billing touchpoint to support multiple user profiles per trip — with full audit trails for both EA and executive.
Live Activities run sandboxed with no network access — yet Uber needed real-time driver location on the lock screen. App Groups share on-disk state between the main app and Live Activity, a lightweight DSL syncs content logic across iOS Live Activities and Android push, and an OOA backend debounces updates. Result: 2.26% fewer driver and 2.13% fewer rider cancellations at pickup.
Product review processes surfaced critical gaps too late, forcing teams into costly rework on unsupported assumptions and blind dependencies. Uber built an AI-powered PRD Evaluator that assembles contextual knowledge across linked documents, prior experiments, and company-specific frameworks to audit proposals before high-level review. Early adoption by dozens of PMs revealed stronger artifact quality entering checkpoints and sharper, faster review discussions.
Cross-functional alignment used to take weeks of meetings and PRDs. AI prototyping tools — Lovable, Figma Make, Claude Code, Cursor — compressed a merchant team's four-week discussion into two hours, and let a PM explore six concepts in 20 minutes. Nearly 40% of Uber's global hackathon submissions now incorporate these tools.
Training LLMs for Eats, support, and code gen at Uber means squeezing every GPU cycle. The stack: PyTorch, Ray, DeepSpeed ZeRO-3 CPU Offload (34% memory reduction, 2-7x batch sizes), and Flash Attention (50% memory savings). On H100, Mixtral 8x7B achieves 3x A100 throughput, scaling linearly to batch 64.
Uber's restaurant retrieval ran thousands of city-specific Spark jobs weekly — a model that couldn't scale globally. Two-Tower Embeddings collapse it into one global model: query and item towers, Bag-of-Words history shrinking the model 20x, and LogQ correction pushing recall@500 from 89% to 93%. Now serves hundreds of millions of eaters at ~100ms latency.
As AI proliferates across Uber, knowing which model does what — and who owns it when something breaks — becomes mission-critical. Uber built a five-pillar program: a Model Catalog with standardized Model Cards, SHAP/PFI/integrated gradients in Michelangelo, compliance checks embedded in design workflows from day one, plus structured Education and Adoption.
Uber's ML teams trained on tens of TBs with GPUs idle 85–90% of the time, waiting on serialized data loading. Two Petastorm fixes — pushing PyArrow-to-NumPy conversions into a parallel worker pool, plus FanoutCache for local disk — pushed GPU utilization from 10–15% to 60%+, training time from 22h to 3h (7.3x), and cut compute costs ~80%.
A misplaced flag in a privileged `rm -r` could silently delete a production dataset with no audit trail. Superuser Gateway removes superuser credentials from engineers' machines entirely, routing privileged commands through a Git-backed PR workflow with CLI submission, CI validation, peer approval, and controlled remote execution. Now standard for all data platform admins.
An accidental IAM change on a critical gateway once stopped Uber Eats customers from modifying orders — and ~10% of monthly policy changes involve risky privilege removal. The Policy Simulator pulls 30–90 days of access logs from Apache Pinot and replays them through current vs. proposed policies. Cadence-orchestrated, sub-minute impact analysis before anything ships.
Rotating 100,000+ Kerberos keytabs is risky: rotation invalidates the previous key immediately, leaving applications without valid credentials. Uber's solution generates keytabs with both old and new versions during transitions, drops fetch intervals to 30s, and integrates with the Secret Management Platform. Now rotates 30,000+ keytabs monthly with zero disruptions.
150,000 secrets across 25 fragmented vaults — no centralized detection, rotation, or attribution. Uber consolidated into 6 managed vaults, deployed real-time scanning across git/Slack/CI, and built a Cadence-orchestrated Secret Lifecycle Manager. A team of 10 now drives 20,000 automated monthly rotations with 90% fewer secrets exposed in pipelines.
Migrating 160+ PB from HDFS to GCS meant bridging Kerberos delegation tokens and GCP OAuth 2.0 — without changing any of thousands of analytical jobs. The Storage Access Service intercepts FileSystem calls, exchanges Hadoop tokens for time-bound GCP credentials, and caches across three layers. Handles 500,000+ RPS at 0.026ms average latency.
Manually tagging sensitive data columns at exabyte scale was impractical — yet classification is the foundation of privacy controls and encryption. DataK9 uses a hybrid approach: experts manually classify under 1% of datasets as golden examples, then rule-based Bloom filters and ML-trained Linear SVMs auto-tag the remaining 400,000+. Classifiers must exceed 90% accuracy and 85% F2-score before activation.
Read more →The latest from Uber Engineering and beyond.
Type your list or snap a photo — Cart Assistant builds your grocery basket in seconds using agentic AI, prioritizing your past favorites and factoring in real-time availability and promotions.
Introducing AV Labs — a new team harnessing millions of real Uber trips to build the data flywheel autonomy needs. Mining long-tail driving scenarios at a scale few others can match.
Uber's open-source workflow orchestration engine — powering 150+ companies — joins the Cloud Native Computing Foundation, reinforcing its commitment to open governance and durable orchestration at scale.
Read more →Browse research and articles authored by our team members.
Mobile applications in large-scale distributed systems are susceptible to backend service failures, yet traditional chaos engineering approaches cannot scale mobile testing due to the complexity of end-to-end test environments.
SOSP 2025: The deployment of large-scale data analytics between on-premise and cloud sites requires careful partitioning of both data and computation to avoid massive networking costs and performance degradation.
ICSE 2025: Maintaining a "green" mainline branch — where all builds pass successfully — is crucial but challenging in fast-paced, large-scale software development environments, particularly with concurrent code changes in large monorepos.
Infrastructure for connecting the digital and physical worlds — open to everyone.
Blazing fast, structured, leveled logging in Go.
The Uber Go Style Guide — patterns and conventions for writing clean, idiomatic Go.
React Component library implementing the Base design language.
Data Visualization Components for React.
Cross-platform mobile architecture framework powering the Uber rider and driver apps.
Dependency injection based application framework for Go.
P2P Docker registry capable of distributing TBs of data in seconds.
Hexagonal hierarchical geospatial indexing system.
Uplift modeling and causal inference with machine learning algorithms.
Goroutine leak detector for Go.
Automatically set GOMAXPROCS to match Linux container CPU quota.
Blocking leaky-bucket rate limit implementation.
Reflection based dependency injection toolkit for Go.
Eliminate NullPointerExceptions with low build-time overhead.
Where the digital-physical bridge is built, one domain at a time.
ETA prediction, marketplace optimization, fraud detection, autonomous vehicles. ML models that are tested against the real world every second.
4,000+ microservices, millions of RPS, five 9s reliability. The backbone that keeps 41M daily trips running without a hitch.
Trillions of events daily. Real-time streaming, data lakes, analytics infrastructure that turns raw signals into decisions at global scale.
The digital window to the physical world. The tap that becomes a ride. Cross-platform apps used by 200M+ people worldwide.
Identity, encryption, secrets, and data classification at exabyte scale. Protecting 200M+ users across multiple clouds.
Browser-based experiences for riders, eaters, drivers, and merchants. Performance, accessibility, and rendering at global scale.
Explore Web →Join engineers across 4 continents and help bridge code and the physical world.
Where would your code empower real humans?
Lead breakthrough ML innovation in Driver Pricing — architect next-gen ML systems that optimize marketplace efficiency and driver earnings for millions globally.
Applied AI: Lead foundation model strategy across Uber's search, recommendations, and conversational AI surfaces — shape architecture decisions and mentor senior engineers at global scale.
Rider ML: Build ML solutions for product recommendations and merchandising — deploy sequential recommendation systems that help riders find and complete rides, partnering with product, data science, and business stakeholders.