Open Source

The New Version of Orbit (v1.1) is Released: The Improvements, Design Changes, and Exciting...

Introduction The previous post gave an overview of Orbit, a Python package developed by Uber in order to perform Bayesian time-series analysis and forecasting. This...

How We Saved 70K Cores Across 30 Mission-Critical Services (Large-Scale, Semi-Automated Go GC Tuning...

Introduction As part of Uber Engineering’s broad efforts to reach profitability, our team recently focused on reducing the cost of compute capacity by improving efficiency....

Cadence Multi-Tenant Task Processing

Introduction Cadence is a multi-tenant orchestration framework that helps developers at Uber to write fault-tolerant, long-running applications, also known as workflows. It scales horizontally to...

CRISP: Critical Path Analysis for Microservice Architectures

Uber’s backend is an exemplar of microservice architecture. Each microservice is a small, individually deployable program performing a specific business logic (operation). The microservice...

How Uber Migrated Financial Data from DynamoDB to Docstore

Introduction Each day, Uber moves millions of people around the world and delivers tens of millions of food and grocery orders. This generates a large...

Introducing uGroup: Uber’s Consumer Management Framework

Background Apache Kafka® is widely used across Uber’s multiple business lines. Take the example of an Uber ride: When a user opens up the Uber app,...

Improving HDFS I/O Utilization for Efficiency

Scaling our data infrastructure with lower hardware costs while maintaining high performance and service reliability has been no easy feat. To accommodate the exponential...

Building Uber’s Fulfillment Platform for Planet-Scale using Google Cloud Spanner

Introduction The Fulfillment Platform is a foundational Uber domain that enables the rapid scaling of new verticals. The platform handles billions of database transactions each...

Real-Time Exactly-Once Ad Event Processing with Apache Flink, Kafka, and Pinot

Uber recently launched a new capability: Ads on UberEats. With this new ability came new challenges that needed to be solved at Uber, such...

Streaming Real-Time Analytics with Redis, AWS Fargate, and Dash Framework

Introduction Uber’s GSS (Global Scaled Solutions) team runs scaled programs for diverse products and businesses, including but not limited to Eats, Rides, and Freight. The...

Enabling Seamless Kafka Async Queuing with Consumer Proxy

Uber has one of the largest deployments of Apache Kafka in the world, processing trillions of messages and multiple petabytes of data per day....

Building Scalable Streaming Pipelines for Near Real-Time Features

Background Uber is committed to providing reliable services to customers across our global markets. To achieve this, we heavily rely on machine learning (ML) to...

Pinot Real-Time Ingestion with Cloud Segment Storage

Introduction Apache Pinot is an open source data analytics engine (OLAP), which allows users to query data ingested from as recently as a few seconds...

Containerizing Apache Hadoop Infrastructure at Uber

Introduction As Uber’s business grew, we scaled our Apache Hadoop (referred to as ‘Hadoop’ in this article) deployment to 21000+ hosts in 5 years, to...

‘Orders Near You’ and User-Facing Analytics on Real-Time Geospatial Data

Introduction By its nature, Uber’s business is highly real-time and contingent upon geospatial data. PBs of data are continuously being collected from our drivers, riders,...

Analyzing Customer Issues to Improve User Experience

Introduction The primary goal for customer support is to ensure users’ issues are addressed and resolved in a timely and effective manner. The kind of...

Customer Support Automation Platform at Uber

High Level Overview of the Problem Introduction If you’ve used any online/digital service, chances are that you are familiar with what a typical customer service experience...

Tuning Model Performance

Introduction Uber uses machine learning (ML) models to power critical business decisions. An ML model goes through many experiment iterations before making it to production....

Elastic Distributed Training with XGBoost on Ray

Introduction Since we productionized distributed XGBoost on Apache Spark™ at Uber in 2017, XGBoost has powered a wide spectrum of machine learning (ML) use cases...

Continuous Integration and Deployment for Machine Learning Online Serving and Models

Introduction At Uber, we have witnessed a significant increase in machine learning adoption across various organizations and use-cases over the last few years. Our machine...
