Uber Data

Real-Time Exactly-Once Ad Event Processing with Apache Flink, Kafka, and Pinot

Uber recently launched a new capability: Ads on UberEats. With this new ability came new challenges that needed to be solved at Uber, such...

YAML Generator for Funnel YAML Files: Streamlining the Mobile Data Workflow Process

At Uber, real-time mobile analytics events—generated by button taps, page views, and more—form the backbone of the mobile data workflow process. To process these events,...

Streaming Real-Time Analytics with Redis, AWS Fargate, and Dash Framework

Uber’s GSS (Global Scaled Solutions) team runs scaled programs for diverse products and businesses, including but not limited to Eats, Rides, and Freight. The...

Enabling Seamless Kafka Async Queuing with Consumer Proxy

Uber has one of the largest deployments of Apache Kafka in the world, processing trillions of messages and multiple petabytes of data per day....

How Data Shapes the Uber Rider App

Data is crucial for our products. Data analytics help us provide a frictionless experience to the people that use our services. It also enables...

Building Scalable Streaming Pipelines for Near Real-Time Features

Uber is committed to providing reliable services to customers across our global markets. To achieve this, we heavily rely on machine learning (ML) to...

Efficiently Managing the Supply and Demand on Uber’s Big Data Platform

With Uber’s business growth and the fast adoption of big data and AI, Big Data scaled to become our most costly infrastructure platform. To...

Cost-Efficient Open Source Big Data Platform at Uber

As Uber’s business has expanded, the underlying pool of data that powers it has grown exponentially and has thus become ever more expensive to process. When...

Challenges and Opportunities to Dramatically Reduce the Cost of Uber’s Big Data

Big data is at the core of Uber’s business. We continue to innovate and provide better experiences for our earners, riders, and eaters by...

How Uber Achieves Operational Excellence in the Data Quality Experience

Uber delivers efficient and reliable transportation across the global marketplace, which is powered by hundreds of services, machine learning models, and tens of thousands...

Containerizing Apache Hadoop Infrastructure at Uber

As Uber’s business grew, we scaled our Apache Hadoop (referred to as ‘Hadoop’ in this article) deployment to 21000+ hosts in 5 years, to...

Analyzing Customer Issues to Improve User Experience

The primary goal for customer support is to ensure users’ issues are addressed and resolved in a timely and effective manner. The kind of...

Automating Merchant Live Monitoring with Real-Time Analytics: Charon

At Uber, live monitoring and automation of Ops is critical to preserve marketplace health, maintain reliability, and gain efficiency in markets. By the virtue...

Uber’s Journey Toward Better Data Culture From First Principles

Data powers Uber. Uber has revolutionized how the world moves by powering billions of rides and deliveries connecting millions of riders, businesses, restaurants, drivers, and...

Turning Metadata Into Insights with Databook

Every day in over 10,000 cities around the world, millions of people rely on Uber to travel, order food, and ship cargo. Our apps...

Operating Apache Pinot @ Uber Scale

Uber has a complex marketplace consisting of riders, drivers, eaters, restaurants and so on. Operating that marketplace at a global scale requires real-time intelligence...

Inside Uber ATG’s Data Mining Operation: Identifying Real Road Scenarios at Scale for Machine...

Uber ATG's self-driving vehicles measure a multitude of possible scenario variations to answer the age-old question: "how does the pedestrian cross the road?"

Monitoring Data Quality at Scale with Statistical Modeling

Uber employs statistical modeling to find anomalies in data and continually monitor data quality.

Building a Backtesting Service to Measure Model Performance at Uber-scale

We built a backtesting service to better assess financial forecast model error rates, facilitating improved forecast performance and decision making.

Women in Data Science at Uber: Moving the World With Data in 2020—and Beyond

In October 2019, Uber hosted our second annual Moving The World With Data meetup, showcasing some of our most interesting data science challenges in 2019.
