Skip to footer

YAML Generator for Funnel YAML Files: Streamlining the Mobile Data Workflow Process

At Uber, real-time mobile analytics events—generated by button taps, page views, and more—form the backbone of the mobile data workflow process. To process these events, our Mobile Data Platform Team designed and developed the Fontana library, which converts the nearly-one-million-QPS (queries per second) volume of events into easily digestible and useful analytics for Uber engineers. As part of this process,...

Jellyfish: Cost-Effective Data Tiering for Uber’s Largest Storage System

Problem Uber deploys a few storage technologies to store business data based on their application model. One such technology is called Schemaless, which enables the modeling of related entries in one single row of multiple columns, as well as versioning per column. Schemaless has been around for a couple of years, amassing Uber’s data. While Uber is consolidating all the use...

Streaming Real-Time Analytics with Redis, AWS Fargate, and Dash Framework

Introduction Uber’s GSS (Global Scaled Solutions) team runs scaled programs for diverse products and businesses, including but not limited to Eats, Rides, and Freight. The team transforms Uber’s ideas into agile, global solutions by designing and implementing scalable solutions. One of the areas of expertise within GSS is the Digitization vertical. The Digitization team efficiently converts physical signals into digital...

Enabling Seamless Kafka Async Queuing with Consumer Proxy

Uber has one of the largest deployments of Apache Kafka in the world, processing trillions of messages and multiple petabytes of data per day. As Figure 1 shows, today we position Apache Kafka as a cornerstone of our technology stack. It empowers a large number of different workflows, including pub-sub message buses for passing event data from the rider...

How Data Shapes the Uber Rider App

Introduction Data is crucial for our products. Data analytics help us provide a frictionless experience to the people that use our services. It also enables our engineers, product managers, data analysts, and data scientists to make informed decisions. The impact of data analysis can be seen in every screen of our app: what is displayed on the home screen, the...

Building Scalable Streaming Pipelines for Near Real-Time Features

Background Uber is committed to providing reliable services to customers across our global markets. To achieve this, we heavily rely on machine learning (ML) to make informed decisions like forecasting and surge. As a result, real-time streaming pipelines, which are used to generate the data and features for ML, have become more popular and important. At Uber, we leverage Apache Flink...

Eats Safety Team On-Call Overview

Introduction Our engineers have the responsibility of ensuring a consistent and positive experience for our riders, drivers, eaters, and delivery/restaurant partners. Ensuring such an experience requires reliable systems: our apps have to work when anyone needs them. A major component of reliability is having engineers on call to deal with problems immediately as they arise. We set up our on-call engineers...

Unifying Support Content to Enable More Empathetic and Personalized Customer Support Experiences

Introduction  Content quality is critical to the support experienced by Uber’s customers. Consider an Eater who reached out for help to cancel a very delayed order. The same resolution, such as refunding the charge, can be delivered alongside a robotic-sounding message, or one where the style and tone of the response conveys true empathy and acknowledges the user’s poor experience...

Efficiently Managing the Supply and Demand on Uber’s Big Data Platform

With Uber’s business growth and the fast adoption of big data and AI, Big Data scaled to become our most costly infrastructure platform. To reduce operational expenses, we developed a holistic framework with 3 pillars: platform efficiency, supply, and demand (using supply to describe the hardware resources that are made available to run big data storage and compute workload,...

Cost-Efficient Open Source Big Data Platform at Uber

As Uber’s business has expanded, the underlying pool of data that powers it has grown exponentially, and thus ever more expensive to process. When Big Data rose to become one of our largest operational expenses, we began an initiative to reduce costs on our data platform, which divides challenges into 3 broad pillars: platform efficiency, supply, and demand. In...

Challenges and Opportunities to Dramatically Reduce the Cost of Uber’s Big Data

Introduction Big data is at the core of Uber’s business. We continue to innovate and provide better experiences for our earners, riders, and eaters by leveraging big data, machine learning, and artificial intelligence technology. As a result, over the last four years, the scale of our big data platform multiplied from single-digit petabytes to many hundreds of petabytes. Uber’s big data...

How Uber Achieves Operational Excellence in the Data Quality Experience

Uber delivers efficient and reliable transportation across the global marketplace, which is powered by hundreds of services, machine learning models, and tens of thousands of datasets. While growing rapidly, we’re also committed to maintaining data quality, as it can greatly impact business operations and decisions. Without data quality guarantees, downstream service computation or machine learning model performance quickly degrade,...

Uber’s Finance Computation Platform

For a company of our size and scale, robust, accurate, and compliant accounting and analytics are a necessity, ensuring accurate and granular visibility into our financials, across multiple lines of business. Most standard, off-the-shelf finance engineering solutions cannot support the scale and scope of the transactions on our ever-growing platform.  The ride-sharing business alone has over 4 billion trips per...

Pinot Real-Time Ingestion with Cloud Segment Storage

Introduction Apache Pinot is an open source data analytics engine (OLAP), which allows users to query data ingested from as recently as a few seconds ago to as old as a few years back. Pinot’s ability to ingest real-time data and make them available for low-latency queries is the key reason why it has become an important component of Uber’s...

Uber’s Fulfillment Platform: Ground-up Re-architecture to Accelerate Uber’s Go/Get Strategy

Introduction to Fulfillment at Uber Uber’s mission is to help our consumers effortlessly go anywhere and get anything in thousands of cities worldwide. At its core, we capture a consumer’s intent and fulfill it by matching it with the right set of providers.  Fulfillment is the “act or process of delivering a product or service to a customer.” The Fulfillment organization...

Containerizing Apache Hadoop Infrastructure at Uber

Introduction As Uber’s business grew, we scaled our Apache Hadoop (referred to as ‘Hadoop’ in this article) deployment to 21000+ hosts in 5 years, to support the various analytical and machine learning use cases. We built a team with varied expertise to address the challenges we faced running Hadoop on bare-metal: host lifecycle management, deployment and automation, Hadoop core development,...

‘Orders Near You’ and User-Facing Analytics on Real-Time Geospatial Data

Introduction By its nature, Uber’s business is highly real-time and contingent upon geospatial data. PBs of data are continuously being collected from our drivers, riders, restaurants, and eaters. Real-time analytics over this geospatial data could provide powerful insights. In this blog, we will highlight the Orders near you feature from the Uber Eats app, illustrating one example of how Uber generates...

Analyzing Customer Issues to Improve User Experience

Introduction The primary goal for customer support is to ensure users’ issues are addressed and resolved in a timely and effective manner. The kind of issues users face and what they say in their support interactions provides a lot of information about the product experience, any technical or operational gaps and even their general sentiment towards the product / company....

Customer Support Automation Platform at Uber

High Level Overview of the Problem Introduction If you’ve used any online/digital service, chances are that you are familiar with what a typical customer service experience entails: you send a message (usually email aliased) to the company’s support staff, fill out a form, expect some back and forth with a customer service representative (CSR), and hopefully have your issue resolved. This...

Tuning Model Performance

Introduction Uber uses machine learning (ML) models to power critical business decisions. An ML model goes through many experiment iterations before making it to production. During the experimentation phase, data scientists or machine learning engineers explore adding features, tuning parameters, and running offline analysis or backtesting. We enhanced the platform to reduce the human toil and time in this stage,...

Popular Articles