kafka - search results

Disaster Recovery for Multi-Region Kafka at Uber


Apache Kafka at Uber

Uber has one of the largest deployments of Apache Kafka in the world, processing trillions of messages and multiple petabytes of data per day. As Figure 1 shows, today we position Apache Kafka as a cornerstone

Building Reliable Reprocessing and Dead Letter Queues with Apache Kafka


In distributed systems, retries are inevitable. From network errors to replication issues and even outages in downstream dependencies, services operating at a massive scale must be prepared to encounter, identify, and handle failure as gracefully as possible.

Given the scope

Introducing Chaperone: How Uber Engineering Audits Apache Kafka End-to-End

As Uber continues to scale, our systems generate ever more events, interservice messages, and logs. That data needs to go through Kafka to be processed. How does our platform audit all these messages in real time?

To monitor our Kafka pipeline

uReplicator: Uber Engineering’s Robust Apache Kafka Replicator

Uber’s Analytics Pipeline

At Uber, we use Apache Kafka as a message bus for connecting different parts of the ecosystem. We collect system and application logs as well as event data from the rider and driver apps. Then we make

Uber’s Journey Toward Better Data Culture From First Principles

Data powers Uber

Uber has revolutionized how the world moves by powering billions of rides and deliveries connecting millions of riders, businesses, restaurants, drivers, and couriers. At the heart of this massive transportation platform is Big Data and Data Science

Fast and Reliable Schema-Agnostic Log Analytics Platform


At Uber, we provide a centralized, reliable, and interactive logging platform that empowers engineers to work quickly and confidently at scale. The logs are tagged with a rich set of contextual key value pairs, with which engineers can slice and

Uber’s Real-time Data Intelligence Platform At Scale: Improving Gairos Scalability/Reliability



Real-time data (number of ride requests, number of drivers available, weather, game) enables operations teams to make informed decisions about our services, such as surge pricing, maximum dispatch ETA calculation, and demand/supply forecasting, that improve user experiences on the

No Code Workflow Orchestrator for Building Batch & Streaming Pipelines at Scale

Data-In-Motion @ Uber

At Uber, several petabytes of data move across and within various platforms every day. We power this data movement by a strong backbone of data pipelines. Whether it’s ingesting the data from millions of Uber trips or

Turning Metadata Into Insights with Databook

Every day in over 10,000 cities around the world, millions of people rely on Uber to travel, order food, and ship cargo. Our apps and services are available in over 69 countries and run 24 hours a day. At our

Operating Apache Pinot @ Uber Scale



Uber has a complex marketplace consisting of riders, drivers, eaters, restaurants and so on. Operating that marketplace at a global scale requires real-time intelligence and decision making. For instance, identifying delayed Uber Eats orders or abandoned carts helps to

Why We Leverage Multi-tenancy in Uber’s Microservice Architecture


The performance of Uber’s services relies on our ability to quickly and stably launch new features on our platform, regardless of where the corresponding service lives in our tech stack. Foundational to our platform’s power is its microservice-based architecture

Designing a Production-Ready Kappa Architecture for Timely Data Stream Processing


At Uber, we use robust data processing systems such as Apache Flink and Apache Spark to power the streaming applications that help us calculate up-to-date pricing, enhance driver dispatching, and fight fraud on our platform. Such solutions can process

Engineering SQL Support on Apache Pinot at Uber

Uber leverages real-time analytics on aggregate data to improve the user experience across our products, from fighting fraudulent behavior on Uber Eats to forecasting demand on our platform.  

As Uber’s operations became more complex and we offered additional features and

Uber’s Data Platform in 2019: Transforming Information to Intelligence

Uber’s busy 2019 included our billionth delivery of an Uber Eats order, 24 million miles covered by bike and scooter riders on our platform, and trips to top destinations such as the Empire State Building, the Eiffel Tower, and the

Making Apache Spark Effortless for All of Uber

Apache Spark is a foundational piece of Uber’s Big Data infrastructure that powers many critical aspects of our business. We currently run more than one hundred thousand Spark applications per day, across multiple different compute environments. Spark’s versatility, which allows

Uber Open Source: Catching Up with Felix Cheung, Data Platform Engineering Manager

It may seem counter-intuitive that open source software, with its public availability and transparent code, can give private enterprise an edge. But that’s the lesson provided by Felix Cheung, an engineering manager at Uber who puts significant effort into

Solving Big Data Challenges with Data Science at Uber


The data involved in serving millions of rides and food deliveries on Uber’s platform doesn’t just facilitate transactions; it also helps teams at Uber continually analyze and improve our services. When we launch new services, we can quickly measure success,

Accessible Machine Learning through Data Workflow Management


Machine learning (ML) pervades many aspects of Uber’s business. From responding to customer support tickets to optimizing queries and forecasting demand, ML provides critical insights for many of our teams.

Our teams encountered many different challenges while incorporating

DBEvents: A Standardized Framework for Efficiently Ingesting Data into Uber’s Apache Hadoop Data Lake


Keeping the Uber platform reliable and real-time across our global markets is a 24/7 business. People may be going to sleep in San Francisco, but in Paris they’re getting ready for work, requesting rides from Uber driver-partners. At that same

Introducing AresDB: Uber’s GPU-Powered Open Source, Real-time Analytics Engine


At Uber, real-time analytics allow us to attain business insights and operational efficiency, enabling us to make data-driven decisions to improve experiences on the Uber platform. For example, our operations team relies on data to monitor the market health and