Apache Kafka at Uber
Uber has one of the largest deployments of Apache Kafka in the world, processing trillions of messages and multiple petabytes of data per day. As Figure 1 shows, today we position Apache Kafka as a cornerstone …
Uber has one of the largest deployments of Apache Kafka in the world, processing trillions of messages and multiple petabytes of data per day. As Figure 1 shows, today we position Apache Kafka as a cornerstone …
In distributed systems, retries are inevitable. From network errors to replication issues and even outages in downstream dependencies, services operating at a massive scale must be prepared to encounter, identify, and handle failure as gracefully as possible.
Given the scope …
As Uber continues to scale, our systems generate continually more events, interservice messages, and logs. Those data needs go through Kafka to get processed. How does our platform audit all these messages in real time?
To monitor our Kafka pipeline …
At Uber, we use Apache Kafka as a message bus for connecting different parts of the ecosystem. We collect system and application logs as well as event data from the rider and driver apps. Then we make …
Uber has revolutionized how the world moves by powering billions of rides and deliveries connecting millions of riders, businesses, restaurants, drivers, and couriers. At the heart of this massive transportation platform is Big Data and Data Science …
At Uber, we provide a centralized, reliable, and interactive logging platform that empowers engineers to work quickly and confidently at scale. The logs are tagged with a rich set of contextual key value pairs, with which engineers can slice and …
Real-time data (# of ride requests, # of drivers available, weather, game) enables operations teams to make informed decisions like surge pricing, maximum dispatch ETA calculating, and demand/supply forecasting about our services that improve user experiences on the …
At Uber, several petabytes of data move across and within various platforms every day. We power this data movement by a strong backbone of data pipelines. Whether it’s ingesting the data from millions of Uber trips or …
Every day in over 10,000 cities around the world, millions of people rely on Uber to travel, order food, and ship cargo. Our apps and services are available in over 69 countries and run 24 hours a day. At our …
Uber has a complex marketplace consisting of riders, drivers, eaters, restaurants and so on. Operating that marketplace at a global scale requires real-time intelligence and decision making. For instance, identifying delayed Uber Eats orders or abandoned carts helps to …
The performance of Uber’s services relies on our ability to quickly and stably launch new features on our platform, regardless of where the corresponding service lives in our tech stack. Foundational to our platform’s power is its microservice-based architecture…
At Uber, we use robust data processing systems such as Apache Flink and Apache Spark to power the streaming applications that helps us calculate up-to-date pricing, enhance driver dispatching, and fight fraud on our platform. Such solutions can process …
Uber leverages real-time analytics on aggregate data to improve the user experience across our products, from fighting fraudulent behavior on Uber Eats to forecasting demand on our platform.
As Uber’s operations became more complex and we offered additional features and …
Uber’s busy 2019 included our billionth delivery of an Uber Eats order, 24 million miles covered by bike and scooter riders on our platform, and trips to top destinations such as the Empire State Building, the Eiffel Tower, and the …
Apache Spark is a foundational piece of Uber’s Big Data infrastructure that powers many critical aspects of our business. We currently run more than one hundred thousand Spark applications per day, across multiple different compute environments. Spark’s versatility, which allows …
It may seem counter-intuitive that open source software, with its public availability and transparent code, can give private enterprise an edge. But that’s the lesson provided by Felix Cheung, an engineering manager at Uber who puts significant effort into …
The data involved in serving millions of rides and food deliveries on Uber’s platform doesn’t just facilitate transactions, it also helps teams at Uber continually analyze and improve our services. When we launch new services, we can quickly measure success, …
Machine learning (ML) pervades many aspect of Uber’s business. From responding to customer support tickets, optimizing queries, and forecasting demand, ML provides critical insights for many of our teams.
Our teams encountered many different challenges while incorporating …
Keeping the Uber platform reliable and real-time across our global markets is a 24/7 business. People may be going to sleep in San Francisco, but in Paris they’re getting ready for work, requesting rides from Uber driver-partners. At that same …
At Uber, real-time analytics allow us to attain business insights and operational efficiency, enabling us to make data-driven decisions to improve experiences on the Uber platform. For example, our operations team relies on data to monitor the market health and …