kafka - search results

Uber’s Journey Toward Better Data Culture From First Principles

Data powers Uber

Uber has revolutionized how the world moves by powering billions of rides and deliveries connecting millions of riders, businesses, restaurants, drivers, and couriers. At the heart of this massive transportation platform is Big Data and Data Science

Fast and Reliable Schema-Agnostic Log Analytics Platform


At Uber, we provide a centralized, reliable, and interactive logging platform that empowers engineers to work quickly and confidently at scale. The logs are tagged with a rich set of contextual key value pairs, with which engineers can slice and

Uber’s Real-time Data Intelligence Platform At Scale: Improving Gairos Scalability/Reliability



Real-time data (# of ride requests, # of drivers available, weather, game) enables operations teams to make informed decisions like surge pricing, maximum dispatch ETA calculation, and demand/supply forecasting about our services that improve user experiences on the

No Code Workflow Orchestrator for Building Batch & Streaming Pipelines at Scale

Data-In-Motion @ Uber

At Uber, several petabytes of data move across and within various platforms every day. We power this data movement with a strong backbone of data pipelines. Whether it’s ingesting the data from millions of Uber trips or

Turning Metadata Into Insights with Databook

Every day in over 10,000 cities around the world, millions of people rely on Uber to travel, order food, and ship cargo. Our apps and services are available in over 69 countries and run 24 hours a day. At our

Operating Apache Pinot @ Uber Scale



Uber has a complex marketplace consisting of riders, drivers, eaters, restaurants and so on. Operating that marketplace at a global scale requires real-time intelligence and decision making. For instance, identifying delayed Uber Eats orders or abandoned carts helps to

Why We Leverage Multi-tenancy in Uber’s Microservice Architecture


The performance of Uber’s services relies on our ability to quickly and stably launch new features on our platform, regardless of where the corresponding service lives in our tech stack. Foundational to our platform’s power is its microservice-based architecture

Designing a Production-Ready Kappa Architecture for Timely Data Stream Processing


At Uber, we use robust data processing systems such as Apache Flink and Apache Spark to power the streaming applications that help us calculate up-to-date pricing, enhance driver dispatching, and fight fraud on our platform. Such solutions can process

Engineering SQL Support on Apache Pinot at Uber

Uber leverages real-time analytics on aggregate data to improve the user experience across our products, from fighting fraudulent behavior on Uber Eats to forecasting demand on our platform.  

As Uber’s operations became more complex and we offered additional features and

Uber’s Data Platform in 2019: Transforming Information to Intelligence

Uber’s busy 2019 included our billionth delivery of an Uber Eats order, 24 million miles covered by bike and scooter riders on our platform, and trips to top destinations such as the Empire State Building, the Eiffel Tower, and the

Making Apache Spark Effortless for All of Uber

Apache Spark is a foundational piece of Uber’s Big Data infrastructure that powers many critical aspects of our business. We currently run more than one hundred thousand Spark applications per day, across multiple different compute environments. Spark’s versatility, which allows

Uber Open Source: Catching Up with Felix Cheung, Data Platform Engineering Manager

It may seem counter-intuitive that open source software, with its public availability and transparent code, can give private enterprise an edge. But that’s the lesson provided by Felix Cheung, an engineering manager at Uber who puts significant effort into

Solving Big Data Challenges with Data Science at Uber


The data involved in serving millions of rides and food deliveries on Uber’s platform doesn’t just facilitate transactions, it also helps teams at Uber continually analyze and improve our services. When we launch new services, we can quickly measure success,

Accessible Machine Learning through Data Workflow Management


Machine learning (ML) pervades many aspects of Uber’s business. From responding to customer support tickets to optimizing queries and forecasting demand, ML provides critical insights for many of our teams.

Our teams encountered many different challenges while incorporating

DBEvents: A Standardized Framework for Efficiently Ingesting Data into Uber’s Apache Hadoop Data Lake


Keeping the Uber platform reliable and real-time across our global markets is a 24/7 business. People may be going to sleep in San Francisco, but in Paris they’re getting ready for work, requesting rides from Uber driver-partners. At that same

Introducing AresDB: Uber’s GPU-Powered Open Source, Real-time Analytics Engine


At Uber, real-time analytics allow us to attain business insights and operational efficiency, enabling us to make data-driven decisions to improve experiences on the Uber platform. For example, our operations team relies on data to monitor the market health and

Sessionizing Uber Trips in Real Time


In one sense, Uber’s challenge of efficiently matching riders and drivers in the real world comes down to the question of how to collect, store, and logically arrange data. Our efforts to ensure low wait times by predicting rider demand,

Peloton: Uber’s Unified Resource Scheduler for Diverse Cluster Workloads


Cluster management, a common software infrastructure among technology companies, aggregates compute resources from a collection of physical hosts into a shared resource pool, amplifying compute power and allowing for the flexible use of data center hardware. At Uber, cluster management

Preview 7 Open Source Projects from the Uber Open Summit


Open source software pervades the work we do at Uber. On the infrastructure side, we have contributed projects like Jaeger, which lets engineers trace complex architectures, and M3, a metrics platform that works with Prometheus. For

Uber’s Big Data Platform: 100+ Petabytes with Minute Latency

Uber is committed to delivering safer and more reliable transportation across our global markets. To accomplish this, Uber relies heavily on making data-driven decisions at every level, from forecasting rider demand during high traffic events to identifying and addressing bottlenecks