Uber’s goal is to ignite opportunity by setting the world in motion, and big data is a very important part of that. Presto® and Apache Kafka® play critical roles in Uber’s big data stack. Presto is the de …
kafka - search results
Securing Kafka® Infrastructure at Uber
Background
Uber has one of the largest deployments of Apache Kafka® in the world. It empowers a large number of real-time workflows at Uber, including pub-sub message buses for passing event data from the rider and driver apps, as …
Real-Time Exactly-Once Ad Event Processing with Apache Flink, Kafka, and Pinot
Uber recently launched a new capability: Ads on UberEats. With this new ability came new challenges that needed to be solved at Uber, such as systems for ad auctions, bidding, attribution, reporting, and more. This article focuses on how we …
Enabling Seamless Kafka Async Queuing with Consumer Proxy
Uber has one of the largest deployments of Apache Kafka in the world, processing trillions of messages and multiple petabytes of data per day. As Figure 1 shows, today we position Apache Kafka as a cornerstone of our technology stack. …
Disaster Recovery for Multi-Region Kafka at Uber
Apache Kafka at Uber
Uber has one of the largest deployments of Apache Kafka in the world, processing trillions of messages and multiple petabytes of data per day. As Figure 1 shows, today we position Apache Kafka as a cornerstone …
Building Reliable Reprocessing and Dead Letter Queues with Apache Kafka
In distributed systems, retries are inevitable. From network errors to replication issues and even outages in downstream dependencies, services operating at a massive scale must be prepared to encounter, identify, and handle failure as gracefully as possible.
Given the scope …
Introducing Chaperone: How Uber Engineering Audits Apache Kafka End-to-End
As Uber continues to scale, our systems generate ever more events, interservice messages, and logs. This data needs to go through Kafka to get processed. How does our platform audit all these messages in real time?
To monitor our Kafka pipeline …
uReplicator: Uber Engineering’s Robust Apache Kafka Replicator
Uber’s Analytics Pipeline
At Uber, we use Apache Kafka as a message bus for connecting different parts of the ecosystem. We collect system and application logs as well as event data from the rider and driver apps. Then we make …
Uber’s Emergency Button and The Technologies Behind It
Safety has long been a top priority at Uber, as Uber’s CEO Dara Khosrowshahi wrote in ‘Raising the Bar on Safety’ in September 2018. In order to #StandForSafety, the team at Uber has rolled out a set of …
Avoiding CPU Throttling in a Containerized Environment
At Uber, all stateful workloads run on a common containerized platform across a large fleet of hosts. Stateful workloads include MySQL®, Apache Cassandra®, ElasticSearch®, Apache Kafka®, Apache HDFS™, Redis™…
One Stone, Three Birds: Finer-Grained Encryption @ Apache Parquet™
Overview
Data access restrictions, retention, and encryption at rest are fundamental security controls. This blog explains how we have built and utilized open-sourced Apache Parquet™’s finer-grained encryption feature to support all 3 controls in a unified way. In …
Introducing Ballast: An Adaptive Load Test Framework
As Uber’s architecture has grown to encompass thousands of interdependent microservices, we need to test our mission-critical components at max load in order to preserve reliability. Accurate load testing allows us to validate whether a set of services is working …
Introducing Carbon Feed for Earners: The One-Stop Info Shop
After launching the Driver App in 2018 to over 2 million earners worldwide, we added content and functionality at a rapid pace. Although this really bolstered the platform, allowing for high-density and high-frequency content, and provided drivers and couriers with …
Project RADAR: Intelligent Early Fraud Detection System with Humans in the Loop
Introduction
Uber is a worldwide marketplace of services, processing thousands of monetary transactions every second. As a marketplace, Uber takes on all of the risks associated with payment processing. Uber partners who use the marketplace to provide services are paid …
How Uber Migrated Financial Data from DynamoDB to Docstore
Introduction
Each day, Uber moves millions of people around the world and delivers tens of millions of food and grocery orders. This generates a large number of financial transactions that need to be stored with provable completeness, consistency, and compliance. …
Introducing uGroup: Uber’s Consumer Management Framework
Background
Apache Kafka® is widely used across Uber’s multiple business lines. Take the example of an Uber ride: When a user opens up the Uber app, demand and supply data are aggregated in Kafka queues to serve fare calculations. …
Streaming Real-Time Analytics with Redis, AWS Fargate, and Dash Framework
Introduction
Uber’s GSS (Global Scaled Solutions) team runs scaled programs for diverse products and businesses, including but not limited to Eats, Rides, and Freight. The team transforms Uber’s ideas into agile, global solutions by designing and implementing scalable solutions. One …
Building Scalable Streaming Pipelines for Near Real-Time Features
Background
Uber is committed to providing reliable services to customers across our global markets. To achieve this, we rely heavily on machine learning (ML) to make informed decisions in areas like forecasting and surge. As a result, real-time streaming pipelines, which are …
Efficiently Managing the Supply and Demand on Uber’s Big Data Platform
With Uber’s business growth and the fast adoption of big data and AI, Big Data scaled to become our most costly infrastructure platform. To reduce operational expenses, we developed a holistic framework with 3 pillars: platform efficiency, supply, and demand …
Cost-Efficient Open Source Big Data Platform at Uber
As Uber’s business has expanded, the underlying pool of data that powers it has grown exponentially and has thus become ever more expensive to process. When Big Data rose to become one of our largest operational expenses, we began an initiative to …