Skip to footer

kafka - search results

Presto® on Apache Kafka® At Uber Scale

Uber’s goal is to ignite opportunity by setting the world in motion, and big data is a very important part of that. Presto® and Apache Kafka® play critical roles in Uber’s big data stack. Presto is the de

Securing Kafka® Infrastructure at Uber

Background

Uber has one of the largest deployments of Apache Kafka® in the world. It empowers a large number of real-time workflows at Uber, including pub-sub message buses for passing event data from the rider and driver apps, as

Real-Time Exactly-Once Ad Event Processing with Apache Flink, Kafka, and Pinot

Uber recently launched a new capability: Ads on UberEats. With this new ability came new challenges that needed to be solved at Uber, such as systems for ad auctions, bidding, attribution, reporting, and more. This article focuses on how we

Enabling Seamless Kafka Async Queuing with Consumer Proxy

Uber has one of the largest deployments of Apache Kafka in the world, processing trillions of messages and multiple petabytes of data per day. As Figure 1 shows, today we position Apache Kafka as a cornerstone of our technology stack.

Disaster Recovery for Multi-Region Kafka at Uber

0

Apache Kafka at Uber

Uber has one of the largest deployments of Apache Kafka in the world, processing trillions of messages and multiple petabytes of data per day. As Figure 1 shows, today we position Apache Kafka as a cornerstone

Building Reliable Reprocessing and Dead Letter Queues with Apache Kafka

0

In distributed systems, retries are inevitable. From network errors to replication issues and even outages in downstream dependencies, services operating at a massive scale must be prepared to encounter, identify, and handle failure as gracefully as possible.

Given the scope

Introducing Chaperone: How Uber Engineering Audits Apache Kafka End-to-End

As Uber continues to scale, our systems generate continually more events, interservice messages, and logs. Those data needs go through Kafka to get processed. How does our platform audit all these messages in real time?

To monitor our Kafka pipeline

uReplicator: Uber Engineering’s Robust Apache Kafka Replicator

Uber’s Analytics Pipeline

At Uber, we use Apache Kafka as a message bus for connecting different parts of the ecosystem. We collect system and application logs as well as event data from the rider and driver apps. Then we make

Uber’s Emergency Button and The Technologies Behind It

Safety has long been a top priority at Uber, as Uber’s CEO Dara Khosrowshahi wrote in ‘Raising the Bar on Safety’ in September 2018. In order to #StandForSafety, the team at Uber has rolled out a set of

Avoiding CPU Throttling in a Containerized Environment

At Uber, all stateful workloads run on a common containerized platform across a large fleet of hosts. Stateful workloads include MySQL®, Apache Cassandra®, ElasticSearch®, Apache Kafka®, Apache HDFS, Redis

One Stone, Three Birds: Finer-Grained Encryption @ Apache Parquet™

Overview 

Data access restrictions, retention, and encryption at rest are fundamental security controls. This blog explains how we have built and utilized open-sourced Apache Parquet™’s finer-grained encryption feature to support all 3 controls in a unified way. In

Introducing Ballast: An Adaptive Load Test Framework

As Uber’s architecture has grown to encompass thousands of interdependent microservices, we need to test our mission-critical components at max load in order to preserve reliability. Accurate load testing allows us to validate if a set of services are working

Introducing Carbon Feed for Earners: The One-Stop Info Shop

After launching the Driver App in 2018 to over 2 million earners worldwide, we added content and functionality at a rapid pace. Although this really bolstered the platform, allowing for high-density and high-frequency content, and provided drivers and couriers with

Project RADAR: Intelligent Early Fraud Detection System with Humans in the Loop

Introduction

Uber is a worldwide marketplace of services, processing thousands of monetary transactions every second. As a marketplace, Uber takes on all of the risks associated with payment processing. Uber partners who use the marketplace to provide services are paid

How Uber Migrated Financial Data from DynamoDB to Docstore

Introduction

Each day, Uber moves millions of people around the world and delivers tens of millions of food and grocery orders. This generates a large number of financial transactions that need to be stored with provable completeness, consistency, and compliance.  

Introducing uGroup: Uber’s Consumer Management Framework

Background

Apache Kafka® is widely used across Uber’s multiple business lines. Take the example of an Uber ride: When a user opens up the Uber app, demand and supply data are aggregated in Kafka queues to serve fare calculations.

Streaming Real-Time Analytics with Redis, AWS Fargate, and Dash Framework

Introduction

Uber’s GSS (Global Scaled Solutions) team runs scaled programs for diverse products and businesses, including but not limited to Eats, Rides, and Freight. The team transforms Uber’s ideas into agile, global solutions by designing and implementing scalable solutions. One

Building Scalable Streaming Pipelines for Near Real-Time Features

Background

Uber is committed to providing reliable services to customers across our global markets. To achieve this, we heavily rely on machine learning (ML) to make informed decisions like forecasting and surge. As a result, real-time streaming pipelines, which are

Efficiently Managing the Supply and Demand on Uber’s Big Data Platform

With Uber’s business growth and the fast adoption of big data and AI, Big Data scaled to become our most costly infrastructure platform. To reduce operational expenses, we developed a holistic framework with 3 pillars: platform efficiency, supply, and demand

Cost-Efficient Open Source Big Data Platform at Uber

As Uber’s business has expanded, the underlying pool of data that powers it has grown exponentially, and thus ever more expensive to process. When Big Data rose to become one of our largest operational expenses, we began an initiative to