
kafka - search results

Building Reliable Reprocessing and Dead Letter Queues with Apache Kafka

The Uber Insurance Engineering team extended Kafka’s role in our existing event-driven architecture by using non-blocking request reprocessing and dead letter queues (DLQ) to achieve decoupled, observable error-handling without disrupting real-time traffic.

Introducing Chaperone: How Uber Engineering Audits Apache Kafka End-to-End

Uber Engineering explains why and how we built Chaperone, our in-house auditing system for monitoring Kafka pipeline health.

uReplicator: Uber Engineering’s Robust Apache Kafka Replicator

Take a look into uReplicator, Uber’s open source solution for replicating Apache Kafka data in a robust and reliable manner.

Why We Leverage Multi-tenancy in Uber’s Microservice Architecture

Multi-tenancy lets Uber tag requests coming into our microservice architecture, giving us the flexibility to route requests to specific components, such as during testing scenarios.

Designing a Production-Ready Kappa Architecture for Timely Data Stream Processing

We implemented a Kappa architecture at Uber to effectively backfill streaming data at scale, ensuring accurate data in our platform.

Engineering SQL Support on Apache Pinot at Uber

We engineered full SQL support on Apache Pinot to enable quick analysis and reporting on aggregated data, leading to improved experiences on our platform.

Uber’s Data Platform in 2019: Transforming Information to Intelligence

In 2019, Uber's Data Platform team leveraged data science to improve the efficiency of our infrastructure, enabling us to compute optimum datastore and hardware usage.

Making Apache Spark Effortless for All of Uber

Uber engineers created uSCS, a Spark-as-a-Service solution that helps manage Apache Spark jobs throughout large organizations.

Uber Open Source: Catching Up with Felix Cheung, Data Platform Engineering Manager

Uber Engineering Manager and open source software community member Felix Cheung talks about his work with the Apache Software Foundation, open source at Uber, and XGBoost, a machine learning library for optimized distributed gradient boosting.

Solving Big Data Challenges with Data Science at Uber

Engineers and data scientists at Uber worked together to develop a way of partially replicating Vertica clusters to better scale with our data volume.

Accessible Machine Learning through Data Workflow Management

Uber engineers offer two common use cases showing how we orchestrate machine learning model training in our data workflow engine.

DBEvents: A Standardized Framework for Efficiently Ingesting Data into Uber’s Apache Hadoop Data Lake

Uber engineers discuss the development of DBEvents, a change data capture system designed for high data quality and freshness that is capable of operating on a global scale.

Introducing AresDB: Uber’s GPU-Powered Open Source, Real-time Analytics Engine

AresDB, Uber's open source real-time analytics engine, leverages GPUs to enable real-time computation and data processing in parallel.

Sessionizing Uber Trips in Real Time

Uber's many data flows required modeling the data associated with a specific task, such as a rider trip, into a state machine. The state machine lets engineers focus on just the events needed to successfully accomplish a trip.

Peloton: Uber’s Unified Resource Scheduler for Diverse Cluster Workloads

Uber developed Peloton to help us balance resource use, elastically share resources, and plan for future capacity needs.

Preview 7 Open Source Projects from the Uber Open Summit

Uber open source project leads give updates on seven of our projects, all of which will be showcased at the upcoming Uber Open Summit 2018.

Uber’s Big Data Platform: 100+ Petabytes with Minute Latency

Responsible for cleaning, storing, and serving over 100 petabytes of analytical data, Uber's Hadoop platform ensures data reliability, scalability, and ease-of-use with minimal latency.

Marmaray: An Open Source Generic Data Ingestion and Dispersal Framework and Library for Apache...

Today we introduce Marmaray, an open source framework allowing data ingestion and dispersal for Apache Hadoop, realizing our vision of any-source-to-any-sink functionality, including data format validation.

Out of the Classroom: A Snapshot of Uber’s Summer 2018 Interns

A few of the more than 200 engineering interns in Uber's summer program this year talk about the projects they worked on and what their experiences in the office were like.

Databook: Turning Big Data into Knowledge with Metadata at Uber

Databook, Uber's in-house platform for surfacing and exploring contextual metadata, makes dataset discovery and exploration easier for teams across the company.