Tag: Data

Consistent Data Partitioning through Global Indexing for Large Apache Hadoop Tables at Uber

Performing updates of individual records in Uber's over 100 petabyte Apache Hadoop data lake required building Global Index, a component that manages data bookkeeping and lookups at scale.

Optimizing M3: How Uber Halved Our Metrics Ingestion Latency by (Briefly) Forking the Go...

Noticing increased latency in our metrics platform, Uber engineers track down a bug related to stack growth in a goroutine, resulting in a fix elevated to the Go open source GitHub repository.

Solving Big Data Challenges with Data Science at Uber

How engineers and data scientists at Uber collaborated to devise a way of partially replicating Vertica clusters to better scale with our data volume.

Sessionizing Uber Trips in Real Time

Uber's many data flows required modeling the data associated with a specific task, such as a rider trip, into a state machine. The state machine lets engineers focus on just the events needed to successfully accomplish a trip.

Michelangelo PyML: Introducing Uber’s Platform for Rapid Python ML Model Development

Uber developed Michelangelo PyML to run identical copies of machine learning models locally in both real-time experiments and large-scale offline prediction jobs.

Herb: Multi-DC Replication Engine for Uber’s Schemaless Datastore

Facing the need for a resilient data structure over thousands of storage nodes to serve the 15 million rides per day that occur on our platform, Uber engineers developed Herb, our data replication solution. Herb ensures data availability and integrity across our data centers.

Growing the Data Visualization Community with deck.gl v5

deck.gl v5 incorporates simplified APIs, scripting support, and framework agnosticism, making the popular open source data visualization software more accessible than ever before.

Introducing the Uber AI Residency

Interested in accelerating your career by tackling some of Uber’s most challenging AI problems? Apply for the Uber AI Residency, a research fellowship dedicated to fostering the next generation of AI talent.

Implementing Model-Agnosticism in Uber’s Real-Time Anomaly Detection Platform

Uber Engineering extended our anomaly detection platform's ability to integrate new forecast models, allowing this critical on-call service to scale to meet more complex use cases.

Gleaning Insights from Uber’s Partner Activity Matrix with Genomic Biclustering and Machine Learning

Uber Engineering's partner activity matrix leverages biclustering and machine learning to better understand the diversity of user experiences on our driver app.

Engineering More Reliable Transportation with Machine Learning and AI at Uber

In this article, we highlight how Uber leverages machine learning and artificial intelligence to tackle engineering challenges at scale.

Turbocharging Analytics at Uber with our Data Science Workbench

Uber Engineering's data science workbench (DSW) is an all-in-one toolbox that leverages aggregate data for interactive analytics and machine learning.

Meet Horovod: Uber’s Open Source Distributed Deep Learning Framework for TensorFlow

Uber Engineering introduces Horovod, an open source framework that makes it faster and easier to train deep learning models with TensorFlow.

Introducing AthenaX, Uber Engineering’s Open Source Streaming Analytics Platform

Uber Engineering built AthenaX, our open source streaming analytics platform, to bring large-scale event stream processing to everyone.

Engineering Restaurant Manager, our UberEATS Analytics Dashboard

The UberEATS Restaurant Manager gives restaurant partners insight into their business by measuring customer satisfaction, sales, and service quality.

Meet Michelangelo: Uber’s Machine Learning Platform

Uber Engineering introduces Michelangelo, our machine learning-as-a-service system that enables teams to easily build, deploy, and operate ML solutions at scale.

Uber’s Ride with the Sun: Tracking the 2017 Solar Eclipse

Uber Engineering’s Data Visualization team uses their deck.gl and Voyager visualization platforms to map rider behavior during the August 21, 2017 solar eclipse.

Engineering Uber’s Self-Driving Car Visualization Platform for the Web

Uber Engineering's Data Visualization Team and ATG built a new web-based platform that helps engineers and operators better understand information collected during testing of its self-driving vehicles.

Engineering Extreme Event Forecasting at Uber with Recurrent Neural Networks

Recurrent neural networks equip Uber Engineering's new forecasting model to more accurately predict rider demand during extreme events.

Detecting Abuse at Scale: Locality Sensitive Hashing at Uber Engineering

In this article, we discuss how Uber Engineering uses Locality Sensitive Hashing on Apache Spark to reliably detect fraudulent trips at scale.

Building an Intelligent Experimentation Platform with Uber Engineering

Composed of a staged rollout and intelligent analytics tool, Uber Engineering's experimentation platform is capable of stably deploying new features at scale across our apps. In this article, we discuss the challenges and opportunities we faced when building this product.

Redesigning Uber Engineering’s Mobile Content Delivery Ecosystem

How Uber Engineering re-architected the content delivery feed and backend ecosystem of our new driver app to deliver an enhanced user experience.

Presenting the Engineering Behind Uber at Our Technology Day

A daylong event at Uber’s Palo Alto office, sponsored by our LadyEng group, showcased the technical work across Uber Engineering as well as the people who are leading and building these projects. Here are some of the resulting presentations.

Hudi: Uber Engineering’s Incremental Processing Framework on Apache Hadoop

Uber Engineering's data processing platform team recently built and open sourced Hudi, an incremental processing framework that supports our business critical data pipelines. In this article, we see how Hudi powers a rich data ecosystem where external sources can be ingested into Hadoop in near real-time.

Why Uber Engineering Switched from Postgres to MySQL

Uber Engineering explains the technical reasoning behind its switch in database technologies, from Postgres to MySQL.

The Uber Engineering Tech Stack, Part II: The Edge and Beyond

The end of a two-part series on the tech stack that Uber Engineering uses to make transportation as reliable as running water, everywhere, for everyone, as of spring 2016.

The Uber Engineering Tech Stack, Part I: The Foundation

Uber’s mission is transportation as reliable as running water, everywhere, for everyone. Here's the first of a two-part series on the tech stack that Uber Engineering uses to make this happen.

Engineering Intelligence Through Data Visualization at Uber

The data visualization team in Uber Engineering delivers intelligence through crafting visual exploratory data analysis tools. Here's what some of these visualizations look like.

Streamific, the Ingestion Service for Hadoop Big Data at Uber Engineering

Here we look at Hadoop data ingestion, and how Uber Engineering streams diverse data into a cohesive layer for querying in near real-time using our in-house developed Streamific.

How Uber Thinks About Site Reliability Engineering

Uber’s mission is transportation as reliable as running water, for everyone, everywhere. This past month, Uber Engineering talked about what it takes to get site reliability engineering right.

How Uber Engineering Evaluated JSON Encoding and Compression Algorithms to Put the Squeeze on...

Imagine you have to store data whose massive influx increases by the hour. Your first priority, after making sure you can easily add storage capacity, is to reduce the data’s footprint to save space. But how? This is the story of Uber Engineering’s comprehensive test of encoding protocols and compression algorithms, and how this discipline saved space in our Schemaless datastores.

Using Triggers On Schemaless, Uber Engineering’s Datastore Using MySQL

The details and examples of Schemaless triggers, a key feature of the datastore that’s kept Uber Engineering scaling since October 2014. This is the third installment of a three-part series on Schemaless; the first part is a design overview and the second part is a discussion of architecture.

The Architecture of Schemaless, Uber Engineering’s Trip Datastore Using MySQL

How Uber’s infrastructure works with Schemaless, the datastore using MySQL that’s kept Uber Engineering scaling since October 2014. This is part two of a three-part series on Schemaless; part one is on designing Schemaless.

Designing Schemaless, Uber Engineering’s Scalable Datastore Using MySQL

The making of Schemaless, Uber Engineering’s custom-designed datastore using MySQL, which has allowed us to scale from 2014 onward. This is part one of a three-part series on Schemaless.

Project Mezzanine: The Great Migration

What happens when you have to migrate hundreds of millions of rows of data and 100 services over several weeks with dozens of engineers, while simultaneously serving millions of rides? The story of how Uber moved to Mezzanine in 2014.
