Uber Data

Uber’s Big Data Platform: 100+ Petabytes with Minute Latency

Responsible for cleaning, storing, and serving over 100 petabytes of analytical data, Uber's Hadoop platform ensures data reliability, scalability, and ease of use with minimal latency.

Meet Michelangelo: Uber’s Machine Learning Platform

Uber Engineering introduces Michelangelo, our machine learning-as-a-service system that enables teams to easily build, deploy, and operate ML solutions at scale.

Introducing AresDB: Uber’s GPU-Powered Open Source, Real-time Analytics Engine

AresDB, Uber's open source real-time analytics engine, leverages GPUs to enable real-time computation and data processing in parallel.

Forecasting at Uber: An Introduction

In this article, we provide a general overview of how our teams leverage forecasting to build better products and maintain the health of the Uber marketplace.

Meet Horovod: Uber’s Open Source Distributed Deep Learning Framework for TensorFlow

Uber Engineering introduces Horovod, an open source framework that makes it faster and easier to train deep learning models with TensorFlow.

Under the Hood of Uber’s Experimentation Platform

Uber's experimentation platform empowers us to improve the customer experience by allowing teams to launch, debug, measure, and monitor product changes.

Engineering Uncertainty Estimation in Neural Networks for Time Series Prediction at Uber

Uber Engineering introduces a new Bayesian neural network architecture that forecasts time series more accurately and estimates the uncertainty of its predictions.

From Beautiful Maps to Actionable Insights: Introducing kepler.gl, Uber’s Open Source Geospatial Toolbox

Created by Uber's Visualization team, kepler.gl is an open source, data-agnostic, high-performance web-based application for large-scale geospatial visualizations.

Manifold: A Model-Agnostic Visual Debugging Tool for Machine Learning at Uber

Uber built Manifold, a model-agnostic visualization tool for ML performance diagnosis and model debugging, to facilitate a more informed and actionable model iteration process.

Welcoming the Era of Deep Neuroevolution

By leveraging neuroevolution to train deep neural networks, Uber AI Labs is developing solutions to reinforcement learning problems.

Marmaray: An Open Source Generic Data Ingestion and Dispersal Framework and Library for Apache...

Today we introduce Marmaray, an open source framework allowing data ingestion and dispersal for Apache Hadoop, realizing our vision of any-source-to-any-sink functionality, including data format validation.

Queryparser, an Open Source Tool for Parsing and Analyzing SQL

Written in Haskell, Queryparser is Uber Engineering's open source tool for parsing and analyzing SQL queries that makes it easy to identify foreign-key relationships in large data warehouses.

Engineering Extreme Event Forecasting at Uber with Recurrent Neural Networks

Recurrent neural networks equip Uber Engineering's new forecasting model to more accurately predict rider demand during extreme events.

M4 Forecasting Competition: Introducing a New Hybrid ES-RNN Model

Uber senior data scientist Slawek Smyl won the M4 Competition by a solid margin with his hybrid Exponential Smoothing-Recurrent Neural Networks (ES-RNN) forecasting method.

Managing Uber’s Data Workflows at Scale

In this article, we discuss Uber's journey toward a unified, multi-tenant, and scalable data workflow management system.

Databook: Turning Big Data into Knowledge with Metadata at Uber

Databook, Uber's in-house platform for surfacing and exploring contextual metadata, makes dataset discovery and exploration easier for teams across the company.

Engineering Data Analytics with Presto and Apache Parquet at Uber

Snap your fingers and presto! How Uber Engineering built a fast, efficient data analytics system with Presto and Parquet.

Making Apache Spark Effortless for All of Uber

Uber engineers created uSCS, a Spark-as-a-Service solution that helps manage Apache Spark jobs throughout large organizations.

Using Causal Inference to Improve the Uber User Experience

Uber Labs leverages causal inference, a statistical method for better understanding the cause of experiment results, to improve our products and operations analysis.

Visualizing City Cores with H3, Uber’s Open Source Geospatial Indexing System

In a selection of presentations delivered at a June 2019 Uber meetup, we discuss how to use H3, our open source hexagonal indexing system, to facilitate the granular mining of large geospatial data sets.