Tag: Big Data

Uber Submits Hudi, an Open Source Big Data Library, to The Apache Software Foundation

We submitted Hudi to the Apache Incubator to ensure the long-term growth and sustainability of the project under The Apache Software Foundation.

Uber Case Study: Choosing the Right HDFS File Format for Your Apache Spark...

Uber's Maps Collection and Reporting (MapCARs) team shares best practices when choosing which HDFS file formats are optimal for use with Apache Spark.

Solving Big Data Challenges with Data Science at Uber

How engineers and data scientists at Uber came together to devise a way to partially replicate Vertica clusters so we could better scale with our growing data volume.

Accessible Machine Learning through Data Workflow Management

Uber engineers offer two common use cases showing how we orchestrate machine learning model training in our data workflow engine.

DBEvents: A Standardized Framework for Efficiently Ingesting Data into Uber’s Apache Hadoop Data Lake

Uber engineers discuss the development of DBEvents, a change data capture system designed for high data quality and freshness that is capable of operating on a global scale.

Year in Review: 2018 Highlights from the Uber Engineering Blog

Our editors spotlight some of the year's most popular articles, from an overview of our Big Data platform to a first-person account of an engineer's immigrant journey.

Sessionizing Uber Trips in Real Time

Uber's many data flows required modeling the data associated with a specific task, such as a rider trip, into a state machine. The state machine lets engineers focus on just the events needed to successfully accomplish a trip.

Peloton: Uber’s Unified Resource Scheduler for Diverse Cluster Workloads

Uber developed Peloton to help us balance resource use, elastically share resources, and plan for future capacity needs.

Uber’s Big Data Platform: 100+ Petabytes with Minute Latency

Responsible for cleaning, storing, and serving over 100 petabytes of analytical data, Uber's Hadoop platform ensures data reliability, scalability, and ease of use with minimal latency.

Scaling Uber’s Apache Hadoop Distributed File System for Growth

Uber's Data Infrastructure team overhauled our approach to scaling our storage infrastructure by incorporating several new features and techniques, including ViewFs, NameNode garbage collection tuning, and an HDFS load management service.

Meet Michelangelo: Uber’s Machine Learning Platform

Uber Engineering introduces Michelangelo, our machine learning-as-a-service system that enables teams to easily build, deploy, and operate ML solutions at scale.

Visualize Data Sets on the Web with Uber Engineering’s deck.gl Framework

In this article, we discuss deck.gl, an open source, WebGL-powered framework specifically designed for exploring and visualizing data sets at scale.

The Uber Engineering Tech Stack, Part II: The Edge and Beyond

The end of a two-part series on the tech stack that Uber Engineering uses to make transportation as reliable as running water, everywhere, for everyone, as of spring 2016.

Streamific, the Ingestion Service for Hadoop Big Data at Uber Engineering

Here we look at Hadoop data ingestion and how Uber Engineering uses Streamific, our in-house ingestion service, to stream diverse data into a cohesive layer that can be queried in near real time.