
Tag: Apache Spark

Productionizing Distributed XGBoost to Train Deep Tree Models with Large Data Sets at Uber

We share technical challenges and lessons learned while productionizing and scaling XGBoost to train distributed gradient boosted algorithms at Uber.

Evolving Michelangelo Model Representation for Flexibility at Scale

To accommodate additional ML use cases, Uber evolved Michelangelo's application of the Apache Spark MLlib library for greater flexibility and extensibility.

Searchable Ground Truth: Querying Uncommon Scenarios in Self-Driving Car Development

When developing Uber's self-driving car systems, engineers found a way to identify edge-case scenarios among terabytes of sensor data representing real-world situations.

Making Apache Spark Effortless for All of Uber

Uber engineers created uSCS, a Spark-as-a-Service solution that helps manage Apache Spark jobs throughout large organizations.

Uber Case Study: Choosing the Right HDFS File Format for Your Apache Spark...

Uber's Maps Collection and Reporting (MapCARs) team shares best practices for choosing the optimal HDFS file format for use with Apache Spark.

Peloton: Uber’s Unified Resource Scheduler for Diverse Cluster Workloads

Uber developed Peloton to help us balance resource use, elastically share resources, and plan for future capacity needs.

Uber’s Big Data Platform: 100+ Petabytes with Minute Latency

Responsible for cleaning, storing, and serving over 100 petabytes of analytical data, Uber's Hadoop platform ensures data reliability, scalability, and ease of use with minimal latency.
