Tag: Apache Spark
Productionizing Distributed XGBoost to Train Deep Tree Models with Large Data Sets at Uber
We share technical challenges and lessons learned while productionizing and scaling XGBoost to train distributed gradient boosted algorithms at Uber.
Evolving Michelangelo Model Representation for Flexibility at Scale
To accommodate additional ML use cases, Uber evolved Michelangelo's application of the Apache Spark MLlib library for greater flexibility and extensibility.
Searchable Ground Truth: Querying Uncommon Scenarios in Self-Driving Car Development
When developing Uber's self-driving car systems, engineers found a way to identify edge-case scenarios among terabytes of sensor data representing real-world situations.
Making Apache Spark Effortless for All of Uber
Uber engineers created uSCS, a Spark-as-a-Service solution that helps manage Apache Spark jobs throughout large organizations.
Uber Case Study: Choosing the Right HDFS File Format for Your Apache Spark...
Uber's Maps Collection and Reporting (MapCARs) team shares best practices for choosing the HDFS file formats best suited for use with Apache Spark.
Peloton: Uber’s Unified Resource Scheduler for Diverse Cluster Workloads
Uber developed Peloton to help us balance resource use, elastically share resources, and plan for future capacity needs.
Uber’s Big Data Platform: 100+ Petabytes with Minute Latency
Responsible for cleaning, storing, and serving over 100 petabytes of analytical data, Uber's Hadoop platform ensures data reliability, scalability, and ease of use with minimal latency.