Tag: Apache Spark
We implemented a Kappa architecture at Uber to effectively backfill streaming data at scale, ensuring accurate data in our platform.
We share technical challenges and lessons learned while productionizing and scaling XGBoost to train distributed gradient-boosted algorithms at Uber.
To accommodate additional ML use cases, Uber evolved Michelangelo's application of the Apache Spark MLlib library for greater flexibility and extensibility.
When developing Uber's self-driving car systems, engineers found a way to identify edge-case scenarios among terabytes of sensor data representing real-world situations.
Uber's Maps Collection and Reporting (MapCARs) team shares best practices for choosing the optimal HDFS file formats for use with Apache Spark.
Uber developed Peloton to help us balance resource use, elastically share resources, and plan for future capacity needs.
Responsible for cleaning, storing, and serving over 100 petabytes of analytical data, Uber's Hadoop platform ensures data reliability, scalability, and ease of use with minimal latency.