Tag: Apache Hadoop
Searchable Ground Truth: Querying Uncommon Scenarios in Self-Driving Car Development
When developing Uber's self-driving car systems, engineers found a way to identify edge-case scenarios among terabytes of sensor data representing real-world situations.
DBEvents: A Standardized Framework for Efficiently Ingesting Data into Uber’s Apache Hadoop Data Lake
Uber engineers discuss the development of DBEvents, a change data capture system designed for high data quality and freshness that is capable of operating on a global scale.
Peloton: Uber’s Unified Resource Scheduler for Diverse Cluster Workloads
Uber developed Peloton to balance resource use, elastically share resources, and plan for future capacity needs.
Uber’s Big Data Platform: 100+ Petabytes with Minute Latency
Responsible for cleaning, storing, and serving over 100 petabytes of analytical data, Uber's Hadoop platform ensures data reliability, scalability, and ease of use with minimal latency.
Marmaray: An Open Source Generic Data Ingestion and Dispersal Framework and Library for Apache...
Today we introduce Marmaray, an open source framework for data ingestion and dispersal for Apache Hadoop, realizing our vision of any-source to any-sink functionality, including data format validation.
Scaling Uber’s Apache Hadoop Distributed File System for Growth
Uber's Data Infrastructure team overhauled its approach to scaling storage infrastructure by incorporating several new features, including ViewFs, NameNode garbage collection tuning, and an HDFS load management service.