Tag: Hoodie
Uber’s Big Data Platform: 100+ Petabytes with Minute Latency
Responsible for cleaning, storing, and serving over 100 petabytes of analytical data, Uber's Hadoop platform ensures data reliability, scalability, and ease of use with minimal latency.
Scaling Uber’s Apache Hadoop Distributed File System for Growth
Uber's Data Infrastructure team overhauled our approach to scaling storage infrastructure by incorporating several new features, including ViewFs, NameNode garbage collection tuning, and an HDFS load management service.
Engineering Data Analytics with Presto and Apache Parquet at Uber
Snap your fingers and presto! How Uber Engineering built a fast, efficient data analytics system with Presto and Parquet.
Hudi: Uber Engineering’s Incremental Processing Framework on Apache Hadoop
Uber Engineering's data processing platform team recently built and open-sourced Hudi, an incremental processing framework that supports our business-critical data pipelines. In this article, we see how Hudi powers a rich data ecosystem where external sources can be ingested into Hadoop in near real time.