Responsible for cleaning, storing, and serving over 100 petabytes of analytical data, Uber's Hadoop platform ensures data reliability, scalability, and ease of use with minimal latency.
Uber's Data Infrastructure team overhauled its approach to scaling our storage infrastructure by incorporating several new features, including ViewFs, NameNode garbage collection tuning, and an HDFS load management service.
Snap your fingers and presto! How Uber Engineering built a fast, efficient data analytics system with Presto and Parquet.
How Uber Engineering re-architected the content delivery feed and backend ecosystem of our new driver app to deliver an enhanced user experience.
Uber Engineering's data processing platform team recently built and open-sourced Hudi, an incremental processing framework that supports our business-critical data pipelines. In this article, we see how Hudi powers a rich data ecosystem where external sources can be ingested into Hadoop in near real-time.
In this article, we take a look at Euclid, Uber Engineering's in-house marketing platform built on Hadoop and Spark.
Take a look at uReplicator, Uber’s open-source solution for replicating Apache Kafka data in a robust and reliable manner.
The end of a two-part series on the tech stack that Uber Engineering uses to make transportation as reliable as running water, everywhere, for everyone, as of spring 2016.
Here we look at Hadoop data ingestion and how Uber Engineering streams diverse data into a cohesive layer for querying in near real-time using Streamific, which we developed in-house.