Tag: Data Architecture

Consistent Data Partitioning through Global Indexing for Large Apache Hadoop Tables at Uber

Updating individual records in Uber's Apache Hadoop data lake, which holds over 100 petabytes, required building Global Index, a component that manages data bookkeeping and lookups at scale.

Queryparser, an Open Source Tool for Parsing and Analyzing SQL

Written in Haskell, Queryparser is Uber Engineering's open source tool for parsing and analyzing SQL queries that makes it easy to identify foreign-key relationships in large data warehouses.

Meet Michelangelo: Uber’s Machine Learning Platform

Uber Engineering introduces Michelangelo, our machine learning-as-a-service system that enables teams to easily build, deploy, and operate ML solutions at scale.

Hudi: Uber Engineering’s Incremental Processing Framework on Apache Hadoop

Uber Engineering's data processing platform team recently built and open-sourced Hudi, an incremental processing framework that supports our business-critical data pipelines. In this article, we see how Hudi powers a rich data ecosystem where external sources can be ingested into Hadoop in near real time.
