Tag: Data Architecture
We implemented a Kappa architecture at Uber to effectively backfill streaming data at scale, ensuring accurate data in our platform.
Performing updates of individual records in Uber's over 100 petabyte Apache Hadoop data lake required building Global Index, a component that manages data bookkeeping and lookups at scale.
Written in Haskell, Queryparser is Uber Engineering's open source tool for parsing and analyzing SQL queries that makes it easy to identify foreign-key relationships in large data warehouses.
Uber Engineering introduces Michelangelo, our machine learning-as-a-service system that enables teams to easily build, deploy, and operate ML solutions at scale.
Uber Engineering's data processing platform team recently built and open sourced Hudi, an incremental processing framework that supports our business critical data pipelines. In this article, we see how Hudi powers a rich data ecosystem where external sources can be ingested into Hadoop in near real-time.