Tag: Data Infrastructure
We implemented a Kappa architecture at Uber to effectively backfill streaming data at scale, ensuring accurate data in our platform.
Data science helps Uber determine which tables in a database should be off-boarded to another source to maximize the efficiency of our data warehouse.
Updating individual records in Uber's Apache Hadoop data lake, which holds over 100 petabytes, required building Global Index, a component that manages data bookkeeping and lookups at scale.
How engineers and data scientists at Uber worked together to devise a way of partially replicating Vertica clusters to better scale with our data volume.
DBEvents: A Standardized Framework for Efficiently Ingesting Data into Uber’s Apache Hadoop Data Lake
Uber engineers discuss the development of DBEvents, a change data capture system designed for high data quality and freshness that is capable of operating on a global scale.
Written in Haskell, Queryparser is Uber Engineering's open source tool for parsing and analyzing SQL queries that makes it easy to identify foreign-key relationships in large data warehouses.
Uber Engineering's Data Science Workbench (DSW) is an all-in-one toolbox that leverages aggregate data for interactive analytics and machine learning.
Snap your fingers and presto! How Uber Engineering built a fast, efficient data analytics system with Presto and Parquet.