Tag: Data Infrastructure

Consistent Data Partitioning through Global Indexing for Large Apache Hadoop Tables at Uber

Updating individual records in Uber's Apache Hadoop data lake, which holds over 100 petabytes, required building Global Index, a component that manages data bookkeeping and lookups at scale.

Solving Big Data Challenges with Data Science at Uber

How engineers and data scientists at Uber worked together to devise a way of partially replicating Vertica clusters to better scale with our data volume.

DBEvents: A Standardized Framework for Efficiently Ingesting Data into Uber’s Apache Hadoop Data Lake

Uber engineers discuss the development of DBEvents, a change data capture system designed for high data quality and freshness that is capable of operating on a global scale.

Queryparser, an Open Source Tool for Parsing and Analyzing SQL

Written in Haskell, Queryparser is Uber Engineering's open source tool for parsing and analyzing SQL queries that makes it easy to identify foreign-key relationships in large data warehouses.

Turbocharging Analytics at Uber with our Data Science Workbench

Uber Engineering's data science workbench (DSW) is an all-in-one toolbox that leverages aggregate data for interactive analytics and machine learning.

Engineering Data Analytics with Presto and Apache Parquet at Uber

Snap your fingers and presto! How Uber Engineering built a fast, efficient data analytics system with Presto and Parquet.
