Tag: Big Data
Performing updates of individual records in Uber's over 100 petabyte Apache Hadoop data lake required building Global Index, a component that manages data bookkeeping and lookups at scale.
We submitted Hudi to the Apache Incubator to ensure the long-term growth and sustainability of the project under The Apache Software Foundation.
Uber's Maps Collection and Reporting (MapCARs) team shares best practices when choosing which HDFS file formats are optimal for use with Apache Spark.
How engineers and data scientists at Uber came together to come up with a means of partially replicating Vertica clusters to better scale our data volume.
Uber engineers offer two common use cases showing how we orchestrate machine learning model training in our data workflow engine.
DBEvents: A Standardized Framework for Efficiently Ingesting Data into Uber’s Apache Hadoop Data Lake
Uber engineers discuss the development of DBEvents, a change data capture system designed for high data quality and freshness that is capable of operating on a global scale.
Our editors spotlight some of the year's most popular articles, from an overview of our Big Data platform to a first-person account of an engineer's immigrant journey.
Uber developed Peloton to help us balance resource use, elastically share resources, and plan for future capacity needs.
Responsible for cleaning, storing, and serving over 100 petabytes of analytical data, Uber's Hadoop platform ensures data reliability, scalability, and ease-of-use with minimal latency.
Today we introduce Marmaray, an open source framework allowing data ingestion and dispersal for Apache Hadoop, realizing our vision of any-sync-to-any-source functionality, including data format validation.
Uber's Data Infrastructure team overhauled our approach to scaling our storage infrastructure by incorporating several new features and functionalities, including ViewFs, NameNode garbage collection tuning, and an HDFS load management service.
Uber Engineering introduces Michelangelo, our machine learning-as-a-service system that enables teams to easily build, deploy, and operate ML solutions at scale.
In this article, we discuss deck.gl, an open sourced, WebGL-powered framework specifically designed for exploring and visualizing data sets at scale.
The end of a two-part series on the tech stack that Uber Engineering uses to make transportation as reliable as running water, everywhere, for everyone, as of spring 2016.
Here we look at Hadoop data ingestion, and how Uber Engineering streams diverse data into a cohesive layer for querying in near real-time using our in-house developed Streamific.