Tag: Big Data
We submitted Hudi to the Apache Incubator to ensure the long-term growth and sustainability of the project under The Apache Software Foundation.
Uber's Maps Collection and Reporting (MapCARs) team shares best practices for choosing the HDFS file formats best suited for use with Apache Spark.
How engineers and data scientists at Uber worked together on a way to partially replicate Vertica clusters so that we can better scale with our data volume.
Uber engineers offer two common use cases showing how we orchestrate machine learning model training in our data workflow engine.
DBEvents: A Standardized Framework for Efficiently Ingesting Data into Uber’s Apache Hadoop Data Lake
Uber engineers discuss the development of DBEvents, a change data capture system designed for high data quality and freshness that is capable of operating on a global scale.
Our editors spotlight some of the year's most popular articles, from an overview of our Big Data platform to a first-person account of an engineer's immigrant journey.
Uber developed Peloton to help us balance resource use, elastically share resources, and plan for future capacity needs.
Responsible for cleaning, storing, and serving over 100 petabytes of analytical data, Uber's Hadoop platform ensures data reliability, scalability, and ease of use with minimal latency.
Uber's Data Infrastructure team overhauled our approach to scaling our storage infrastructure by incorporating several new features and functionalities, including ViewFs, NameNode garbage collection tuning, and an HDFS load management service.
Uber Engineering introduces Michelangelo, our machine learning-as-a-service system that enables teams to easily build, deploy, and operate ML solutions at scale.
In this article, we discuss deck.gl, an open source, WebGL-powered framework specifically designed for exploring and visualizing data sets at scale.
The conclusion of a two-part series on the tech stack that Uber Engineering uses, as of spring 2016, to make transportation as reliable as running water, everywhere, for everyone.
Here we look at Hadoop data ingestion and how Uber Engineering streams diverse data into a cohesive layer for querying in near real-time using Streamific, a system we developed in-house.