Tag: Big Data
In 2019, Uber's Data Platform team leveraged data science to improve the efficiency of our infrastructure, enabling us to compute optimum datastore and hardware usage.
We share technical challenges and lessons learned while productionizing and scaling XGBoost to train distributed gradient boosted algorithms at Uber.
Uber has embraced Presto, a high performance, distributed SQL query engine, and joined the Presto Foundation. Meet the Uber engineers who contribute to and use Presto on a daily basis.
When developing Uber's self driving car systems, engineers found a way to identify edge case scenarios amongst terabytes of sensor data representing real-world situations.
Uber is honored to join the Presto Foundation, a new initiative hosted by the Linux Foundation, to advance the open source data processing community.
Data science helps Uber determine which tables in a database should be off-boarded to another source to maximize the efficiency of our data warehouse.
Performing updates of individual records in Uber's over 100 petabyte Apache Hadoop data lake required building Global Index, a component that manages data bookkeeping and lookups at scale.
We submitted Hudi to the Apache Incubator to ensure the long-term growth and sustainability of the project under The Apache Software Foundation.
Uber's Maps Collection and Reporting (MapCARs) team shares best practices when choosing which HDFS file formats are optimal for use with Apache Spark.
How engineers and data scientists at Uber came together to come up with a means of partially replicating Vertica clusters to better scale our data volume.
Uber engineers offer two common use cases showing how we orchestrate machine learning model training in our data workflow engine.
DBEvents: A Standardized Framework for Efficiently Ingesting Data into Uber’s Apache Hadoop Data Lake
Uber engineers discuss the development of DBEvents, a change data capture system designed for high data quality and freshness that is capable of operating on a global scale.
Our editors spotlight some of the year's most popular articles, from an overview of our Big Data platform to a first-person account of an engineer's immigrant journey.
Uber developed Peloton to help us balance resource use, elastically share resources, and plan for future capacity needs.
Responsible for cleaning, storing, and serving over 100 petabytes of analytical data, Uber's Hadoop platform ensures data reliability, scalability, and ease-of-use with minimal latency.
Today we introduce Marmaray, an open source framework allowing data ingestion and dispersal for Apache Hadoop, realizing our vision of any-sync-to-any-source functionality, including data format validation.
Uber's Data Infrastructure team overhauled our approach to scaling our storage infrastructure by incorporating several new features and functionalities, including ViewFs, NameNode garbage collection tuning, and an HDFS load management service.
12Page 1 of 2