Xinli Shang

3 BLOG ARTICLES 0 RESEARCH PAPERS
Xinli Shang is a Tech Lead Manager on the Uber Big Data Infra team, Apache Parquet PMC Chair (VP of Apache Parquet), and a member of the Uber Open Source Committee. He leads the Apache Parquet community and contributes to several other communities, such as Presto and Trino. He also leads several data format initiatives for storage efficiency, security, and performance, and is passionate about tuning large-scale services for performance, throughput, and reliability.

Engineering Blog Articles

One Stone, Three Birds: Finer-Grained Encryption @ Apache Parquet™

Overview 

Data access restrictions, retention, and encryption at rest are fundamental security controls. This blog explains how we have built and utilized open-sourced Apache Parquet™’s finer-grained encryption feature to support all 3 controls in a unified way. In

Cost Efficiency @ Scale in Big Data File Format

Background

Our Apache Hadoop® based data platform ingests hundreds of petabytes of analytical data with minimum latency and stores it in a data lake built on top of the Hadoop Distributed File System (HDFS). We use Apache Hudi

Tricks of the Trade: Tuning JVM Memory for Large-scale Services

Running queries on Uber’s data platform lets us make data-driven decisions at every level, from forecasting rider demand during high traffic events to identifying and addressing bottlenecks in the driver sign-up process. Our Apache Hadoop-based data platform ingests