Skip to footer
Home Authors Posts by Zheng Shao

Zheng Shao

Zheng Shao is a Distinguished Engineer at Uber. His focus is on big data cost efficiency as well as data infra multi-region and on-prem/cloud architecture. He is also an Apache Hadoop PMC member and an Emeritus Apache Hive PMC member.

Engineering Blog Articles

Cost Efficiency @ Scale in Big Data File Format



Our Apache Hadoop® based data platform ingests hundreds of petabytes of analytical data with minimum latency and stores it in a data lake built on top of the Hadoop Distributed File System (HDFS). We use Apache Hudi

Efficiently Managing the Supply and Demand on Uber’s Big Data Platform

With Uber’s business growth and the fast adoption of big data and AI, Big Data scaled to become our most costly infrastructure platform. To reduce operational expenses, we developed a holistic framework with 3 pillars: platform efficiency, supply, and demand

Cost-Efficient Open Source Big Data Platform at Uber

As Uber’s business has expanded, the underlying pool of data that powers it has grown exponentially, and thus ever more expensive to process. When Big Data rose to become one of our largest operational expenses, we began an initiative to

Challenges and Opportunities to Dramatically Reduce the Cost of Uber’s Big Data


Big data is at the core of Uber’s business. We continue to innovate and provide better experiences for our earners, riders, and eaters by leveraging big data, machine learning, and artificial intelligence technology. As a result, over the last