Engineering

Scaling Uber’s Apache Hadoop Distributed File System for Growth

April 5, 2018 / Global
Figure 1. In 2016, our NameNode RPC queue time could exceed half a second per HDFS request.
Figure 2. We installed ViewFs in multiple data centers to help manage our HDFS namespaces.
Figure 3. By increasing the young generation size from 1.5GB to 16GB and tuning the ParGCCardsPerStrideChunk value, the total time our production NameNode spent on GC pause decreased from 13 percent to 1.7 percent.
Figure 4. Spotlight enables us to identify and disable accounts that are causing HDFS slowdown.
Figure 5. Uber Engineering’s current HDFS architecture incorporates high availability and Observer NameNodes.
Figure 6. Our near-future HDFS architecture will incorporate several additional features and functionalities that will contribute to the growth of our storage infrastructure.
Ang Zhang

Ang Zhang is an engineering manager on Uber's Data Foundation team.

Wei Yan

Wei Yan is a Senior Software Engineer on the Marketplace Data Foundation team. She is a main contributor to the Data Quality Platform.

Posted by Ang Zhang, Wei Yan
