Data / ML, Engineering

Containerizing Apache Hadoop Infrastructure at Uber

July 22, 2021 / Global
Figure 1: Team Responsibilities Shift
Figure 2: Cluster Management Architecture
Figure 3: Automatic Detection & Decommission of Bad HDFS DataNodes
Figure 4: YARN NodeManager and Application Sibling Containers
Figure 5: Kerberos Principal Registration & Keytab Distribution
Figure 6: UserGroups within Containers
Figure 7: Starlark file defining Configurations for different Cluster Types
Figure 8: Client Configuration Management
Figure 9: Migrating 200+ hosts within ~7 days
Mithun (Matt) Mathew

Mithun (Matt) Mathew is a Sr. Staff Engineer on the Data team at Uber. He currently works on various projects in the security domain. Previously, he led the initiative to containerize and automate Data infrastructure at Uber.

Qifan Shi

Qifan is a Senior Software Engineer with the Data Infrastructure team at Uber, and a core contributor to Hadoop containerization. He has worked on multiple systems that effectively orchestrate large-scale HDFS clusters.

Shuyi Zhang

Shuyi is a Senior Software Engineer with the Data Infrastructure team at Uber and a core contributor to Hadoop containerization. She currently focuses on the Compute Resource Management system at Uber.

Posted by Mithun (Matt) Mathew, Qifan Shi, Shuyi Zhang, Jackie Murchison