Data / ML, Engineering

Elastic Deep Learning with Horovod on Ray

March 8, 2021 / Global
Figure 1: Validation accuracy as a function of epoch number (left) and corresponding number of workers in the experiment at the end of each epoch (right).
Figure 2: Validation accuracy as a function of epoch number (left) and relative wall clock time in seconds from the start of the experiment (right).
A representation of using Ray Tune with Horovod for nesting distributed training runs with parallel hyperparameter tuning.
A graphical representation of using dynamic resource allocation to improve model training performance under time constraints. An algorithm like HyperSched progressively reduces exploration to zero in favor of deeper exploitation of fewer trials, dynamically allocating more parallel resources to the trials that remain.
Ludwig running in local mode (pre v0.4): all data needs to fit in memory on a single machine.
Ludwig running on a Ray cluster (post v0.4): Ray scales out preprocessing and distributed training to process large datasets without needing to write any infrastructure code in Ludwig.
Travis Addair

Travis Addair is a software engineer at Uber AI leading the Deep Learning Training team as part of the Michelangelo AI platform. He leads the Horovod open source project and chairs its Technical Steering Committee within the Linux Foundation.

Xu Ning

Xu Ning is a Senior Engineering Manager in Uber’s Seattle Engineering office, currently leading multiple development teams in Uber’s Michelangelo Machine Learning Platform. He previously led Uber’s Cherami distributed task queue, Hadoop observability, and Data security teams.

Richard Liaw

Richard Liaw is a Software Engineer at Anyscale, where he currently leads development efforts for distributed Machine Learning libraries on Ray. He is one of the maintainers of the open source Ray project and is on leave from the PhD program at UC Berkeley.

Posted by Travis Addair, Xu Ning, Richard Liaw