Engineering, AI

Deconstructing Lottery Tickets: Zeros, Signs, and the Supermask

May 6, 2019 / Global
Figure 1. Untrained networks perform at chance (for example, 10 percent accuracy on the MNIST dataset, as depicted) whether they are randomly initialized or randomly initialized and randomly masked. Applying the Lottery Ticket mask, however, improves network accuracy well beyond chance.
Figure 2. When testing three CNNs on CIFAR-10, we find that the accuracy of networks whose pruned weights are frozen at their initial values degrades significantly more than that of networks whose pruned weights are set to zero.
Figure 3. Selectively freezing weights at their initial values or at zero, depending on the direction they moved during training, produces better performance than freezing all pruned weights at zero or all at their initial values.
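
As a rough illustration of this selective-freezing rule, here is a minimal NumPy sketch (all function and variable names are ours, not from the authors' code): each pruned weight is frozen at zero if training moved it toward zero, and at its initial value if training moved it away from zero, while kept weights are rewound to their initial values as in the original LT procedure.

```python
import numpy as np

def selectively_freeze(w_init, w_final, keep_mask):
    """Sketch of the rule behind Figure 3 (hypothetical names).

    Kept weights (keep_mask == 1) are rewound to their initial values.
    Pruned weights are frozen at 0 if training moved them toward zero,
    and at their initial values otherwise.
    """
    moved_toward_zero = np.abs(w_final) < np.abs(w_init)
    frozen = np.where(moved_toward_zero, 0.0, w_init)
    return np.where(keep_mask.astype(bool), w_init, frozen)
```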
Figure 5. Different mask criteria can be thought of as segmenting the (wi, wf) space into regions corresponding to mask values of one vs. zero. The ellipse represents in cartoon form the area occupied by the positively correlated initial and final weights from a given layer. The mask shown corresponds to a “large final” criterion, which was used in the LT paper: weights with large final magnitude are kept, and weights with final values near zero are pruned. Note that this criterion ignores the initial magnitude of the weight.
Figure 6. The eight mask criteria considered in this study are shown, starting with the “large final” criterion that starred in the LT paper. Names we use to refer to the various methods are given along with the formula that projects each (wi, wf) pair to a score. Weights with the largest scores (colored regions) are kept, and weights with the smallest scores (gray regions) are pruned.
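
To make the criteria concrete, here is a minimal NumPy sketch (names ours) of a few of the eight score functions, along with a helper that keeps the top-scoring fraction of weights and prunes the rest:

```python
import numpy as np

def score(w_i, w_f, criterion):
    """Project each (wi, wf) pair to a scalar score; larger scores are kept."""
    if criterion == "large_final":       # the criterion used in the LT paper
        return np.abs(w_f)
    if criterion == "small_final":
        return -np.abs(w_f)
    if criterion == "large_init":
        return np.abs(w_i)
    if criterion == "magnitude_increase":
        return np.abs(w_f) - np.abs(w_i)
    if criterion == "movement":
        return np.abs(w_f - w_i)
    raise ValueError(f"unknown criterion: {criterion}")

def make_mask(w_i, w_f, criterion, keep_frac=0.2):
    """Binary mask: 1 for the top keep_frac of weights by score, 0 elsewhere."""
    s = score(w_i, w_f, criterion)
    threshold = np.quantile(s, 1.0 - keep_frac)
    return (s >= threshold).astype(np.float32)
```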
Figure 7. Measurements of accuracy vs. pruning percentage for two networks, FC on MNIST (left) and Conv4 on CIFAR-10 (right), show that multiple mask criteria (large final, magnitude increase, and two others) reliably outperform the random pruning baseline, shown in black. In the Conv4 network, the performance boost from “magnitude increase” is larger than that of the other mask criteria; asterisks mark where the difference between “large final” and “magnitude increase” is statistically significant at the p = 0.05 level.
Figure 8. We show test accuracy vs. pruning percentage for two networks, FC (left) and Conv4 (right), under different reinitialization methods. The clear performance gap between methods that preserve the original signs of the weights and those that do not suggests that the specific initial values of kept weights matter less than their signs.
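
As an illustration of what “respecting signs” means here, the NumPy sketch below (ours, not the authors' code) contrasts a reshuffling reinitialization that keeps each weight's original sign with a control that randomizes signs as well; in the study, the sign-preserving variants are the ones that retain good performance.

```python
import numpy as np

rng = np.random.default_rng(0)

def reshuffle(w, respect_signs=True):
    """Permute weight magnitudes within a layer; optionally keep each
    weight's original sign (a sketch of one variant in Figure 8)."""
    mags = rng.permutation(np.abs(w).ravel()).reshape(w.shape)
    if respect_signs:
        return np.sign(w) * mags                  # same signs, new magnitudes
    return mags * rng.choice([-1.0, 1.0], size=w.shape)  # random signs too
```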
Figure 9. The “large final, same sign” mask criterion produces the highest-performing Supermasks in this study. In contrast to the “large final” mask in Figure 5, note that this criterion masks out the quadrants where the signs of wi and wf differ.
Figure 10. We evaluate the accuracy at initialization (with no training) of a single FC network on MNIST under various masks. The x-axis depicts the percentage of weights remaining in the network; all other weights are set to zero. The “large final, same sign” mask creates the highest-performing Supermask by a wide margin. Note that aside from the five independent runs performed to generate uncertainty bands, every data point on this plot is the same underlying network, just with different masks applied.
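
For concreteness, here is a minimal NumPy sketch (layer sizes and all names are our assumptions) of the Supermask evaluation itself: apply a fixed binary mask elementwise to a randomly initialized FC network and measure its accuracy with no training at all.

```python
import numpy as np

rng = np.random.default_rng(0)

# A 784-300-100-10 FC net for MNIST, randomly initialized and never trained.
sizes = [784, 300, 100, 10]
weights = [rng.normal(0.0, 0.1, (m, n)) for n, m in zip(sizes[:-1], sizes[1:])]

# Placeholder random masks; in the study the mask would come from a
# criterion such as "large final, same sign" computed on a trained copy.
masks = [(rng.random(w.shape) < 0.2).astype(w.dtype) for w in weights]

def predict(x, weights, masks):
    """Forward pass with masked weights (biases omitted for brevity)."""
    h = x
    for k, (w, m) in enumerate(zip(weights, masks)):
        h = h @ (w * m).T
        if k < len(weights) - 1:
            h = np.maximum(h, 0.0)  # ReLU
    return h.argmax(axis=1)

# accuracy = (predict(test_images, weights, masks) == test_labels).mean()
```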
Hattie Zhou

Hattie Zhou is a data scientist with Uber's Marketing Analytics team.

Janice Lan

Janice Lan is a research scientist with Uber AI.

Rosanne Liu

Rosanne Liu is a senior research scientist and a founding member of Uber AI. She obtained her PhD in Computer Science at Northwestern University, where she used neural networks to help discover novel materials. She currently works on multiple fronts where machine learning and neural networks remain mysterious. She attempts to write in her spare time.

Jason Yosinski

Jason Yosinski is a former founding member of Uber AI Labs, where he led the Deep Collective research group.

Posted by Hattie Zhou, Janice Lan, Rosanne Liu, Jason Yosinski

Category: Engineering, AI