5/24/2019

The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks

Jonathan Frankle and Michael Carbin, The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks, ICLR 2019 (best paper).
We present an algorithm to identify winning tickets and a series of experiments that support the lottery ticket hypothesis and the importance of these fortuitous initializations. We consistently find winning tickets that are less than 10-20% of the size of several fully-connected and convolutional feed-forward architectures for MNIST and CIFAR10. Above this size, the winning tickets that we find learn faster than the original network and reach higher test accuracy.
Methodology:
In Figure 1, we randomly sample and train subnetworks from a fully-connected network for MNIST and convolutional networks for CIFAR10. Random sampling models the effect of the unstructured pruning used by LeCun et al. (1990) and Han et al. (2015).
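A minimal sketch of this random-sampling baseline (NumPy-based, with an illustrative 784x300 layer; not the paper's code) would draw a binary mask at the target sparsity and apply it to a layer's weights:

import numpy as np

rng = np.random.default_rng(0)

def random_mask(shape, keep_fraction):
    # Binary mask that keeps a random keep_fraction of the positions.
    size = int(np.prod(shape))
    n_keep = int(round(keep_fraction * size))
    flat = np.zeros(size, dtype=bool)
    flat[rng.choice(size, size=n_keep, replace=False)] = True
    return flat.reshape(shape)

# Example: a randomly sampled subnetwork keeping 20% of a 784x300 layer.
w = rng.standard_normal((784, 300))
w_sub = w * random_mask(w.shape, keep_fraction=0.2)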
Iterative pruning:
repeatedly trains, prunes, and resets the network over n rounds; each round prunes p^(1/n)% of the weights that survive the previous round. Our results show that iterative pruning finds winning tickets that match the accuracy of the original network at smaller sizes than does one-shot pruning.
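A small sketch of how a fixed per-round pruning rate compounds over rounds (illustrative numbers only; the train-and-reset steps of each round are omitted):

# Prune a fixed fraction of the surviving weights each round and track
# how much of the original network remains.
per_round_prune = 0.20
n_rounds = 5

surviving = 1.0
for round_idx in range(1, n_rounds + 1):
    surviving *= 1.0 - per_round_prune
    print(f"after round {round_idx}: {surviving:.1%} of weights remain")
# After 5 rounds, roughly 32.8% of the weights remain (about 67.2% pruned overall).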
For fully-connected networks trained on MNIST:
after randomly initializing and training a network, we prune the network and reset the remaining connections to their original initializations. We use a simple layer-wise pruning heuristic: remove a percentage of the weights with the lowest magnitudes within each layer (as in Han et al. (2015)). Connections to outputs are pruned at half of the rate of the rest of the network. 
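A minimal sketch of one pruning round under these rules (NumPy-based, with hypothetical layer names and sizes; training itself is omitted): mask out the lowest-magnitude weights in each layer, prune the output layer at half the rate, and reset the survivors to their original initializations.

import numpy as np

def magnitude_mask(weights, prune_fraction):
    # Keep the largest-magnitude weights in this layer; prune the rest.
    k = int(round(prune_fraction * weights.size))
    if k == 0:
        return np.ones(weights.shape, dtype=bool)
    threshold = np.sort(np.abs(weights).ravel())[k - 1]
    return np.abs(weights) > threshold

rng = np.random.default_rng(1)
# Hypothetical two-layer MNIST-style network: a saved copy of the original
# initializations and the weights after training stand in for a real run.
initial = {"hidden": rng.standard_normal((784, 300)), "output": rng.standard_normal((300, 10))}
trained = {name: w + 0.1 * rng.standard_normal(w.shape) for name, w in initial.items()}

prune_rate = 0.20
masks = {
    "hidden": magnitude_mask(trained["hidden"], prune_rate),
    # Connections to outputs are pruned at half the rate of the rest of the network.
    "output": magnitude_mask(trained["output"], prune_rate / 2),
}

# Winning-ticket candidate: surviving connections reset to their original initializations.
ticket = {name: initial[name] * masks[name] for name in trained}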
Follow-up papers:
Jonathan Frankle, Gintare Karolina Dziugaite, Daniel M. Roy, and Michael Carbin, The Lottery Ticket Hypothesis at Scale, arXiv:1903.01611, 5 Mar 2019.
Hattie Zhou, Janice Lan, Rosanne Liu, and Jason Yosinski, Deconstructing Lottery Tickets: Zeros, Signs, and the Supermask, Uber AI, 6 May 2019.
