Sanyam Kapoor

Research Papers

First-Order Preconditioning via Hypergradient Descent

T. Moskovitz, R. Wang, J. Lan, S. Kapoor, T. Miconi, J. Yosinski, A. Rawal
Standard gradient descent methods are susceptible to a range of issues that can impede training, such as high correlations and different scaling in parameter space. These difficulties can be addressed by second-order approaches that apply a pre-conditioning matrix to the gradient to improve convergence. Unfortunately, such algorithms typically struggle to scale to high-dimensional problems, in part because the calculation of specific preconditioners such as the inverse Hessian or Fisher information matrix is highly expensive. We introduce first-order preconditioning (FOP), a fast, scalable approach that generalizes previous work on hypergradient descent (Almeida et al., 1998; Maclaurin et al., 2015; Baydin et al., 2017) to learn a preconditioning matrix that only makes use of first-order information. [...] [PDF]
Conference on Neural Information Processing Systems (NeurIPS), 2019

