Adaptive and randomized (sub)gradient algorithms for online learning and optimization
John DuchiUniversity of California at Berkeley
Wednesday, February 27, 2013|
2:00pm - 3:00pm
Add to Google Calendar
About the Event
In this talk, I present two new families of subgradient methods for online learning and optimization. The first family, which we call Adagrad, dynamically incorporates knowledge of the geometry of the data observed in earlier iterations to perform more informative gradient-based learning. Metaphorically, the adaptation allows us to find needles in haystacks in the form of very predictive but rarely seen features. Our apparatus adaptively modifying the learning algorithm, which significantly simplifies setting a learning rate and results in guarantees that are provably as good as the best gradient-based algorithm, chosen in hindsight. We experimentally study the algorithms, showing that adaptive subgradient methods significantly outperform state-of-the-art, yet non-adaptive, subgradient algorithms. In the second part of the talk, I discuss a family of algorithms that incorporate randomization to smooth the problem being solved. Rather than take advantage of the geometry of the data, these techniques take advantage of the stochasticity inherent in online learning and statistical optimization problems. Our randomization techniques directly lead to easily parallelizable algorithms, and we provide guarantees on the convergence of the resulting optimization procedures. These guarantees are order optimal in both the amount of parallelization and number of samples available to the algorithm, and to the best of our knowledge, these are the first such guarantees. We conclude with experimental results that demonstrate the effectiveness of the proposed algorithms.
Open to: Public