Training Neural Networks without Gradients
With the growing importance of large network models and enormous training datasets, GPUs have become increasingly necessary to train neural networks. This is largely because conventional optimization algorithms rely on stochastic gradient methods that don't scale well to large numbers of cores in a cluster setting. Furthermore, the convergence of all gradient methods, including batch methods, suffers from common problems like saturation effects, poor conditioning, and saddle points. This paper explores an unconventional training method that uses alternating direction methods and Bregman iteration to train networks without gradient descent steps. The proposed method reduces the network training problem to a sequence of minimization substeps that can each be solved globally in closed form. The proposed method is advantageous because it avoids many of the caveats that make gradient methods slow on highly non-convex problems. The method exhibits strong scaling in the distributed setting, yielding linear speedups even when split over thousands of cores.
2018-07-24
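To get a feel for the "sequence of minimization substeps, each solved globally in closed form" idea, here is a minimal toy sketch. It is not the paper's actual ADMM/Bregman scheme (which splits across nonlinear activations); it just alternates closed-form least-squares updates on the two weight matrices of a small linear network, with no gradient steps.

```python
import numpy as np

# Toy illustration of gradient-free training by alternating minimization:
# fit a two-layer *linear* model W2 @ W1 @ X ≈ Y, where each substep
# (solve for one weight matrix with the other held fixed) is a linear
# least-squares problem with a closed-form solution via pseudoinverses.
# All sizes and names here are illustrative, not from the paper.

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 200))       # inputs: 5 features, 200 samples
W_true = rng.standard_normal((3, 5))
Y = W_true @ X                          # targets realizable by the model

W1 = rng.standard_normal((4, 5))        # hidden width 4
W2 = rng.standard_normal((3, 4))

for _ in range(10):
    H = W1 @ X
    # Substep 1: with W1 fixed, the W2 minimizing ||W2 H - Y||_F
    # is given in closed form by the pseudoinverse of H.
    W2 = Y @ np.linalg.pinv(H)
    # Substep 2: with W2 fixed, minimizing ||W2 W1 X - Y||_F over W1
    # is again a linear least-squares problem, solved in closed form.
    W1 = np.linalg.pinv(W2) @ Y @ np.linalg.pinv(X)

err = np.linalg.norm(W2 @ W1 @ X - Y) / np.linalg.norm(Y)
print(f"relative error after alternating updates: {err:.2e}")
```

Each update is a global minimizer of its subproblem, which is what makes this style of training attractive: the substeps decouple and can be distributed, and no step depends on backpropagated gradients.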
Designing and Building Parallel Programs
Teach yourself how to program a parallel machine.
2017-10-15
Reinforcement Learning: An Introduction
The best book for reinforcement learning.
2017-08-21