author | Evan Shelhamer <shelhamer@imaginarynumber.net> | 2015-02-04 10:15:21 -0800 |
---|---|---|
committer | Evan Shelhamer <shelhamer@imaginarynumber.net> | 2015-02-04 10:15:21 -0800 |
commit | 648aed72acf1c506009ddb33d8cace40b75e176e (patch) | |
tree | 88d68d9a3f81e873ebdcf61f7ee0b09cdb545c80 /docs/tutorial | |
parent | b6f9dc8f864ebb9cd63398da4de83493e81b5b54 (diff) | |
download | caffeonacl-648aed72acf1c506009ddb33d8cace40b75e176e.tar.gz caffeonacl-648aed72acf1c506009ddb33d8cace40b75e176e.tar.bz2 caffeonacl-648aed72acf1c506009ddb33d8cace40b75e176e.zip |
fix Nesterov typo found by @bamos
Diffstat (limited to 'docs/tutorial')
-rw-r--r-- | docs/tutorial/solver.md | 4 |
1 file changed, 2 insertions, 2 deletions
```diff
diff --git a/docs/tutorial/solver.md b/docs/tutorial/solver.md
index 8884ea0e..17f793ef 100644
--- a/docs/tutorial/solver.md
+++ b/docs/tutorial/solver.md
@@ -6,7 +6,7 @@ title: Solver / Model Optimization
 
 The solver orchestrates model optimization by coordinating the network's forward inference and backward gradients to form parameter updates that attempt to improve the loss.
 The responsibilities of learning are divided between the Solver for overseeing the optimization and generating parameter updates and the Net for yielding loss and gradients.
 
-The Caffe solvers are Stochastic Gradient Descent (SGD), Adaptive Gradient (ADAGRAD), and Nesterov's Accelerated Gradient (NAG).
+The Caffe solvers are Stochastic Gradient Descent (SGD), Adaptive Gradient (ADAGRAD), and Nesterov's Accelerated Gradient (NESTEROV).
 
 The solver
@@ -126,7 +126,7 @@ Note that in practice, for weights $$ W \in \mathcal{R}^d $$, AdaGrad implementa
 
 ### NAG
 
-**Nesterov's accelerated gradient** (`solver_type: NAG`) was proposed by Nesterov [1] as an "optimal" method of convex optimization, achieving a convergence rate of $$ \mathcal{O}(1/t^2) $$ rather than the $$ \mathcal{O}(1/t) $$.
+**Nesterov's accelerated gradient** (`solver_type: NESTEROV`) was proposed by Nesterov [1] as an "optimal" method of convex optimization, achieving a convergence rate of $$ \mathcal{O}(1/t^2) $$ rather than the $$ \mathcal{O}(1/t) $$.
 Though the required assumptions to achieve the $$ \mathcal{O}(1/t^2) $$ convergence typically will not hold for deep networks trained with Caffe (e.g., due to non-smoothness and non-convexity), in practice NAG can be a very effective method for optimizing certain types of deep learning architectures, as demonstrated for deep MNIST autoencoders by Sutskever et al. [2].
 
 The weight update formulas look very similar to the SGD updates given above:
```
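The Nesterov update the diffed section refers to can be sketched in plain Python. This is a minimal illustration of the NAG rule (evaluate the gradient at the looked-ahead point, update the velocity, then the weight), not Caffe's implementation; the toy quadratic loss and the learning-rate and momentum values are assumptions chosen for the example.

```python
# Minimal sketch of Nesterov's accelerated gradient (NAG) on a toy
# 1-D quadratic loss L(w) = 0.5 * w**2, whose gradient is simply w.
# Illustrative only; this is not Caffe's solver code.

def grad(w):
    return w  # gradient of 0.5 * w**2

lr, momentum = 0.1, 0.9   # assumed hyperparameters
w, v = 5.0, 0.0           # initial weight and velocity

for _ in range(200):
    # NAG's distinguishing step: take the gradient at the
    # "looked-ahead" position w + momentum * v, not at w itself.
    g = grad(w + momentum * v)
    v = momentum * v - lr * g  # velocity update
    w = w + v                  # weight update

# The iterate approaches the minimum of the quadratic at w = 0.
print(abs(w) < 1e-3)
```

The lookahead gradient is the only difference from plain momentum SGD, which evaluates `grad(w)` directly.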