-rw-r--r-- | include/caffe/layers/batch_norm_layer.hpp | 23 |
1 files changed, 12 insertions, 11 deletions
diff --git a/include/caffe/layers/batch_norm_layer.hpp b/include/caffe/layers/batch_norm_layer.hpp
index c38c8410..a26ad1a4 100644
--- a/include/caffe/layers/batch_norm_layer.hpp
+++ b/include/caffe/layers/batch_norm_layer.hpp
@@ -13,18 +13,19 @@ namespace caffe {
  * @brief Normalizes the input to have 0-mean and/or unit (1) variance across
  *        the batch.
  *
- * This layer computes Batch Normalization described in [1]. For
- * each channel in the data (i.e. axis 1), it subtracts the mean and divides
- * by the variance, where both statistics are computed across both spatial
- * dimensions and across the different examples in the batch.
+ * This layer computes Batch Normalization as described in [1]. For each channel
+ * in the data (i.e. axis 1), it subtracts the mean and divides by the variance,
+ * where both statistics are computed across both spatial dimensions and across
+ * the different examples in the batch.
  *
- * By default, during training time, the network is computing global mean/
- * variance statistics via a running average, which is then used at test
- * time to allow deterministic outputs for each input. You can manually
- * toggle whether the network is accumulating or using the statistics via the
- * use_global_stats option. IMPORTANT: for this feature to work, you MUST
- * set the learning rate to zero for all three parameter blobs, i.e.,
- * param {lr_mult: 0} three times in the layer definition.
+ * By default, during training time, the network is computing global
+ * mean/variance statistics via a running average, which is then used at test
+ * time to allow deterministic outputs for each input. You can manually toggle
+ * whether the network is accumulating or using the statistics via the
+ * use_global_stats option. IMPORTANT: for this feature to work, you MUST set
+ * the learning rate to zero for all three blobs, i.e., param {lr_mult: 0} three
+ * times in the layer definition. For reference, these three blobs are (0)
+ * mean, (1) variance, and (2) the moving average factor.
  *
  * Note that the original paper also included a per-channel learned bias and
  * scaling factor. To implement this in Caffe, define a `ScaleLayer` configured
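
Note on the per-channel statistics described in the comment above: the following is a minimal standalone sketch, not Caffe's implementation. The function name `batch_norm_sketch`, the flat NCHW layout, and the `eps` default are assumptions made for illustration. The comment says the layer "divides by the variance"; the sketch instead follows the usual formulation from the batch normalization paper and divides by sqrt(variance + eps). It does not model the running average or the three blobs.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of Caffe): normalizes a tensor stored as a flat
// vector in N x C x (H*W) order, one channel at a time, in place.
void batch_norm_sketch(std::vector<float>& data, std::size_t num,
                       std::size_t channels, std::size_t spatial,
                       float eps = 1e-5f) {
  const std::size_t count = num * spatial;  // values contributing to each channel
  if (count == 0) return;

  for (std::size_t c = 0; c < channels; ++c) {
    // Mean over every example and every spatial position of channel c.
    double mean = 0.0;
    for (std::size_t n = 0; n < num; ++n)
      for (std::size_t s = 0; s < spatial; ++s)
        mean += data[(n * channels + c) * spatial + s];
    mean /= static_cast<double>(count);

    // Biased (population) variance over the same set of values.
    double var = 0.0;
    for (std::size_t n = 0; n < num; ++n)
      for (std::size_t s = 0; s < spatial; ++s) {
        const double d = data[(n * channels + c) * spatial + s] - mean;
        var += d * d;
      }
    var /= static_cast<double>(count);

    // Subtract the mean and divide by the standard deviation (assumed form).
    const double inv_std = 1.0 / std::sqrt(var + eps);
    for (std::size_t n = 0; n < num; ++n)
      for (std::size_t s = 0; s < spatial; ++s) {
        const std::size_t idx = (n * channels + c) * spatial + s;
        data[idx] = static_cast<float>((data[idx] - mean) * inv_std);
      }
  }
}
```

For a batch of two examples with two channels and one spatial position each, the values of channel 0 sit at indices 0 and 2 and those of channel 1 at indices 1 and 3; each pair is normalized independently of the other.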