author    Evan Shelhamer <shelhamer@imaginarynumber.net>  2014-04-07 22:13:13 -0700
committer Evan Shelhamer <shelhamer@imaginarynumber.net>  2014-04-07 22:13:13 -0700
commit    efacd984131b4f3f06b2a0475e973dca244ba8bd (patch)
tree      05ca62b40a775832ef2a7bde7abf993cb47d551c
parent    9f638cbb66a4eb410a2a8c0ba98d6d92ffa7bd77 (diff)
parent    b3cd950dd4ff2056d04935edd2aff0af3d1562cf (diff)
Back-merge documentation and fixes
* format installation docs, add links
* add hdf5 requirements to 10.9 notes, drop cmake (not linked)
* fix im2col height/width bound check bug (issue #284 identified by @kmatzen)
* strip confusing comment about shuffling files
* add /etc/rc.local hint for boot configuration of GPUs
* include K40 images-per-day benchmark
* drop caffe presentation in favor of Dropbox link
* make build_docs.sh script work from anywhere
* proofread, fix dead link, standardize NVIDIA capitalization
* add link in index.md to performance_hardware.md
* add performance and hardware tips
* imagenet fix: ilvsrc -> ilsvrc
-rw-r--r--  README.md                                  |  4
-rw-r--r--  docs/caffe-presentation.pdf                | bin 1890722 -> 0 bytes
-rw-r--r--  docs/index.md                              |  8
-rw-r--r--  docs/installation.md                       | 43
-rw-r--r--  docs/performance_hardware.md               | 57
-rw-r--r--  examples/imagenet/imagenet_train.prototxt  |  2
-rw-r--r--  examples/imagenet/imagenet_val.prototxt    |  2
-rwxr-xr-x  scripts/build_docs.sh                      | 13
-rw-r--r--  tools/convert_imageset.cpp                 |  1
9 files changed, 94 insertions(+), 36 deletions(-)
diff --git a/README.md b/README.md
index b9718595..6b45624f 100644
--- a/README.md
+++ b/README.md
@@ -12,8 +12,8 @@ parameters in the code. Python and Matlab wrappers are provided.
At the same time, Caffe fits industry needs, with blazing fast C++/Cuda code for
GPU computation. Caffe is currently the fastest GPU CNN implementation publicly
-available, and is able to process more than **20 million images per day** on a
-single Tesla K20 machine \*.
+available, and is able to process more than **40 million images per day** on a
+single NVIDIA K40 GPU (or 20 million per day on a K20)\*.
Caffe also provides **seamless switching between CPU and GPU**, which allows one
to train models with fast GPUs and then deploy them on non-GPU clusters with one
diff --git a/docs/caffe-presentation.pdf b/docs/caffe-presentation.pdf
deleted file mode 100644
index 184368ee..00000000
--- a/docs/caffe-presentation.pdf
+++ /dev/null
Binary files differ
diff --git a/docs/index.md b/docs/index.md
index 98c266c6..a67cae4b 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -16,10 +16,10 @@ Caffe aims to provide computer vision scientists and practitioners with a **clea
For example, network structure is easily specified in separate config files, with no mess of hard-coded parameters in the code.
At the same time, Caffe fits industry needs, with blazing fast C++/CUDA code for GPU computation.
-Caffe is currently the fastest GPU CNN implementation publicly available, and is able to process more than **20 million images per day** on a single Tesla K20 machine \*.
+Caffe is currently the fastest GPU CNN implementation publicly available, and is able to process more than **40 million images per day** with a single NVIDIA K40 or Titan GPU (or 20 million images per day on a K20 GPU)\*. That's 192 images per second during training and 500 images per second during test.
Caffe also provides **seamless switching between CPU and GPU**, which allows one to train models with fast GPUs and then deploy them on non-GPU clusters with one line of code: `Caffe::set_mode(Caffe::CPU)`.
-Even in CPU mode, computing predictions on an image takes only 20 ms when images are processed in batch mode.
+Even in CPU mode, computing predictions on an image takes only 20 ms when images are processed in batch mode; in GPU mode it takes only 2 ms per image.
## Documentation
@@ -55,7 +55,7 @@ Please kindly cite Caffe in your publications if it helps your research:
### Acknowledgements
-Yangqing would like to thank the NVidia Academic program for providing K20 GPUs, and [Oriol Vinyals](http://www1.icsi.berkeley.edu/~vinyals/) for various discussions along the journey.
+Yangqing would like to thank the NVIDIA Academic program for providing K20 GPUs, and [Oriol Vinyals](http://www1.icsi.berkeley.edu/~vinyals/) for various discussions along the journey.
A core set of BVLC members have contributed lots of new functionality and fixes since the original release (alphabetical by first name):
@@ -74,4 +74,4 @@ If you'd like to contribute, read [this](development.html).
---
-\*: When measured with the [SuperVision](http://www.image-net.org/challenges/LSVRC/2012/supervision.pdf) model that won the ImageNet Large Scale Visual Recognition Challenge 2012.
+\*: When measured with the [SuperVision](http://www.image-net.org/challenges/LSVRC/2012/supervision.pdf) model that won the ImageNet Large Scale Visual Recognition Challenge 2012. See [performance and hardware configuration details](/performance_hardware.html).
diff --git a/docs/installation.md b/docs/installation.md
index 02423a24..a245dd7e 100644
--- a/docs/installation.md
+++ b/docs/installation.md
@@ -27,21 +27,21 @@ The following sections detail prerequisites and installation on Ubuntu. For OS X
## Prerequisites
-* CUDA (5.0 or 5.5)
-* Boost
-* MKL (but see the [boost-eigen branch](https://github.com/BVLC/caffe/tree/boost-eigen) for a boost/Eigen3 port)
-* OpenCV
+* [CUDA](https://developer.nvidia.com/cuda-zone) 5.0 or 5.5
+* [boost](http://www.boost.org/) (1.55 preferred)
+* [BLAS](http://en.wikipedia.org/wiki/Basic_Linear_Algebra_Subprograms) by [MKL](http://software.intel.com/en-us/intel-mkl) (though the `dev` branch supports ATLAS as an alternative)
+* [OpenCV](http://opencv.org/)
* glog, gflags, protobuf, leveldb, snappy, hdf5
-* For the Python wrapper: python, numpy (>= 1.7 preferred), and boost_python
-* For the Matlab wrapper: Matlab with mex
+* For the python wrapper: python, numpy (>= 1.7 preferred), and boost_python
+* For the MATLAB wrapper: MATLAB with mex
-Caffe requires the CUDA NVCC compiler to compile its GPU code. To install CUDA, go to the [NVidia CUDA website](https://developer.nvidia.com/cuda-downloads) and follow installation instructions there. Caffe is verified to compile with both CUDA 5.0 and 5.5.
+**CUDA**: Caffe requires the CUDA NVCC compiler to compile its GPU code. To install CUDA, go to the [NVIDIA CUDA website](https://developer.nvidia.com/cuda-downloads) and follow installation instructions there. Caffe compiles with both CUDA 5.0 and 5.5.
N.B. one can install the CUDA libraries without the CUDA driver in order to build and run Caffe in CPU-only mode.
-Caffe also needs Intel MKL as the backend of its matrix computation and vectorized computations. We are in the process of removing MKL dependency, but for now you will need to have an MKL installation. You can obtain a [trial license](http://software.intel.com/en-us/intel-mkl) or an [academic license](http://software.intel.com/en-us/intel-education-offerings) (if you are a student).
+**MKL**: Caffe needs Intel MKL as the backend of its matrix and vector computations. We are working on support for alternative BLAS libraries, but for now you need to have MKL. You can obtain a [trial license](http://software.intel.com/en-us/intel-mkl) or an [academic license](http://software.intel.com/en-us/intel-education-offerings) (if you are a student).
-You will also need other packages, most of which can be installed via apt-get using:
+**The Rest**: you will also need other packages, most of which can be installed via apt-get using:
sudo apt-get install libprotobuf-dev libleveldb-dev libsnappy-dev libopencv-dev libboost-all-dev libhdf5-serial-dev
@@ -56,28 +56,27 @@ The only exception being the google logging library, which does not exist in the
./configure
make && make install
-If you would like to compile the Python wrapper, you will need to install python, numpy and boost_python. You can either compile them from scratch or use a pre-packaged solution like [Anaconda](https://store.continuum.io/cshop/anaconda/) or [Enthought Canopy](https://www.enthought.com/products/canopy/). Note that if you use the Ubuntu default python, you will need to apt-install the `python-dev` package to have the python headers. You can install any remaining dependencies with
+**Python**: If you would like to have the python wrapper, install python, numpy and boost_python. You can either compile them from scratch or use a pre-packaged solution like [Anaconda](https://store.continuum.io/cshop/anaconda/) or [Enthought Canopy](https://www.enthought.com/products/canopy/). Note that if you use the Ubuntu default python, you will need to apt-install the `python-dev` package to have the python headers. You can install any remaining dependencies with
pip install -r /path/to/caffe/python/requirements.txt
-If you would like to compile the Matlab wrapper, you will need to install Matlab.
+**MATLAB**: if you would like to have the MATLAB wrapper, install MATLAB with the mex compiler.
-After setting all the prerequisites, you should modify the `Makefile.config` file and change the paths to those on your computer.
+Now that you have the prerequisites, edit your `Makefile.config` to change the paths for your setup.
## Compilation
-After installing the prerequisites, simply do `make all` to compile Caffe. If you would like to compile the Python and Matlab wrappers, do
+With the prerequisites installed, do `make all` to compile Caffe.
- make pycaffe
- make matcaffe
+To compile the python and MATLAB wrappers do `make pycaffe` and `make matcaffe` respectively.
-For a faster build, compile in parallel by doing `make all -j8` where 8 is the number of parallel threads for compilation. A good choice for the number of threads is the number of cores in your machine.
+*Distribution*: run `make distribute` to create a `distribute` directory with all the Caffe headers, compiled libraries, binaries, etc. needed for distribution to other machines.
-Optionally, you can run `make distribute` to create a `distribute` directory that contains all the necessary files, including the headers, compiled shared libraries, and binary files that you can distribute over different machines.
+*Speed*: for a faster build, compile in parallel by doing `make all -j8` where 8 is the number of parallel threads for compilation (a good choice for the number of threads is the number of cores in your machine).
-To use Caffe with python, you will need to add `/path/to/caffe/python` or `/path/to/caffe/build/python` to your `PYTHONPATH`.
+*Python Module*: for python support, you must add the compiled module to your `PYTHONPATH` (as `/path/to/caffe/python` or the like).
-Now that you have compiled Caffe, check out the [MNIST demo](mnist.html) and the pretrained [ImageNet example](imagenet.html).
+Now that you have installed Caffe, check out the [MNIST demo](mnist.html) and the pretrained [ImageNet example](imagenet.html).
## OS X Installation
@@ -94,7 +93,7 @@ Install [homebrew](http://brew.sh/) to install most of the prerequisites. Starti
brew install homebrew/science/hdf5
brew install homebrew/science/opencv
-Building boost from source is needed to link against your local python.
+Building boost from source is needed to link against your local python (exceptions might be raised during some OS X installs, but ignore these and continue).
If using homebrew python, python packages like `numpy` and `scipy` are best installed by doing `brew tap homebrew/python`, and then installing them with homebrew.
#### 10.9 additional notes
@@ -124,7 +123,7 @@ For each package that you install through homebrew do the following:
The prerequisite homebrew formulae are
- cmake boost snappy leveldb protobuf gflags glog homebrew/science/opencv
+ boost snappy leveldb protobuf gflags glog szip homebrew/science/hdf5 homebrew/science/opencv
so follow steps 1-4 for each.
@@ -132,7 +131,7 @@ After this the rest of the installation is the same as under 10.8, as long as `c
### CUDA and MKL
-CUDA and MKL are very straightforward to install; download from the NVIDIA and Intel websites.
+CUDA and MKL are straightforward to install; download from the NVIDIA and Intel links under "Prerequisites."
### Compiling Caffe
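The python-module step in the installation notes above boils down to one environment tweak. A minimal sketch, assuming a hypothetical checkout at `/path/to/caffe` (substitute your actual path):

```shell
#!/bin/sh
# Add the compiled Caffe python module to PYTHONPATH.
# /path/to/caffe is a placeholder; point it at your real checkout.
CAFFE_ROOT=/path/to/caffe
export PYTHONPATH="$CAFFE_ROOT/python:$PYTHONPATH"

# Show the first entry on the search path to confirm it took effect.
echo "$PYTHONPATH" | cut -d: -f1   # → /path/to/caffe/python
```

Putting the new entry first means python finds this checkout's module even if another copy is installed elsewhere.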
diff --git a/docs/performance_hardware.md b/docs/performance_hardware.md
new file mode 100644
index 00000000..7a08b8a5
--- /dev/null
+++ b/docs/performance_hardware.md
@@ -0,0 +1,57 @@
+---
+layout: default
+title: Caffe
+---
+
+# Performance and Hardware Configuration
+
+To measure performance on different NVIDIA GPUs we use the Caffe reference ImageNet model.
+
+For training, each time point is 20 iterations/minibatches of 256 images for 5,120 images total. For testing, a 50,000 image validation set is classified.
+
+**Acknowledgements**: BVLC members are very grateful to NVIDIA for providing several GPUs to conduct this research.
+
+## NVIDIA K40
+
+Performance is best with ECC off and boost clock enabled. While ECC makes a negligible difference in speed, disabling it frees ~1 GB of GPU memory.
+
+Best settings with ECC off and maximum clock speed:
+
+* Training is 26.5 secs / 20 iterations (5,120 images)
+* Testing is 100 secs / validation set (50,000 images)
+
+Other settings:
+
+* ECC on, max speed: training 26.7 secs / 20 iterations, test 101 secs / validation set
+* ECC on, default speed: training 31 secs / 20 iterations, test 117 secs / validation set
+* ECC off, default speed: training 31 secs / 20 iterations, test 118 secs / validation set
+
+### K40 configuration tips
+
+For maximum K40 performance, turn off ECC and boost the clock speed (at your own risk).
+
+To turn off ECC, do
+
+ sudo nvidia-smi -i 0 --ecc-config=0 # repeat with -i x for each GPU ID
+
+then reboot.
+
+Set the "persistence" mode of the GPU settings by
+
+ sudo nvidia-smi -pm 1
+
+and then set the clock speed with
+
+ sudo nvidia-smi -i 0 -ac 3004,875 # repeat with -i x for each GPU ID
+
+but note that this configuration resets across driver reloading / rebooting. Include these commands in a boot script to initialize these settings. For a simple fix, add these commands to `/etc/rc.local` (on Ubuntu).
+
+## NVIDIA Titan
+
+Training: 26.26 secs / 20 iterations (5,120 images).
+Testing: 100 secs / validation set (50,000 images).
+
+## NVIDIA K20
+
+Training: 36.0 secs / 20 iterations (5,120 images).
+Testing: 133 secs / validation set (50,000 images).
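The new page notes that these `nvidia-smi` settings do not survive a reboot and suggests `/etc/rc.local` as a simple fix. A sketch of such a fragment, assuming GPU ID 0 and the K40 clocks shown above (repeat the `-i` lines for each GPU):

```shell
#!/bin/sh -e
# /etc/rc.local sketch: re-apply GPU settings at boot (Ubuntu).
# Note the ECC change itself only takes effect after a further reboot.
nvidia-smi -i 0 --ecc-config=0   # disable ECC on GPU 0 to free ~1 GB memory
nvidia-smi -pm 1                 # persistence mode, so clock settings stick
nvidia-smi -i 0 -ac 3004,875     # max memory,graphics clocks for the K40
exit 0
```

This is a boot-configuration sketch, not something to run blindly; as the page says, overclocking is at your own risk.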
diff --git a/examples/imagenet/imagenet_train.prototxt b/examples/imagenet/imagenet_train.prototxt
index b34a9b49..519d4509 100644
--- a/examples/imagenet/imagenet_train.prototxt
+++ b/examples/imagenet/imagenet_train.prototxt
@@ -5,7 +5,7 @@ layers {
top: "data"
top: "label"
data_param {
- source: "ilvsrc12_train_leveldb"
+ source: "ilsvrc12_train_leveldb"
mean_file: "../../data/ilsvrc12/imagenet_mean.binaryproto"
batch_size: 256
crop_size: 227
diff --git a/examples/imagenet/imagenet_val.prototxt b/examples/imagenet/imagenet_val.prototxt
index 2f1ead7c..dd26f40e 100644
--- a/examples/imagenet/imagenet_val.prototxt
+++ b/examples/imagenet/imagenet_val.prototxt
@@ -5,7 +5,7 @@ layers {
top: "data"
top: "label"
data_param {
- source: "ilvsrc12_val_leveldb"
+ source: "ilsvrc12_val_leveldb"
mean_file: "../../data/ilsvrc12/imagenet_mean.binaryproto"
batch_size: 50
crop_size: 227
diff --git a/scripts/build_docs.sh b/scripts/build_docs.sh
index 1d4f16a4..1faf8bd5 100755
--- a/scripts/build_docs.sh
+++ b/scripts/build_docs.sh
@@ -1,8 +1,11 @@
#!/bin/bash
+PORT=${1:-4000}
+
echo "usage: build_docs.sh [port]"
-PORT=4000
-if [ $# -gt 0 ]; then
- PORT=$1
-fi
-jekyll serve -w -s docs/ -d docs/_site --port $PORT
+
+# Find the docs dir, no matter where the script is called
+DIR="$( cd "$(dirname "$0")" ; pwd -P )"
+cd $DIR/../docs
+
+jekyll serve -w -s . -d _site --port=$PORT
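The rewritten script relies on two shell idioms: `${1:-4000}` to default an argument, and the `cd`/`pwd -P` pair to resolve the script's own directory. A standalone sketch of the default-argument behavior (the `port_for` helper is illustrative, not part of the script):

```shell
#!/bin/sh
# Mirror of the script's PORT=${1:-4000} idiom, wrapped in a function.
port_for() {
  PORT=${1:-4000}   # use the first argument if given, else fall back to 4000
  echo "$PORT"
}

port_for          # → 4000
port_for 8080     # → 8080
```

This replaces the old four-line `if [ $# -gt 0 ]` block with a single expansion, which is why the diff above nets out shorter.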
diff --git a/tools/convert_imageset.cpp b/tools/convert_imageset.cpp
index 82edae49..2d4d4c77 100644
--- a/tools/convert_imageset.cpp
+++ b/tools/convert_imageset.cpp
@@ -9,7 +9,6 @@
// ....
// if the last argument is 1, a random shuffle will be carried out before we
// process the file lines.
-// You are responsible for shuffling the files yourself.
#include <glog/logging.h>
#include <leveldb/db.h>