author    Evan Shelhamer <shelhamer@imaginarynumber.net>  2014-08-28 21:45:46 -0700
committer Evan Shelhamer <shelhamer@imaginarynumber.net>  2014-08-28 22:37:25 -0700
commit    23a50260846f3fd4f45f81a70ab5c837a4ed0b40 (patch)
tree      82cf528d86a8613dd560616af8c43cf4a279ca96 /examples
parent    400411bf4893c97ca5371c26ecdaccc278fcf3a4 (diff)
[example] edit fine-tuning and train on ~2000 images, 1557 / 382 split
- further detail merits of fine-tuning: less starved for time and data
- set random seed for reproducing the tutorial
- 1557 train / 382 test split is more indicative of training quality than splits of 200 images
Diffstat (limited to 'examples')
-rwxr-xr-x[-rw-r--r--]examples/finetune_flickr_style/assemble_data.py (renamed from examples/finetuning_on_flickr_style/assemble_data.py)1
-rw-r--r--examples/finetune_flickr_style/flickr_style.csv.gz (renamed from examples/finetuning_on_flickr_style/flickr_style.csv.gz)bin2178982 -> 2178982 bytes
-rw-r--r--examples/finetune_flickr_style/flickr_style_solver.prototxt (renamed from examples/finetuning_on_flickr_style/solver.prototxt)4
-rw-r--r--examples/finetune_flickr_style/flickr_style_train_val.prototxt (renamed from examples/finetuning_on_flickr_style/train_val.prototxt)0
-rw-r--r--examples/finetune_flickr_style/readme.md159
-rw-r--r--examples/finetuning_on_flickr_style/models/.gitignore0
-rw-r--r--examples/finetuning_on_flickr_style/readme.md100
7 files changed, 162 insertions, 102 deletions
diff --git a/examples/finetuning_on_flickr_style/assemble_data.py b/examples/finetune_flickr_style/assemble_data.py
index d8770e92..b4c995e8 100644..100755
--- a/examples/finetuning_on_flickr_style/assemble_data.py
+++ b/examples/finetune_flickr_style/assemble_data.py
@@ -1,3 +1,4 @@
+#!/usr/bin/env python
"""
Form a subset of the Flickr Style data, download images to dirname, and write
Caffe ImagesDataLayer training file.
diff --git a/examples/finetuning_on_flickr_style/flickr_style.csv.gz b/examples/finetune_flickr_style/flickr_style.csv.gz
index 5a84f88a..5a84f88a 100644
--- a/examples/finetuning_on_flickr_style/flickr_style.csv.gz
+++ b/examples/finetune_flickr_style/flickr_style.csv.gz
Binary files differ
diff --git a/examples/finetuning_on_flickr_style/solver.prototxt b/examples/finetune_flickr_style/flickr_style_solver.prototxt
index ed1548c0..740ec39f 100644
--- a/examples/finetuning_on_flickr_style/solver.prototxt
+++ b/examples/finetune_flickr_style/flickr_style_solver.prototxt
@@ -1,4 +1,4 @@
-net: "examples/finetuning_on_flickr_style/train_val.prototxt"
+net: "examples/finetune_flickr_style/flickr_style_train_val.prototxt"
test_iter: 100
test_interval: 1000
# lr for fine-tuning should be lower than when starting from scratch
@@ -12,4 +12,4 @@ max_iter: 100000
momentum: 0.9
weight_decay: 0.0005
snapshot: 10000
-snapshot_prefix: "examples/finetuning_on_flickr_style/models/finetuning"
+snapshot_prefix: "examples/finetune_flickr_style/flickr_style"
diff --git a/examples/finetuning_on_flickr_style/train_val.prototxt b/examples/finetune_flickr_style/flickr_style_train_val.prototxt
index bcb1e1ce..bcb1e1ce 100644
--- a/examples/finetuning_on_flickr_style/train_val.prototxt
+++ b/examples/finetune_flickr_style/flickr_style_train_val.prototxt
diff --git a/examples/finetune_flickr_style/readme.md b/examples/finetune_flickr_style/readme.md
new file mode 100644
index 00000000..82249982
--- /dev/null
+++ b/examples/finetune_flickr_style/readme.md
@@ -0,0 +1,159 @@
+---
+title: Fine-tuning for style recognition
+description: Fine-tune the ImageNet-trained CaffeNet on the "Flickr Style" dataset.
+category: example
+include_in_docs: true
+layout: default
+priority: 5
+---
+
+# Fine-tuning CaffeNet for Style Recognition on "Flickr Style" Data
+
+Fine-tuning takes an already learned model, adapts the architecture, and resumes training from the learned weights.
+Let's fine-tune the BVLC-distributed CaffeNet model on a different dataset, [Flickr Style](http://sergeykarayev.com/files/1311.3715v3.pdf), to predict image style instead of object category.
+
+## Explanation
+
+The Flickr-sourced images of the Style dataset are visually very similar to the ImageNet dataset, on which the `caffe_reference_imagenet_model` was trained.
+Since that model works well for object category classification, we'd like to use its architecture for our style classifier.
+We also only have 80,000 images to train on, so we'd like to start with the parameters learned on the 1,000,000 ImageNet images, and fine-tune as needed.
+If we provide the `weights` argument to the `caffe train` command, the pretrained weights will be loaded into our model, matching layers by name.
+
+Because we are predicting 20 classes instead of 1,000, we do need to change the last layer in the model.
+Therefore, we change the name of the last layer from `fc8` to `fc8_flickr` in our prototxt.
+Since there is no layer with that name in the `caffe_reference_imagenet_model`, that layer will begin training with random weights.
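+
+As a minimal sketch of this change (assuming the usual CaffeNet definition for the surrounding fields; this is illustrative, not a verbatim copy of the example's prototxt), the renamed layer might look like:
+
+    layers {
+      name: "fc8_flickr"  # renamed from fc8, so no pretrained weights are copied here
+      type: INNER_PRODUCT
+      bottom: "fc7"
+      top: "fc8_flickr"
+      blobs_lr: 10        # boosted weight learning rate, as explained below
+      blobs_lr: 20        # boosted bias learning rate
+      inner_product_param {
+        num_output: 20    # 20 style classes instead of 1,000 object classes
+      }
+    }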
+
+We will also decrease the overall learning rate `base_lr` in the solver prototxt, but boost the `blobs_lr` on the newly introduced layer.
+The idea is to have the rest of the model change very slowly with new data, but let the new layer learn fast.
+Additionally, we set `stepsize` in the solver to a lower value than if we were training from scratch, since we're effectively far along in training and therefore want the learning rate to decay faster.
+Note that we could also entirely prevent fine-tuning of all layers other than `fc8_flickr` by setting their `blobs_lr` to 0.
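+
+For reference, here is a sketch of the corresponding solver settings. The `base_lr` matches the `lr = 0.001` in the logs below, but the `lr_policy`, `gamma`, and `stepsize` values are assumptions for illustration; consult `flickr_style_solver.prototxt` for the real ones:
+
+    net: "examples/finetune_flickr_style/flickr_style_train_val.prototxt"
+    # lr for fine-tuning should be lower than when starting from scratch
+    base_lr: 0.001
+    lr_policy: "step"
+    gamma: 0.1
+    # stepsize should also be lower, as we're effectively far along in training
+    stepsize: 20000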
+
+## Procedure
+
+All steps are to be done from the caffe root directory.
+
+The dataset is distributed as a list of URLs with corresponding labels.
+Using a script, we will download a small subset of the data and split it into train and val sets.
+
+ caffe % ./examples/finetune_flickr_style/assemble_data.py -h
+ usage: assemble_data.py [-h] [-s SEED] [-i IMAGES] [-w WORKERS]
+
+ Download a subset of Flickr Style to a directory
+
+ optional arguments:
+ -h, --help show this help message and exit
+ -s SEED, --seed SEED random seed
+ -i IMAGES, --images IMAGES
+ number of images to use (-1 for all)
+ -w WORKERS, --workers WORKERS
+ num workers used to download images. -x uses (all - x)
+ cores.
+
+ caffe % python examples/finetune_flickr_style/assemble_data.py --workers=-1 --images=2000 --seed 831486
+ Downloading 2000 images with 7 workers...
+ Writing train/val for 1939 successfully downloaded images.
+
+This script downloads images and writes train/val file lists into `data/flickr_style`.
+With this random seed there are 1,557 train images and 382 test images.
+The prototxts in this example assume this, and also assume the presence of the ImageNet mean file (run `get_ilsvrc_aux.sh` from `data/ilsvrc12` to obtain this if you haven't yet).
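+
+The file lists use the plain Caffe image-data format: one image path and integer label per line. A hypothetical excerpt (these filenames and labels are made up for illustration):
+
+    data/flickr_style/images/10344996196_1117299ea5.jpg 7
+    data/flickr_style/images/10361648474_00fed51494.jpg 12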
+
+We'll also need the ImageNet-trained model, which you can obtain by running `get_caffe_reference_imagenet_model.sh` from `examples/imagenet`.
+
+Now we can train! (You can fine-tune in CPU mode by leaving out the `-gpu` flag.)
+
+ caffe % ./build/tools/caffe train -solver examples/finetune_flickr_style/flickr_style_solver.prototxt -weights examples/imagenet/caffe_reference_imagenet_model -gpu 0
+
+ [...]
+
+ I0828 22:10:04.025378 9718 solver.cpp:46] Solver scaffolding done.
+ I0828 22:10:04.025388 9718 caffe.cpp:95] Use GPU with device ID 0
+ I0828 22:10:04.192004 9718 caffe.cpp:107] Finetuning from examples/imagenet/caffe_reference_imagenet_model
+
+ [...]
+
+ I0828 22:17:48.338963 11510 solver.cpp:165] Solving FlickrStyleCaffeNet
+ I0828 22:17:48.339010 11510 solver.cpp:251] Iteration 0, Testing net (#0)
+ I0828 22:18:14.313817 11510 solver.cpp:302] Test net output #0: accuracy = 0.0416
+ I0828 22:18:14.476822 11510 solver.cpp:195] Iteration 0, loss = 3.75717
+ I0828 22:18:14.476878 11510 solver.cpp:397] Iteration 0, lr = 0.001
+ I0828 22:18:19.700408 11510 solver.cpp:195] Iteration 20, loss = 3.1689
+ I0828 22:18:19.700461 11510 solver.cpp:397] Iteration 20, lr = 0.001
+ I0828 22:18:24.924685 11510 solver.cpp:195] Iteration 40, loss = 2.3549
+ I0828 22:18:24.924741 11510 solver.cpp:397] Iteration 40, lr = 0.001
+ I0828 22:18:30.114858 11510 solver.cpp:195] Iteration 60, loss = 2.74191
+ I0828 22:18:30.114910 11510 solver.cpp:397] Iteration 60, lr = 0.001
+ I0828 22:18:35.328071 11510 solver.cpp:195] Iteration 80, loss = 1.9147
+ I0828 22:18:35.328127 11510 solver.cpp:397] Iteration 80, lr = 0.001
+ I0828 22:18:40.588317 11510 solver.cpp:195] Iteration 100, loss = 1.81419
+ I0828 22:18:40.588373 11510 solver.cpp:397] Iteration 100, lr = 0.001
+ I0828 22:18:46.171576 11510 solver.cpp:195] Iteration 120, loss = 2.02105
+ I0828 22:18:46.171669 11510 solver.cpp:397] Iteration 120, lr = 0.001
+ I0828 22:18:51.757809 11510 solver.cpp:195] Iteration 140, loss = 1.49083
+ I0828 22:18:51.757863 11510 solver.cpp:397] Iteration 140, lr = 0.001
+ I0828 22:18:57.345080 11510 solver.cpp:195] Iteration 160, loss = 1.35319
+ I0828 22:18:57.345135 11510 solver.cpp:397] Iteration 160, lr = 0.001
+ I0828 22:19:02.928794 11510 solver.cpp:195] Iteration 180, loss = 1.11658
+ I0828 22:19:02.928850 11510 solver.cpp:397] Iteration 180, lr = 0.001
+ I0828 22:19:08.514497 11510 solver.cpp:195] Iteration 200, loss = 1.08851
+ I0828 22:19:08.514552 11510 solver.cpp:397] Iteration 200, lr = 0.001
+
+ [...]
+
+ I0828 22:22:40.789010 11510 solver.cpp:195] Iteration 960, loss = 0.0844627
+ I0828 22:22:40.789175 11510 solver.cpp:397] Iteration 960, lr = 0.001
+ I0828 22:22:46.376626 11510 solver.cpp:195] Iteration 980, loss = 0.0110937
+ I0828 22:22:46.376682 11510 solver.cpp:397] Iteration 980, lr = 0.001
+ I0828 22:22:51.687258 11510 solver.cpp:251] Iteration 1000, Testing net (#0)
+ I0828 22:23:17.438894 11510 solver.cpp:302] Test net output #0: accuracy = 1
+
+Note how rapidly the loss went down. Although the 100% accuracy is optimistic, it is evidence that the model is learning quickly and well.
+
+For comparison, here is how the loss goes down when we do not start with a pre-trained model:
+
+ I0828 22:24:18.624004 12919 solver.cpp:165] Solving FlickrStyleCaffeNet
+ I0828 22:24:18.624099 12919 solver.cpp:251] Iteration 0, Testing net (#0)
+ I0828 22:24:44.520992 12919 solver.cpp:302] Test net output #0: accuracy = 0.045
+ I0828 22:24:44.676905 12919 solver.cpp:195] Iteration 0, loss = 3.33111
+ I0828 22:24:44.677120 12919 solver.cpp:397] Iteration 0, lr = 0.001
+ I0828 22:24:50.152454 12919 solver.cpp:195] Iteration 20, loss = 2.98133
+ I0828 22:24:50.152509 12919 solver.cpp:397] Iteration 20, lr = 0.001
+ I0828 22:24:55.736256 12919 solver.cpp:195] Iteration 40, loss = 3.02124
+ I0828 22:24:55.736311 12919 solver.cpp:397] Iteration 40, lr = 0.001
+ I0828 22:25:01.316514 12919 solver.cpp:195] Iteration 60, loss = 2.99509
+ I0828 22:25:01.316567 12919 solver.cpp:397] Iteration 60, lr = 0.001
+ I0828 22:25:06.899554 12919 solver.cpp:195] Iteration 80, loss = 2.9928
+ I0828 22:25:06.899610 12919 solver.cpp:397] Iteration 80, lr = 0.001
+ I0828 22:25:12.484624 12919 solver.cpp:195] Iteration 100, loss = 2.99072
+ I0828 22:25:12.484678 12919 solver.cpp:397] Iteration 100, lr = 0.001
+ I0828 22:25:18.069056 12919 solver.cpp:195] Iteration 120, loss = 3.01816
+ I0828 22:25:18.069149 12919 solver.cpp:397] Iteration 120, lr = 0.001
+ I0828 22:25:23.650928 12919 solver.cpp:195] Iteration 140, loss = 2.9694
+ I0828 22:25:23.650984 12919 solver.cpp:397] Iteration 140, lr = 0.001
+ I0828 22:25:29.235535 12919 solver.cpp:195] Iteration 160, loss = 3.00383
+ I0828 22:25:29.235589 12919 solver.cpp:397] Iteration 160, lr = 0.001
+ I0828 22:25:34.816898 12919 solver.cpp:195] Iteration 180, loss = 2.99802
+ I0828 22:25:34.816953 12919 solver.cpp:397] Iteration 180, lr = 0.001
+ I0828 22:25:40.396656 12919 solver.cpp:195] Iteration 200, loss = 2.99769
+ I0828 22:25:40.396711 12919 solver.cpp:397] Iteration 200, lr = 0.001
+
+ [...]
+
+ I0828 22:29:12.539094 12919 solver.cpp:195] Iteration 960, loss = 2.99314
+ I0828 22:29:12.539258 12919 solver.cpp:397] Iteration 960, lr = 0.001
+ I0828 22:29:18.123092 12919 solver.cpp:195] Iteration 980, loss = 2.99503
+ I0828 22:29:18.123147 12919 solver.cpp:397] Iteration 980, lr = 0.001
+ I0828 22:29:23.432059 12919 solver.cpp:251] Iteration 1000, Testing net (#0)
+ I0828 22:29:49.409044 12919 solver.cpp:302] Test net output #0: accuracy = 0.0624
+
+This model is only beginning to learn.
+
+Fine-tuning can be feasible when training from scratch would not be, for lack of either time or data.
+Even in CPU mode, each pass through the training set takes ~100 s; GPU fine-tuning is of course faster still, and can learn a useful model in minutes or hours instead of days or weeks.
+Furthermore, note that the model has trained on fewer than 2,000 instances. Transferring to a new task like style recognition from ImageNet pretraining can require much less data than training from scratch.
+Now try fine-tuning to your own tasks and data!
+
+## License
+
+The Flickr Style dataset as distributed here contains only URLs to images.
+Some of the images may have copyright.
+Training a category-recognition model for research/non-commercial use may constitute fair use of this data.
diff --git a/examples/finetuning_on_flickr_style/models/.gitignore b/examples/finetuning_on_flickr_style/models/.gitignore
deleted file mode 100644
index e69de29b..00000000
--- a/examples/finetuning_on_flickr_style/models/.gitignore
+++ /dev/null
diff --git a/examples/finetuning_on_flickr_style/readme.md b/examples/finetuning_on_flickr_style/readme.md
deleted file mode 100644
index 4a164e5f..00000000
--- a/examples/finetuning_on_flickr_style/readme.md
+++ /dev/null
@@ -1,100 +0,0 @@
----
-title: Fine-tuning CaffeNet on "Flickr Style" data
-description: We fine-tune the ImageNet-trained CaffeNet on another dataset.
-category: example
-include_in_docs: true
-layout: default
-priority: 5
----
-
-# Fine-tuning CaffeNet on "Flickr Style" data
-
-This example shows how to fine-tune the BVLC-distributed CaffeNet model on a different dataset: [Flickr Style](http://sergeykarayev.com/files/1311.3715v3.pdf), which has style category labels.
-
-## Explanation
-
-The Flickr-sourced data of the Style dataset is visually very similar to the ImageNet dataset, on which the `caffe_reference_imagenet_model` was trained.
-Since that model works well for object category classification, we'd like to use its architecture for our style classifier.
-We also only have 80,000 images to train on, so we'd like to start with the parameters learned on the 1,000,000 ImageNet images, and fine-tune as needed.
-If we provide the `model` parameter to the `caffe train` command, the trained weights will be loaded into our model, matching layers by name.
-
-Because we are predicting 20 classes instead of 1,000, we do need to change the last layer in the model.
-Therefore, we change the name of the last layer from `fc8` to `fc8_flickr` in our prototxt.
-Since there is no layer with that name in the `caffe_reference_imagenet_model`, that layer will begin training with random weights.
-
-We will also decrease the overall learning rate `base_lr` in the solver prototxt, but boost the `blobs_lr` on the newly introduced layer.
-The idea is to have the rest of the model change very slowly with new data, but the new layer needs to learn fast.
-Additionally, we set `stepsize` in the solver to a lower value than if we were training from scratch, since we're effectively far along in training and therefore want the learning rate to go down faster.
-Note that we could also entirely prevent fine-tuning of all layers other than `fc8_flickr` by setting their `blobs_lr` to 0.
-
-## Procedure
-
-All steps are to be done from the root caffe directory.
-
-The dataset is distributed as a list of URLs with corresponding labels.
-Using a script, we will download a small subset of the data and split it into train and val sets.
-
- caffe % python examples/finetuning_on_flickr_style/assemble_data.py -h
- usage: assemble_data.py [-h] [-s SEED] [-i IMAGES] [-w WORKERS]
-
- Download a subset of Flickr Style to a directory
-
- optional arguments:
- -h, --help show this help message and exit
- -s SEED, --seed SEED random seed
- -i IMAGES, --images IMAGES
- number of images to use (-1 for all)
- -w WORKERS, --workers WORKERS
- num workers used to download images. -x uses (all - x)
- cores.
-
- caffe % python examples/finetuning_on_flickr_style/assemble_data.py --workers=-1 --images=200
- Downloading 200 images with 7 workers...
- Writing train/val for 190 successfully downloaded images.
-
-This script downloads images and writes train/val file lists into `data/flickr_style`.
-The prototxts in this example assume this, and also assume the presence of the ImageNet mean file (run `get_ilsvrc_aux.sh` from `data/ilsvrc12` to obtain this if you haven't yet).
-
-We'll also need the ImageNet-trained model, which you can obtain by running `get_caffe_reference_imagenet_model.sh` from `examples/imagenet`.
-
-Now we can train!
-
- caffe % ./build/tools/caffe train -solver examples/finetuning_on_flickr_style/solver.prototxt -weights examples/imagenet/caffe_reference_imagenet_model
- I0827 19:41:52.455621 2129298192 caffe.cpp:90] Starting Optimization
- I0827 19:41:52.456883 2129298192 solver.cpp:32] Initializing solver from parameters:
-
- [...]
-
- I0827 19:41:55.520205 2129298192 solver.cpp:46] Solver scaffolding done.
- I0827 19:41:55.520211 2129298192 caffe.cpp:99] Use CPU.
- I0827 19:41:55.520217 2129298192 caffe.cpp:107] Finetuning from examples/imagenet/caffe_reference_imagenet_model
- I0827 19:41:57.433044 2129298192 solver.cpp:165] Solving CaffeNet
- I0827 19:41:57.433104 2129298192 solver.cpp:251] Iteration 0, Testing net (#0)
- I0827 19:44:44.145447 2129298192 solver.cpp:302] Test net output #0: accuracy = 0.004
- I0827 19:44:48.774271 2129298192 solver.cpp:195] Iteration 0, loss = 3.46922
- I0827 19:44:48.774333 2129298192 solver.cpp:397] Iteration 0, lr = 0.001
- I0827 19:46:15.107447 2129298192 solver.cpp:195] Iteration 20, loss = 0.0147678
- I0827 19:46:15.107511 2129298192 solver.cpp:397] Iteration 20, lr = 0.001
- I0827 19:47:41.941119 2129298192 solver.cpp:195] Iteration 40, loss = 0.00455839
- I0827 19:47:41.941181 2129298192 solver.cpp:397] Iteration 40, lr = 0.001
-
-Note how rapidly the loss went down.
-For comparison, here is how the loss goes down when we do not start with a pre-trained model:
-
- I0827 18:57:08.496208 2129298192 solver.cpp:46] Solver scaffolding done.
- I0827 18:57:08.496227 2129298192 caffe.cpp:99] Use CPU.
- I0827 18:57:08.496235 2129298192 solver.cpp:165] Solving CaffeNet
- I0827 18:57:08.496271 2129298192 solver.cpp:251] Iteration 0, Testing net (#0)
- I0827 19:00:00.894336 2129298192 solver.cpp:302] Test net output #0: accuracy = 0.075
- I0827 19:00:05.825129 2129298192 solver.cpp:195] Iteration 0, loss = 3.51759
- I0827 19:00:05.825187 2129298192 solver.cpp:397] Iteration 0, lr = 0.001
- I0827 19:01:36.090224 2129298192 solver.cpp:195] Iteration 20, loss = 3.32227
- I0827 19:01:36.091948 2129298192 solver.cpp:397] Iteration 20, lr = 0.001
- I0827 19:03:08.522105 2129298192 solver.cpp:195] Iteration 40, loss = 2.97031
- I0827 19:03:08.522176 2129298192 solver.cpp:397] Iteration 40, lr = 0.001
-
-## License
-
-The Flickr Style dataset as distributed here contains only URLs to images.
-Some of the images may have copyright.
-Training a category-recognition model for research/non-commercial use may constitute fair use of this data.