arm_compute v18.03

Change-Id: I8f9a2a9d32a6cab019b8504d313216f28671f9f5
author: Jenkins <bsgcomp@matben01-vm-clbuild1.eu.iaas.arm.com> 2018-03-02 12:38:09 +0000
committer: Anthony Barbier <anthony.barbier@arm.com> 2018-03-02 15:37:57 +0000
commit: c3f34a43ffb5d52ee4a4e9f7b1bf4c6c002aeebb (patch)
tree: 4bd397554f4a784e1fbb1e1eb214cb958e8f83df /docs/01_library.dox
parent: 06ea048f062a50404b1b3998a61a45449c2d1f0f (diff)
1 files changed, 17 insertions, 0 deletions
diff --git a/docs/01_library.dox b/docs/01_library.dox
index 20d057c2c..e3f673df8 100644
--- a/docs/01_library.dox
+++ b/docs/01_library.dox
@@ -366,5 +366,22 @@ mm->finalize();                // Finalize memory manager (Object lifetime check
 conv1.run();
 conv2.run();
 @endcode
+
+@section S4_8_opencl_tuner OpenCL Tuner
+
+When OpenCL kernels are dispatched to the GPU, they take two arguments:
+- The Global Workgroup Size (GWS): the total number of times the OpenCL kernel is run in order to process all the elements.
+- The Local Workgroup Size (LWS): the number of elements processed in parallel on a GPU core at a given point in time.
+
+A specific LWS can be required by an algorithm (for example, if it contains memory barriers or uses local memory), but the LWS can also be chosen purely for performance: the overall execution time of a kernel can vary significantly depending on how the GWS is broken down.
+
+However, there is no universal rule regarding which LWS is best for a given kernel, so instead we created the @ref CLTuner.
+
+When the @ref CLTuner is enabled (Target = 2 for the graph examples), the first time an OpenCL kernel is executed the Compute Library will try to run it with a variety of LWS values and will remember which one performed best for subsequent runs. At the end of the run, the @ref graph::Graph will try to save these tuning parameters to a file.
+
+However, this process takes quite a lot of time, so it is not practical to keep it enabled all the time.
+
+Conversely, when the @ref CLTuner is disabled (Target = 1 for the graph examples), the @ref graph::Graph will try to reload the file containing the tuning parameters. Then, for each executed kernel, the Compute Library will use the fine-tuned LWS if it is present in the file, or a default LWS value if it is not.
+
 */
 } // namespace arm_compute
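The tuning behaviour described in the added section can be illustrated with a small, self-contained sketch. This is not the library's CLTuner and does not call OpenCL: `WorkSize`, `num_workgroups`, `ToyTuner` and the cost callback are all hypothetical names introduced here, and the cost callback merely stands in for "run the kernel with this LWS and time it". It only shows the idea of trying several LWS candidates once and reusing the best one afterwards.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <functional>
#include <limits>
#include <map>
#include <string>
#include <vector>

// Hypothetical 3D work size, standing in for an OpenCL NDRange.
using WorkSize = std::array<std::size_t, 3>;

// Number of workgroups the GWS is broken into for a given LWS.
// Assumes every GWS dimension is a multiple of the matching LWS dimension.
std::size_t num_workgroups(const WorkSize &gws, const WorkSize &lws)
{
    std::size_t n = 1;
    for(std::size_t d = 0; d < 3; ++d)
    {
        assert(lws[d] != 0 && gws[d] % lws[d] == 0);
        n *= gws[d] / lws[d];
    }
    return n;
}

// Toy tuner (an illustration only, not arm_compute::CLTuner).
class ToyTuner
{
public:
    // cost(lws) is a stand-in for timing one kernel run with that LWS.
    using CostFn = std::function<double(const WorkSize &)>;

    // Returns the cached LWS for kernel_id, measuring the candidates
    // only the first time the kernel is seen.
    WorkSize tune(const std::string &kernel_id, const std::vector<WorkSize> &candidates, const CostFn &cost)
    {
        const auto it = _table.find(kernel_id);
        if(it != _table.end())
        {
            return it->second; // Already tuned: reuse the stored value
        }

        assert(!candidates.empty());
        WorkSize best      = candidates.front();
        double   best_cost = std::numeric_limits<double>::max();
        for(const auto &lws : candidates)
        {
            const double c = cost(lws);
            ++_measurements;
            if(c < best_cost)
            {
                best_cost = c;
                best      = lws;
            }
        }
        _table[kernel_id] = best;
        return best;
    }

    // Total number of cost evaluations performed so far.
    std::size_t measurements() const
    {
        return _measurements;
    }

private:
    std::map<std::string, WorkSize> _table{};
    std::size_t                     _measurements{ 0 };
};
```

In this sketch the first call to `tune()` for a given kernel identifier pays for one measurement per candidate, and later calls return the stored LWS without measuring again, which mirrors why tuning is slow on the first run only. Writing the table to a file and reading it back is left out.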