author: Jenkins <bsgcomp@matben01-vm-clbuild1.eu.iaas.arm.com> 2018-03-02 12:38:09 +0000
committer: Anthony Barbier <anthony.barbier@arm.com> 2018-03-02 15:37:57 +0000
commit: c3f34a43ffb5d52ee4a4e9f7b1bf4c6c002aeebb (patch)
tree: 4bd397554f4a784e1fbb1e1eb214cb958e8f83df /docs/01_library.dox
parent: 06ea048f062a50404b1b3998a61a45449c2d1f0f (diff)
arm_compute v18.03
Change-Id: I8f9a2a9d32a6cab019b8504d313216f28671f9f5
Diffstat (limited to 'docs/01_library.dox')
-rw-r--r-- docs/01_library.dox 17
1 files changed, 17 insertions, 0 deletions
diff --git a/docs/01_library.dox b/docs/01_library.dox
index 20d057c2c..e3f673df8 100644
--- a/docs/01_library.dox
+++ b/docs/01_library.dox
@@ -366,5 +366,22 @@ mm->finalize(); // Finalize memory manager (Object lifetime check
conv1.run();
conv2.run();
@endcode
+
+@section S4_8_opencl_tuner OpenCL Tuner
+
+When OpenCL kernels are dispatched to the GPU, they take two arguments:
+- The Global Workgroup Size (GWS): the number of times an OpenCL kernel is run in order to process all the elements.
+- The Local Workgroup Size (LWS): the number of elements to process in parallel on a GPU core at a given point in time.
+
+A specific LWS can be required by an algorithm (for example, if it contains memory barriers or uses local memory), but the LWS can also be chosen to tune the performance of a kernel: the overall execution time might vary significantly depending on how the GWS is broken down.
+
+However, there is no universal rule regarding which LWS is best for a given kernel, so instead we created the @ref CLTuner.
+
+When the @ref CLTuner is enabled (Target = 2 for the graph examples), the first time an OpenCL kernel is executed, the Compute Library will try to run it with a variety of LWS values and will remember which one performed best for subsequent runs. At the end of the run, the @ref graph::Graph will try to save these tuning parameters to a file.
+
+However, this process takes a significant amount of time, which is why the tuner cannot be enabled all the time.
+
+When the @ref CLTuner is disabled (Target = 1 for the graph examples), the @ref graph::Graph will try to reload the file containing the tuning parameters. Then, for each executed kernel, the Compute Library will use the tuned LWS if it is present in the file, or a default LWS value if it is not.
+
*/
} // namespace arm_compute
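
The tuning behaviour described in the new section (try several LWS values, remember the fastest, fall back to a default when nothing was tuned) can be sketched in a few lines of standalone C++. This is only an illustration under stated assumptions: `Lws`, `ToyTuner`, `tune`, `lookup` and the timing callback are hypothetical names made up for this sketch, not the arm_compute API, and the sketch does not show how @ref CLTuner picks its candidates or measures kernels.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <limits>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a local workgroup size (not a library type).
struct Lws
{
    std::size_t x, y, z;
    bool operator==(const Lws &o) const { return x == o.x && y == o.y && z == o.z; }
};

// Hypothetical toy tuner; it only mirrors the behaviour described in the text.
class ToyTuner
{
public:
    // Callback returning the execution time of the kernel for a given LWS.
    using TimeFn = std::function<double(const Lws &)>;

    // Tuning enabled: time every candidate once and remember the fastest one.
    Lws tune(const std::string &kernel, const std::vector<Lws> &candidates, const TimeFn &time_kernel)
    {
        assert(!candidates.empty());
        Lws    best      = candidates.front();
        double best_time = std::numeric_limits<double>::max();
        for(const Lws &c : candidates)
        {
            const double t = time_kernel(c);
            if(t < best_time)
            {
                best_time = t;
                best      = c;
            }
        }
        _table[kernel] = best;
        return best;
    }

    // Tuning disabled: use the stored LWS if present, otherwise the default.
    Lws lookup(const std::string &kernel, const Lws &fallback) const
    {
        const auto it = _table.find(kernel);
        return it != _table.end() ? it->second : fallback;
    }

private:
    std::map<std::string, Lws> _table; // kernel name -> best LWS found so far
};
```

In this sketch a caller would invoke `tune()` the first time a kernel runs and `lookup()` on later runs. Saving `_table` to a file and reloading it would correspond to the save and reload steps the section attributes to @ref graph::Graph.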