diff options
Diffstat (limited to 'documentation/index.xhtml')
-rw-r--r-- | documentation/index.xhtml | 50 |
1 file changed, 27 insertions, 23 deletions
diff --git a/documentation/index.xhtml b/documentation/index.xhtml index c9f1bd74f..4824ea7af 100644 --- a/documentation/index.xhtml +++ b/documentation/index.xhtml @@ -40,7 +40,7 @@ <tr style="height: 56px;"> <td style="padding-left: 0.5em;"> <div id="projectname">ARM Compute Library -&#160;<span id="projectnumber">17.04</span> +&#160;<span id="projectnumber">17.05</span> </div> </td> </tr> @@ -245,6 +245,13 @@ v17.04 (First release of April 2017) </pre><dl class="section note"><dt>Note</dt><dd>We're aiming at releasing one major public release with new features per quarter. All releases in between will only contain bug fixes.</dd></dl> <h2><a class="anchor" id="S2_2_changelog"></a> Changelog</h2> +<p>v17.05 Public bug fixes release</p><ul> +<li>Various bug fixes.</li> +<li>The remaining functions have been ported to use accurate padding.</li> +<li>Library does not link against OpenCL anymore (it uses dlopen / dlsym at runtime instead to determine whether OpenCL is available).</li> +<li>Added "free" method to allocator.</li> +<li>Minimum version of G++ required for armv7 Linux changed from 4.8 to 4.9.</li> +</ul> <p>v17.04 Public bug fixes release The following functions have been ported to use the new accurate padding:</p><ul> <li><a class="el" href="classarm__compute_1_1_c_l_color_convert_kernel.xhtml">CLColorConvertKernel</a></li> <li><a class="el" href="classarm__compute_1_1_c_l_edge_non_max_suppression_kernel.xhtml">CLEdgeNonMaxSuppressionKernel</a></li> @@ -380,10 +387,6 @@ embed_kernels: Embed OpenCL kernels in library binary(Default=0) (0|1) default: 0 actual: 0 -scheduler: Scheduler backend(Default=cpp) (cpp|pthread|openmp) - default: cpp - actual: cpp - set_soname: Set the library's soname and shlibversion (Requires SCons 2.4 or above) (yes|no) default: 0 actual: False @@ -409,7 +412,12 @@ extra_cxx_flags: Extra CXX flags to be appended to the build command Linux</h2> <h3><a class="anchor" id="S3_2_1_library"></a> How to build the library?</h3> -<p>For Linux, the 
library was successfully built and tested using the following Linaro GCC toolchain: gcc-linaro-arm-linux-gnueabihf-4.8-2014.02_linux and gcc-linaro-6.1.1-2016.08-x86_64_arm-linux-gnueabihf</p> +<p>For Linux, the library was successfully built and tested using the following Linaro GCC toolchains:</p> +<ul> +<li>gcc-linaro-arm-linux-gnueabihf-4.9-2014.07_linux</li> +<li>gcc-linaro-4.9-2016.02-x86_64_aarch64-linux-gnu</li> +<li>gcc-linaro-6.3.1-2017.02-i686_aarch64-linux-gnu</li> +</ul> <dl class="section note"><dt>Note</dt><dd>If you are building with opencl=1, then scons will expect to find libOpenCL.so either in the current directory or in "build" (see the section below if you need a stub OpenCL library to link against)</dd></dl> <p>To cross-compile the library in debug mode, with NEON only support, for Linux 32bit: </p><pre class="fragment">scons Werror=1 -j8 debug=1 neon=1 opencl=0 os=linux arch=armv7a </pre><p>To cross-compile the library in asserts mode, with OpenCL only support, for Linux 64bit: </p><pre class="fragment">scons Werror=1 -j8 debug=0 asserts=1 neon=0 opencl=1 embed_kernels=1 os=linux arch=arm64-v8a @@ -423,11 +431,17 @@ scons Werror=1 -j8 debug=0 neon=1 opencl=0 os=linux arch=armv7a build=native <h3><a class="anchor" id="S3_2_2_examples"></a> How to manually build the examples?</h3> <p>The examples get automatically built by scons as part of the build process of the library described above. This section just describes how you can build and link your own application against our library.</p> -<dl class="section note"><dt>Note</dt><dd>The following command lines assume the <a class="el" href="namespacearm__compute.xhtml">arm_compute</a> binaries are present in the current directory or in the system library path.</dd></dl> -<p>To cross compile a NEON example: </p><pre class="fragment">arm-linux-gnueabihf-g++ examples/neon_convolution.cpp test_helpers/Utils.cpp -I. -Iinclude -std=c++11 -mfpu=neon -L. 
-larm_compute -o neon_convolution -</pre><p>To cross compile an OpenCL example: </p><pre class="fragment">arm-linux-gnueabihf-g++ examples/cl_convolution.cpp test_helpers/Utils.cpp -I. -Iinclude -std=c++11 -mfpu=neon -L. -larm_compute -lOpenCL -o cl_convolution -</pre><p>To compile natively (i.e directly on an ARM device) for NEON: </p><pre class="fragment">g++ examples/neon_convolution.cpp test_helpers/Utils.cpp -I. -Iinclude -std=c++11 -mfpu=neon -larm_compute -o neon_convolution -</pre><p>To compile natively (i.e directly on an ARM device) for OpenCL: </p><pre class="fragment">g++ examples/cl_convolution.cpp test_helpers/Utils.cpp -I. -Iinclude -std=c++11 -larm_compute -lOpenCL -o cl_convolution +<dl class="section note"><dt>Note</dt><dd>The following command lines assume the <a class="el" href="namespacearm__compute.xhtml">arm_compute</a> and libOpenCL binaries are present in the current directory or in the system library path. If this is not the case, you can specify the location of the pre-built library with the compiler option -L. When building the OpenCL example, the commands below assume that the CL headers are located in the include folder where the command is executed.</dd></dl> +<p>To cross compile a NEON example for Linux 32bit: </p><pre class="fragment">arm-linux-gnueabihf-g++ examples/neon_convolution.cpp test_helpers/Utils.cpp -I. -std=c++11 -mfpu=neon -L. -larm_compute -o neon_convolution +</pre><p>To cross compile a NEON example for Linux 64bit: </p><pre class="fragment">aarch64-linux-gnu-g++ examples/neon_convolution.cpp test_helpers/Utils.cpp -I. -std=c++11 -L. -larm_compute -o neon_convolution +</pre><p>(notice the only difference with the 32 bit command is that we don't need the -mfpu option and the compiler's name is different)</p> +<p>To cross compile an OpenCL example for Linux 32bit: </p><pre class="fragment">arm-linux-gnueabihf-g++ examples/cl_convolution.cpp test_helpers/Utils.cpp -I. -Iinclude -std=c++11 -mfpu=neon -L. 
-larm_compute -lOpenCL -o cl_convolution +</pre><p>To cross compile an OpenCL example for Linux 64bit: </p><pre class="fragment">aarch64-linux-gnu-g++ examples/cl_convolution.cpp test_helpers/Utils.cpp -I. -Iinclude -std=c++11 -L. -larm_compute -lOpenCL -o cl_convolution +</pre><p>(notice the only difference with the 32 bit command is that we don't need the -mfpu option and the compiler's name is different)</p> +<p>To compile natively (i.e. directly on an ARM device) for NEON for Linux 32bit: </p><pre class="fragment">g++ examples/neon_convolution.cpp test_helpers/Utils.cpp -I. -std=c++11 -mfpu=neon -larm_compute -o neon_convolution +</pre><p>To compile natively (i.e. directly on an ARM device) for NEON for Linux 64bit: </p><pre class="fragment">g++ examples/neon_convolution.cpp test_helpers/Utils.cpp -I. -std=c++11 -larm_compute -o neon_convolution +</pre><p>(notice the only difference with the 32 bit command is that we don't need the -mfpu option)</p> +<p>To compile natively (i.e. directly on an ARM device) for OpenCL for Linux 32bit or Linux 64bit: </p><pre class="fragment">g++ examples/cl_convolution.cpp test_helpers/Utils.cpp -I. 
-Iinclude -std=c++11 -larm_compute -lOpenCL -o cl_convolution </pre><dl class="section note"><dt>Note</dt><dd>These two commands assume libarm_compute.so is available in your library path; if not, add the path to it using -L</dd></dl> <p>To run the built executable, simply run: </p><pre class="fragment">LD_LIBRARY_PATH=build ./neon_convolution </pre><p>or </p><pre class="fragment">LD_LIBRARY_PATH=build ./cl_convolution @@ -575,7 +589,7 @@ Padding</h4> <ul> <li>Accurate padding:</li> </ul> -<div class="fragment"><div class="line"> PPMLoader ppm;</div><div class="line"> <a class="code" href="struct_image.xhtml">Image</a> src, tmp, dst;</div><div class="line"></div><div class="line"> <span class="keywordflow">if</span>(argc < 2)</div><div class="line"> {</div><div class="line"> <span class="comment">// Print help</span></div><div class="line"> std::cout << <span class="stringliteral">"Usage: ./build/neon_convolution [input_image.ppm]\n\n"</span>;</div><div class="line"> std::cout << <span class="stringliteral">"No input_image provided, creating a dummy 640x480 image\n"</span>;</div><div class="line"> <span class="comment">// Initialize just the dimensions and format of your buffers:</span></div><div class="line"> src.allocator()->init(TensorInfo(640, 480, <a class="code" href="namespacearm__compute.xhtml#ab4e88c89b3b7ea1735996cc4def22d58a6669348b484e3008dca2bfa8e85e40b5">Format::U8</a>));</div><div class="line"> }</div><div class="line"> <span class="keywordflow">else</span></div><div class="line"> {</div><div class="line"> ppm.open(argv[1]);</div><div class="line"> <span class="comment">// Initialize just the dimensions and format of your buffers:</span></div><div class="line"> ppm.init_image(src, <a class="code" href="namespacearm__compute.xhtml#ab4e88c89b3b7ea1735996cc4def22d58a6669348b484e3008dca2bfa8e85e40b5">Format::U8</a>);</div><div class="line"> }</div><div class="line"></div><div class="line"> <span class="comment">// Initialize just the dimensions and format 
of the temporary and destination images:</span></div><div class="line"> tmp.allocator()->init(*src.info());</div><div class="line"> dst.allocator()->init(*src.info());</div><div class="line"></div><div class="line"> NEConvolution3x3 conv3x3;</div><div class="line"> NEConvolution5x5 conv5x5;</div><div class="line"></div><div class="line"> <span class="comment">// Apply a Gaussian 3x3 filter to the source image followed by a Gaussian 5x5:</span></div><div class="line"> <span class="comment">// The function will automatically update the padding information inside input and output to match its requirements</span></div><div class="line"> conv3x3.configure(&src, &tmp, <a class="code" href="cl__convolution_8cpp.xhtml#a741ba5321da40184f8653e0a50ace070">gaussian3x3</a>, 0 <span class="comment">/* Let arm_compute calculate the scale */</span>, <a class="code" href="namespacearm__compute.xhtml#a15a05537a472ee742404821851529327a0db45d2a4141101bdfe48e3314cfbca3">BorderMode::UNDEFINED</a>);</div><div class="line"> conv5x5.configure(&tmp, &dst, <a class="code" href="cl__convolution_8cpp.xhtml#a565013cf7e49a591bacd548571951f94">gaussian5x5</a>, 0 <span class="comment">/* Let arm_compute calculate the scale */</span>, <a class="code" href="namespacearm__compute.xhtml#a15a05537a472ee742404821851529327a0db45d2a4141101bdfe48e3314cfbca3">BorderMode::UNDEFINED</a>);</div><div class="line"></div><div class="line"> <span class="comment">// Now that the padding requirements are known we can allocate the images:</span></div><div class="line"> src.allocator()->allocate();</div><div class="line"> tmp.allocator()->allocate();</div><div class="line"> dst.allocator()->allocate();</div><div class="line"></div><div class="line"> <span class="comment">// Fill the input image with the content of the PPM image if a filename was provided:</span></div><div class="line"> <span class="keywordflow">if</span>(ppm.is_open())</div><div class="line"> {</div><div class="line"> ppm.fill_image(src);</div><div 
class="line"> }</div><div class="line"></div><div class="line"> <span class="comment">//Execute the functions:</span></div><div class="line"> conv3x3.run();</div><div class="line"> conv5x5.run();</div><div class="line"></div><div class="line"> <span class="comment">// Save the result to file:</span></div><div class="line"> <span class="keywordflow">if</span>(ppm.is_open())</div><div class="line"> {</div><div class="line"> <span class="keyword">const</span> std::string output_filename = std::string(argv[1]) + <span class="stringliteral">"_out.ppm"</span>;</div><div class="line"> <a class="code" href="namespacetest__helpers.xhtml#a5036a1b77bd7223a68954b5078c6545a">save_to_ppm</a>(dst, output_filename);</div><div class="line"> }</div></div><!-- fragment --> <dl class="section note"><dt>Note</dt><dd>It's important to call allocate <b>after</b> the function is configured: if the image / tensor is already allocated then the function will shrink its execution window instead of increasing the padding. 
(See below for more details).</dd></dl> +<div class="fragment"><div class="line"> PPMLoader ppm;</div><div class="line"> <a class="code" href="struct_image.xhtml">Image</a> src, tmp, dst;</div><div class="line"></div><div class="line"> <span class="keywordflow">if</span>(argc < 2)</div><div class="line"> {</div><div class="line"> <span class="comment">// Print help</span></div><div class="line"> std::cout << <span class="stringliteral">"Usage: ./build/neon_convolution [input_image.ppm]\n\n"</span>;</div><div class="line"> std::cout << <span class="stringliteral">"No input_image provided, creating a dummy 640x480 image\n"</span>;</div><div class="line"> <span class="comment">// Initialize just the dimensions and format of your buffers:</span></div><div class="line"> src.allocator()->init(TensorInfo(640, 480, <a class="code" href="namespacearm__compute.xhtml#ab4e88c89b3b7ea1735996cc4def22d58a6669348b484e3008dca2bfa8e85e40b5">Format::U8</a>));</div><div class="line"> }</div><div class="line"> <span class="keywordflow">else</span></div><div class="line"> {</div><div class="line"> ppm.open(argv[1]);</div><div class="line"> <span class="comment">// Initialize just the dimensions and format of your buffers:</span></div><div class="line"> ppm.init_image(src, <a class="code" href="namespacearm__compute.xhtml#ab4e88c89b3b7ea1735996cc4def22d58a6669348b484e3008dca2bfa8e85e40b5">Format::U8</a>);</div><div class="line"> }</div><div class="line"></div><div class="line"> <span class="comment">// Initialize just the dimensions and format of the temporary and destination images:</span></div><div class="line"> tmp.allocator()->init(*src.info());</div><div class="line"> dst.allocator()->init(*src.info());</div><div class="line"></div><div class="line"> NEConvolution3x3 conv3x3;</div><div class="line"> <a class="code" href="namespacearm__compute.xhtml#adbc7771d367ba8f51da1450d3602e5c0">NEConvolution5x5</a> conv5x5;</div><div class="line"></div><div class="line"> <span 
class="comment">// Apply a Gaussian 3x3 filter to the source image followed by a Gaussian 5x5:</span></div><div class="line"> <span class="comment">// The function will automatically update the padding information inside input and output to match its requirements</span></div><div class="line"> conv3x3.configure(&src, &tmp, <a class="code" href="cl__convolution_8cpp.xhtml#a741ba5321da40184f8653e0a50ace070">gaussian3x3</a>, 0 <span class="comment">/* Let arm_compute calculate the scale */</span>, <a class="code" href="namespacearm__compute.xhtml#a15a05537a472ee742404821851529327a0db45d2a4141101bdfe48e3314cfbca3">BorderMode::UNDEFINED</a>);</div><div class="line"> conv5x5.configure(&tmp, &dst, <a class="code" href="cl__convolution_8cpp.xhtml#a565013cf7e49a591bacd548571951f94">gaussian5x5</a>, 0 <span class="comment">/* Let arm_compute calculate the scale */</span>, <a class="code" href="namespacearm__compute.xhtml#a15a05537a472ee742404821851529327a0db45d2a4141101bdfe48e3314cfbca3">BorderMode::UNDEFINED</a>);</div><div class="line"></div><div class="line"> <span class="comment">// Now that the padding requirements are known we can allocate the images:</span></div><div class="line"> src.allocator()->allocate();</div><div class="line"> tmp.allocator()->allocate();</div><div class="line"> dst.allocator()->allocate();</div><div class="line"></div><div class="line"> <span class="comment">// Fill the input image with the content of the PPM image if a filename was provided:</span></div><div class="line"> <span class="keywordflow">if</span>(ppm.is_open())</div><div class="line"> {</div><div class="line"> ppm.fill_image(src);</div><div class="line"> }</div><div class="line"></div><div class="line"> <span class="comment">//Execute the functions:</span></div><div class="line"> conv3x3.run();</div><div class="line"> conv5x5.run();</div><div class="line"></div><div class="line"> <span class="comment">// Save the result to file:</span></div><div class="line"> <span 
class="keywordflow">if</span>(ppm.is_open())</div><div class="line"> {</div><div class="line"> <span class="keyword">const</span> std::string output_filename = std::string(argv[1]) + <span class="stringliteral">"_out.ppm"</span>;</div><div class="line"> <a class="code" href="namespacetest__helpers.xhtml#a5036a1b77bd7223a68954b5078c6545a">save_to_ppm</a>(dst, output_filename);</div><div class="line"> }</div></div><!-- fragment --> <dl class="section note"><dt>Note</dt><dd>It's important to call allocate <b>after</b> the function is configured: if the image / tensor is already allocated then the function will shrink its execution window instead of increasing the padding. (See below for more details).</dd></dl> <ul> <li>Manual padding / no padding / auto padding: You can allocate your images / tensors up front (before configuring your functions); in that case, the function will use whatever padding is available and will shrink its execution window if there isn't enough padding available (which translates into a smaller valid region for the output. See also <a class="el" href="index.xhtml#valid_region">Valid regions</a>). If you don't want to manually set the padding but still want to allocate your objects upfront, you can use auto_padding.</li> </ul> @@ -585,16 +599,6 @@ Valid regions</h4> <p>Some kernels (like edge detectors for example) need to read values of neighbouring pixels to calculate the value of a given pixel; it is therefore not possible to calculate the values of the pixels on the edges.</p> <p>Another case: if a kernel processes 8 pixels per iteration, the image's dimensions are not a multiple of 8, and not enough padding is available, then the kernel will not be able to process the pixels near the right edge; as a result, these pixels will be left undefined.</p> <p>In order to know which pixels have been calculated, each kernel sets a valid region for each output image or tensor. 
See also <a class="el" href="classarm__compute_1_1_tensor_info.xhtml#ac437ef0718add962a4059fb3b3084c34">TensorInfo::valid_region()</a>, <a class="el" href="structarm__compute_1_1_valid_region.xhtml">ValidRegion</a></p> -<dl class="section attention"><dt>Attention</dt><dd>Valid regions and accurate padding have only been introduced in the library recently therefore not all the kernels and functions have been ported to use them yet. All the non ported kernels will set the <a class="el" href="structarm__compute_1_1_valid_region.xhtml">ValidRegion</a> equal to the <a class="el" href="classarm__compute_1_1_tensor_shape.xhtml">TensorShape</a>.</dd></dl> -<p>List of kernels which haven't been ported yet:</p> -<ul> -<li><a class="el" href="classarm__compute_1_1_n_e_color_convert_kernel.xhtml">NEColorConvertKernel</a></li> -<li><a class="el" href="classarm__compute_1_1_n_e_histogram_kernel.xhtml">NEHistogramKernel</a></li> -<li><a class="el" href="classarm__compute_1_1_n_e_histogram_border_kernel.xhtml">NEHistogramBorderKernel</a></li> -<li><a class="el" href="classarm__compute_1_1_n_e_h_o_g_block_normalization_kernel.xhtml">NEHOGBlockNormalizationKernel</a></li> -<li><a class="el" href="classarm__compute_1_1_n_e_h_o_g_orientation_binning_kernel.xhtml">NEHOGOrientationBinningKernel</a></li> -<li><a class="el" href="classarm__compute_1_1_n_e_l_k_tracker_kernel.xhtml">NELKTrackerKernel</a></li> -</ul> <h3><a class="anchor" id="S4_6_2_tensors"></a> Tensors</h3> <p>Tensors are multi-dimensional arrays made of up to <a class="el" href="classarm__compute_1_1_dimensions.xhtml#a1b67d5b720119d50faa286c774579ecc">Coordinates::num_max_dimensions</a> dimensions.</p> @@ -621,7 +625,7 @@ Working with Images and Tensors using iterators</h3> <!-- start footer part --> <div id="nav-path" class="navpath"><!-- id is needed for treeview function! 
--> <ul> - <li class="footer">Generated on Wed Apr 12 2017 14:26:06 for ARM Compute Library by + <li class="footer">Generated on Wed May 3 2017 17:20:05 for ARM Compute Library by <a href="http://www.doxygen.org/index.html"> <img class="footer" src="doxygen.png" alt="doxygen"/></a> 1.8.11 </li> </ul> |