Compute Library  18.05
AssemblyKernelGlue< TypeInput, TypeOutput > Class Template Reference
final

Assembly kernel glue.

#include <AssemblyHelper.h>

Public Types

using TypeOperator = TypeInput
 Operator type.

using TypeResult = TypeOutput
 Result type.

using AssemblyGemm = arm_gemm::GemmCommon< TypeInput, TypeOutput >
 Assembly Gemm.

Public Member Functions

 AssemblyKernelGlue ()
 Default constructor.

const AssemblyKernelGlue< TypeInput, TypeOutput > & operator= (const AssemblyKernelGlue< TypeInput, TypeOutput > &)=delete
 Prevent instances of this class from being copy assigned.

 AssemblyKernelGlue (const AssemblyKernelGlue< TypeInput, TypeOutput > &)=delete
 Prevent instances of this class from being copy constructed.

void run ()
 Configures the array pointers and strides in the assembly kernel, then executes the kernel.

Data Fields

std::unique_ptr< AssemblyGemm > _gemm_kernel_asm
 Assembly Gemm kernel.

std::unique_ptr< INEKernel > _optimised_kernel
 Optimised NEON kernel.

const ITensor * _a
 Input A.

const ITensor * _b
 Input B.

ITensor * _d
 Output.

ITensor * _pretranspose
 Pre-transpose tensor.

Detailed Description

template<typename TypeInput, typename TypeOutput>
class arm_compute::AssemblyKernelGlue< TypeInput, TypeOutput >

Assembly kernel glue.

Definition at line 45 of file AssemblyHelper.h.

Member Typedef Documentation

using AssemblyGemm = arm_gemm::GemmCommon<TypeInput, TypeOutput>

Assembly Gemm.

Definition at line 58 of file AssemblyHelper.h.

using TypeOperator = TypeInput

Operator type.

Definition at line 49 of file AssemblyHelper.h.

using TypeResult = TypeOutput

Result type.

Definition at line 51 of file AssemblyHelper.h.

Constructor & Destructor Documentation

AssemblyKernelGlue ( )
inline

Default constructor.

Definition at line 53 of file AssemblyHelper.h.

    AssemblyKernelGlue()
        : _gemm_kernel_asm(nullptr), _optimised_kernel(nullptr), _a(nullptr), _b(nullptr), _d(nullptr), _pretranspose(nullptr)
    {
    }
AssemblyKernelGlue ( const AssemblyKernelGlue< TypeInput, TypeOutput > &  )
delete

Prevent instances of this class from being copy constructed.

Member Function Documentation

const AssemblyKernelGlue<TypeInput, TypeOutput>& operator= ( const AssemblyKernelGlue< TypeInput, TypeOutput > &  )
delete

Prevent instances of this class from being copy assigned.
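
The effect of the two deleted members can be checked at compile time. The following is a minimal sketch that assumes nothing about Compute Library itself: NonCopyableGlue and GlueF32 are hypothetical stand-ins that reproduce only the copy-control pattern, and the placeholder member is not the real kernel type.

```cpp
#include <memory>
#include <type_traits>

// Hypothetical stand-in for AssemblyKernelGlue; only the copy-control pattern is reproduced.
template <typename TypeInput, typename TypeOutput>
class NonCopyableGlue final
{
public:
    NonCopyableGlue()
        : _kernel(nullptr)
    {
    }
    // Deleting the copy constructor rejects `NonCopyableGlue b(a);`
    NonCopyableGlue(const NonCopyableGlue &) = delete;
    // Deleting copy assignment rejects `b = a;`
    const NonCopyableGlue &operator=(const NonCopyableGlue &) = delete;

    std::unique_ptr<TypeOutput> _kernel; // placeholder owning pointer (assumption)
};

using GlueF32 = NonCopyableGlue<float, float>;

static_assert(!std::is_copy_constructible<GlueF32>::value, "copy construction is disabled");
static_assert(!std::is_copy_assignable<GlueF32>::value, "copy assignment is disabled");
static_assert(std::is_default_constructible<GlueF32>::value, "default construction still works");
```

Because the class owns its kernels through std::unique_ptr, forbidding copies avoids two glue objects claiming ownership of the same kernel.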

void run ( )
inline

Configures the array pointers and strides in the assembly kernel, then executes the kernel.

The call to set_arrays is needed to handle inputs that contain batches (more than two dimensions).

Definition at line 81 of file AssemblyHelper.h.
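
The stride arithmetic in run() converts byte strides into element strides before they are passed to set_arrays. The sketch below illustrates that conversion for input A under stated assumptions: ByteStrides, ElementStrides and to_element_strides are hypothetical names, the byte strides are plain integers rather than values read from ITensorInfo, and out_rows stands in for the output's dimension(1).

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical plain struct standing in for the values returned by strides_in_bytes().
struct ByteStrides
{
    std::size_t x, y, z, w; // byte strides of dimensions 0..3
};

// Element-unit strides for one tensor.
struct ElementStrides
{
    int ld;           // row stride (leading dimension)
    int batch_stride; // distance between consecutive batches
    int multi_stride; // distance between consecutive multis (dimension 3)
};

// Converts byte strides to element strides. For NHWC the batch stride is
// taken as out_rows row strides instead of the z stride.
template <typename T>
ElementStrides to_element_strides(const ByteStrides &s, bool is_nhwc, std::size_t out_rows)
{
    // Byte strides are expected to be whole multiples of the element size.
    assert(s.y % sizeof(T) == 0 && s.z % sizeof(T) == 0 && s.w % sizeof(T) == 0);

    const std::size_t batch_bytes = is_nhwc ? s.y * out_rows : s.z;
    return ElementStrides{ static_cast<int>(s.y / sizeof(T)),
                           static_cast<int>(batch_bytes / sizeof(T)),
                           static_cast<int>(s.w / sizeof(T)) };
}
```

For a dense float tensor with 16 columns, 8 rows and 2 batches, the byte strides are {4, 64, 512, 1024}, so the element strides come out as ld = 16, batch_stride = 128 and multi_stride = 256. With is_nhwc set and out_rows = 4, the batch stride becomes 64 elements.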

    inline void run()
    {
        const int lda = _a->info()->strides_in_bytes().y() / sizeof(TypeInput);
        const int ldb = _b->info()->strides_in_bytes().y() / sizeof(TypeInput);
        const int ldd = _d->info()->strides_in_bytes().y() / sizeof(TypeOutput);

        // In the case of NHWC we want to interpret the output shape as 3D. Thus, the batch stride for A is
        // the relevant multiple of the row stride.
        const bool is_nhwc           = _a->info()->data_layout() == DataLayout::NHWC;
        const int  stride_in_bytes_a = is_nhwc ? _a->info()->strides_in_bytes().y() * _d->info()->dimension(1) : _a->info()->strides_in_bytes().z();

        const int batch_stride_a = stride_in_bytes_a / sizeof(TypeInput);
        const int batch_stride_d = _d->info()->strides_in_bytes().z() / sizeof(TypeOutput);

        const int multi_stride_a = _a->info()->strides_in_bytes()[3] / sizeof(TypeInput);
        const int multi_stride_b = _b->info()->strides_in_bytes().z() / sizeof(TypeInput);
        const int multi_stride_d = _d->info()->strides_in_bytes()[3] / sizeof(TypeOutput);

        const auto in0_ptr = reinterpret_cast<const TypeInput *>(_a->buffer());
        const auto in1_ptr = reinterpret_cast<const TypeInput *>(_b->buffer());
        auto       out_ptr = reinterpret_cast<TypeOutput *>(_d->buffer());

        _gemm_kernel_asm->set_arrays(in0_ptr, lda, batch_stride_a, multi_stride_a, in1_ptr, ldb, multi_stride_b, out_ptr, ldd, batch_stride_d, multi_stride_d);
        if(_gemm_kernel_asm->B_pretranspose_required())
        {
            // Forcing 128-byte alignment (required by 32-bit kernels)
            const unsigned int alignment   = 128;
            void              *raw_ptr     = reinterpret_cast<void *>(_pretranspose->buffer());
            size_t             space       = _pretranspose->info()->total_size();
            void              *aligned_ptr = support::cpp11::align(alignment, _gemm_kernel_asm->get_B_pretransposed_array_size(), raw_ptr, space);
            ARM_COMPUTE_ERROR_ON(_pretranspose == nullptr || _pretranspose->buffer() == nullptr);
            _gemm_kernel_asm->pretranspose_B_array(aligned_ptr, in1_ptr, ldb, multi_stride_b);
            _b->mark_as_unused();
        }

        // Execute the optimised kernel through the scheduler singleton, split along X
        NEScheduler::get().schedule(_optimised_kernel.get(), Window::DimX);
    }
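
The pre-transpose branch carves a 128-byte-aligned region out of a larger buffer. The sketch below shows the same idea with the standard std::align, which takes the same four arguments as the support::cpp11::align call in the listing. The helper name aligned_workspace and the std::vector backing store are assumptions made for illustration and are not part of Compute Library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Returns a pointer inside `storage` that is aligned to `alignment` bytes and is
// followed by at least `needed` bytes, or nullptr if the storage is too small.
inline void *aligned_workspace(std::vector<std::uint8_t> &storage, std::size_t alignment, std::size_t needed)
{
    // std::align requires a power-of-two alignment.
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);

    void       *ptr   = storage.data();
    std::size_t space = storage.size();
    // On success std::align moves ptr forward to the aligned address and reduces space.
    return std::align(alignment, needed, ptr, space);
}
```

Over-allocating the backing store by at least alignment - 1 bytes guarantees that an aligned region of the requested size fits; the sketch returns nullptr when it does not, so the caller can check the result before using the pointer.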

Field Documentation

const ITensor* _a

Input A.

Definition at line 70 of file AssemblyHelper.h.

const ITensor* _b

Input B.

Definition at line 72 of file AssemblyHelper.h.

ITensor* _d

Output.

Definition at line 74 of file AssemblyHelper.h.

std::unique_ptr<AssemblyGemm> _gemm_kernel_asm

Assembly Gemm kernel.

Definition at line 66 of file AssemblyHelper.h.

std::unique_ptr<INEKernel> _optimised_kernel

Optimised NEON kernel.

Definition at line 68 of file AssemblyHelper.h.

ITensor* _pretranspose

Pre-transpose tensor.

Definition at line 76 of file AssemblyHelper.h.


The documentation for this class was generated from the following file: AssemblyHelper.h