Diffstat (limited to 'docs/nncc/project/high_level_design.md')
-rw-r--r-- | docs/nncc/project/high_level_design.md | 457 |
1 file changed, 457 insertions, 0 deletions
diff --git a/docs/nncc/project/high_level_design.md b/docs/nncc/project/high_level_design.md
new file mode 100644
index 000000000..a15aaca4a
--- /dev/null
+++ b/docs/nncc/project/high_level_design.md
@@ -0,0 +1,457 @@

# SW High Level Design

**Revision history**

| Ver. | Date       | Contents          | Author            | Approver     |
| ---- | ---------- | ----------------- | ----------------- | ------------ |
| 0.1  | 2018.05.25 | Initial version   | Vostokov Sergey   | Sung-Jae Lee |
| 0.2  | 2018.06.21 | SE member review  | Alexey Kondrashov |              |
| 1.0  | 2018.06.22 | Final DR1 version | Vostokov Sergey   | Sung-Jae Lee |

**Terminology and Abbreviation**

| Terminology  | Description                                                         |
| ------------ | ------------------------------------------------------------------- |
| OS           | Operating System                                                     |
| OS API       | Application interface of the OS                                      |
| HW           | Hardware                                                             |
| SW           | Software                                                             |
| NN           | Neural Network                                                       |
| NN model     | Neural network model (instance of an NN built with an ML framework)  |
| NN compiler  | The compiler for neural networks                                     |
| ML framework | The machine learning framework                                       |
| TF/TF Lite   | Tensorflow/Tensorflow Lite ML framework                              |
| IR           | Intermediate representation                                          |
| CI/CI system | Continuous integration system                                        |
| UI           | The user interface                                                   |
| GUI          | The graphical user interface                                         |
| CLI          | The command-line interface                                           |

**References**

\[1\] Vostokov Sergey, [SW Requirements Specification](requirements_specification.md)

## Overview

### Scope

The main goal of the project is to develop a compiler for neural
networks that produces an executable artefact for a specified SW and HW
platform.

The development scope includes the following components:

  - Develop an importer module to parse, verify and represent an NN
    model for further optimization and compilation
  - Develop code emitters to produce executable binaries for CPU and GPU

**2018 year goals:**

  - Support TensorFlow Lite NN model format
  - Support Caffe NN model format
  - Support Caffe2 NN model format (optional)
  - Support compilation of MobileNet NN
  - Support compilation of Inception v3 NN
  - Support ARM CPU
  - Support ARM GPU (Mali)
  - Support Tizen OS
  - Support SmartMachine OS (optional)

| Product             | Target Model Name              | Comment          |
| ------------------- | ------------------------------ | ---------------- |
| Tizen phone         | Tizen TM2                      | Reference device |
| Tizen device        | Odroid XU4                     | Reference board  |
| SmartMachine target | Microvision mv8890, exynos8890 | Reference device |

Table 1-1. Target Model

### Design Consideration

Deep learning software demands reliability and performance. The
traditional approach is to develop a SW framework (a machine learning
framework) that computes each step of the neural network inference
process on the supported hardware. This approach is used in many
popular solutions, such as Google Tensorflow/Tensorflow Lite, Caffe/2,
etc. Traditionally, neural network developers build a computation graph
and then an appropriate machine learning framework interprets it.
Recent results in the AI field show that this node-visitor method of
execution is inefficient. As a result, the industry has worked out a
second approach: a neural network compiler, which executes code more
efficiently.
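To make the contrast concrete, below is a minimal toy sketch (not nncc
code; the graph, the op names and both functions are invented for
illustration) of the same two-operation network run by a node-visitor
interpreter and as compiled, fused code:

```cpp
// Toy contrast between graph interpretation and compiled execution.
#include <cmath>
#include <string>
#include <vector>

struct Node { std::string op; };  // e.g. "mul2", "relu"

// Interpreter: dispatches on the op name for every node on every run.
void interpret(const std::vector<Node>& graph, std::vector<float>& data) {
  for (const Node& n : graph) {
    if (n.op == "mul2") {
      for (float& v : data) v *= 2.0f;
    } else if (n.op == "relu") {
      for (float& v : data) v = std::fmax(v, 0.0f);
    }
  }
}

// Compiled equivalent: dispatch is resolved ahead of time and the two
// ops are fused into a single pass over the data.
void compiled(std::vector<float>& data) {
  for (float& v : data) v = std::fmax(v * 2.0f, 0.0f);
}
```

The interpreter pays dispatch overhead and traverses the data once per
node; the compiled version removes both costs, which is the motivation
for the compiler approach.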
This document presents the design of *nncc*, a neural network compiler
collection. The design should make it easy to extend the functionality
of *nncc* by adding new modules with the following features:

  - Support neural networks produced by various machine learning
    frameworks;
  - Produce an artefact taking advantage of various hardware, including
    specialized processors such as NPUs;
  - Apply new domain-specific optimization techniques to a given NN.

### Constraints

See constraints in the SW Requirements Specification.

<table>
<colgroup>
<col style="width: 24%" />
<col style="width: 64%" />
<col style="width: 10%" />
</colgroup>
<thead>
<tr class="header">
<th>Item</th>
<th>Assumptions, Dependencies and the Constraints</th>
<th>Reference</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>Tizen SW Platform</td>
<td><dl>
<dt>The following items should be provided:</dt>
<dd><ul>
<li>Tizen API</li>
<li>Tizen kernel</li>
<li>Tizen FW</li>
<li>Tizen SDK</li>
<li>Tizen naming convention</li>
</ul>
</dd>
</dl></td>
<td>- <a href="www.tizen.org" class="uri">www.tizen.org</a> <br>- <a href="wiki.tizen.org" class="uri">wiki.tizen.org</a> <br>- <a href="developer.tizen.org" class="uri">developer.tizen.org</a></td>
</tr>
<tr class="even">
<td>SmartMachine OS Platform</td>
<td><dl>
<dt>The following items should be provided:</dt>
<dd><ul>
<li>SmartMachine API</li>
<li>SmartMachine kernel</li>
<li>SmartMachine FW</li>
<li>SmartMachine SDK</li>
<li>SmartMachine naming convention</li>
</ul>
</dd>
</dl></td>
<td>- <a href="http://suprem.sec.samsung.net/confluence/pages/viewpage.action?pageId=81833987">Platform confluence</a> <br>- <a href="https://github.sec.samsung.net/RS7-SmartMachine">Github</a> <br>- <a href="http://suprem.sec.samsung.net/confluence/display/ASEC/Adaptive+AUTOSAR">Functional Safety confluence</a></td>
</tr>
<tr class="odd">
<td>Host OS</td>
<td>Linux-based OS (Ubuntu, Archlinux, etc.)</td>
<td>- <a href="https://www.ubuntu.com/">Ubuntu site</a> <br>- <a href="https://www.archlinux.org/">Archlinux site</a></td>
</tr>
<tr class="even">
<td>Tizen target HW</td>
<td>The reference device should be provided: Tizen TM2</td>
<td></td>
</tr>
<tr class="odd">
<td>SmartMachine target HW</td>
<td>The reference device should be provided</td>
<td></td>
</tr>
</tbody>
</table>

Table 1-2. Assumptions, Dependencies and the Constraints

## SW System Architecture Design

### Overall Architecture

The picture below presents the result of a high-level analysis of the
requirements which **nncc** should satisfy. It describes the main
function, **Compilation**, of the compiler collection using the IDEF0
(functional modeling) notation. Full information on the IDEF family of
modeling languages is available at [Wikipedia:
IDEF](https://en.wikipedia.org/wiki/IDEF).

![image](../images/nncc_idef0_a0.png)

Figure 1. Top-Level Context Diagram of the compilation function.
A short explanation of **Figure 1**:

**1. Input entities:**

  - *NN Model instance:* The main input of *nncc*. The compiler takes
    from the user information describing the neural network which
    should be compiled. In most cases, this NN is produced by a machine
    learning framework and stored in one or more files. The contents of
    these files constitute the essence of the neural network; here it
    is denoted as an instance of an NN model.
  - *Command line options:* In order to provide the most convenient way
    to use the compiler, it should be configurable. The current design
    presents a tool with a Command Line Interface (CLI). Command line
    options are a symbolic representation of directions instructing the
    compiler how to set up a working session to get the desired result.

**2. Output:**

  - *Target binaries:* Everything that is produced by the compilation
    operation. In the general case, the result may consist of one or
    more files, each of which may be an executable, a source code file,
    or a log/verification/error report. For example, when we require
    the compiler to compile a neural network for execution on a GPU,
    the output artefact may be OpenCL/C/C++ source code, or a binary
    containing invocations of the procedures delegating the
    calculations to the GPU.

**3. Rules and notations:**

  - *NN Model specification:* Each machine learning framework has its
    own architecture design and uses its own format to
    serialize/deserialize computation graphs which represent neural
    networks. On a storage device, a model may be saved as one or many
    files using a unique markup of binary data. To read and process
    such data, *nncc* must first recognize the format of the container
    (a format-detection sketch follows this list). The importer/parser
    subsystem of *nncc* stores the full knowledge of the NN
    specifications and is responsible for reading and parsing NN models
    (see [Import NN model](#import-nn-model)).
  - *High-Level and Low-Level Optimization techniques:* Before
    deployment, a neural network developer might want to verify their
    product and optimize it by size and performance. There are many
    techniques for reducing the overall size of neural network weights
    and improving the performance of inference. NN optimization
    activity can be automated by implementing each technique in the
    middleend according to its specification (see [Apply
    Optimizations](#apply-optimizations)).
  - *Target Runtime Environment (TRE):* When the compiler produces a
    binary for execution on a specific SW platform, it should take into
    account the common API of that SW platform. This includes the full
    public API of the chosen OS available to 3rd-party developers.
  - *Target Instruction Set Architecture (Target ISA):* The resulting
    artefact is always executed on a SW platform using some specified
    API. The user may want to generate an artefact that uses OpenBlas,
    the Arm Compute Library or something else (if supported by the
    compiler) to perform calculations. In order to provide such a
    possibility, *nncc* should be aware of the APIs of the specified
    3rd-party libraries.
  - *Device specifications:* Some of the optimization techniques may
    take into account the technological features of the computing
    device, such as the time to perform certain specific calculations.
    Such information is very helpful during optimization of the final
    code of the compiled artefact because it may be used to select an
    optimal sequence of command invocations in order to achieve the
    best performance.

**4. Mechanism:**

  - *Optimizing NN Compiler:* The implemented compiler itself. Since
    *nncc* is dedicated to producing code for the most efficient
    execution, we may regard the tool as optimizing.
  - *Host OS:* Since the compiler is a tool that works in some SW
    environment, the main top-level SW system is an operating system.
    In the SW Requirements Specification it may be defined as a
    Linux-like OS, for example Ubuntu, Archlinux, etc.
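As a minimal illustration of the container-format recognition mentioned
in the *NN Model specification* item, the hypothetical helper below
distinguishes a TensorFlow Lite model (a FlatBuffers file carrying the
identifier "TFL3" at byte offset 4) from a Caffe model (a protobuf
binary with no magic number, recognized here only by its file
extension). The function name and the fallback policy are invented for
this sketch:

```cpp
#include <fstream>
#include <string>

enum class ModelFormat { TFLite, Caffe, Unknown };

// Sniff the container format from the file header, falling back to the
// file extension when no magic bytes are available.
ModelFormat detect_format(const std::string& path) {
  std::ifstream file(path, std::ios::binary);
  char header[8] = {};
  file.read(header, sizeof(header));
  // TFLite: FlatBuffers file_identifier "TFL3" at bytes 4..7.
  if (file.gcount() >= 8 && std::string(header + 4, 4) == "TFL3")
    return ModelFormat::TFLite;
  // Caffe binary models carry no magic prefix; rely on the extension.
  if (path.size() >= 11 && path.substr(path.size() - 11) == ".caffemodel")
    return ModelFormat::Caffe;
  return ModelFormat::Unknown;
}
```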
### Composition of Architecture

The compiler consists of three main parts: frontend, middleend and
backend. Together they form a Neural Network instance processing
pipeline. Moreover, there is one additional part that is in charge of
the compiler configuration.

![image](../images/nncc_components.png)

Figure 2. Top-Level Components of the *nncc*.

| Layer or Subsystem Name | Description                                                                                                                                     |
| ----------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------- |
| Frontend                | Imports a specified Neural Network and presents it as a computation graph                                                                        |
| Middleend               | Provides various optimizations over the computation graph; at the end, transforms it to the internal IR                                          |
| Backend                 | Produces the specified artefact as a result of the compilation procedure, using specified parameters describing the target OS, target HW, etc.   |
| Configuration system    | Accepts command line options and configures *nncc* according to their contents                                                                   |

The detailed decomposition of the main function **Compilation** is
presented on diagram A1 below (see Figure 3).

### Interface

Like any console application, the *nncc* CLI accepts two types of
options:

  - Options that have values, for example, the name of the output
    executable
  - Options that don't have values (switches) that turn various
    features on and off

Additionally, options can be either general or subsystem-specific.

General options direct the process of the neural network compilation as
a whole, and also control utility functions such as the verbosity of
the messages that *nncc* outputs during the compilation process.

Subsystem-specific options control each respective subsystem:

  - The frontend subsystem takes options that point to the NN model to
    compile, which format it has, which version of the format, and so
    on.
  - The middleend subsystem takes options that either turn on specific
    optimizations for the NN model, or just point at the desired
    outcome, for example "target performance efficiency" or "target
    memory efficiency".
  - The backend subsystem takes options that describe the desired
    target device or architecture, and so on.

For better usability, high-level options are also supported. A single
high-level option is mapped to a group of lower-level options,
similarly to how it is done in conventional compiler drivers such as
gcc. This way, by choosing a single middleend option such as "target
performance", the user lets nncc pick a number of performance
optimizations by itself.
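A minimal sketch of such a mapping, with invented option and pass
names, might look as follows:

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical mapping of one high-level option to a group of
// lower-level middleend options, in the spirit of gcc's -O2.
const std::map<std::string, std::vector<std::string>> kOptionGroups = {
    {"--target-performance",
     {"--fuse-ops", "--remove-dead-nodes", "--precompute-constants"}},
    {"--target-memory",
     {"--quantize-weights", "--share-buffers"}},
};

// Expand high-level options in place; plain options pass through.
std::vector<std::string> expand(const std::vector<std::string>& args) {
  std::vector<std::string> result;
  for (const std::string& arg : args) {
    auto it = kOptionGroups.find(arg);
    if (it != kOptionGroups.end())
      result.insert(result.end(), it->second.begin(), it->second.end());
    else
      result.push_back(arg);
  }
  return result;
}
```

With such a scheme, new low-level passes can be added to a group
without changing the user-facing interface.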
## SW System Operation Design

Figure 3 presents a more detailed composition of the main function
**Compilation**. As shown in the previous section, [Composition of
Architecture](#composition-of-architecture), it is composed of 5
subfunctions:

  - Setup and configure each module - *Block 1* (see the
    [Initialization](#initialization) section)
  - Import the specified neural network - *Block 2* (see the [Import NN
    model](#import-nn-model) section)
  - Apply High-Level optimizations - *Block 3* (see the [Apply
    Optimizations](#apply-optimizations) section)
  - Apply Low-Level optimizations - *Block 4* (see the [Apply
    Optimizations](#apply-optimizations) section)
  - Generate the output code for the specified target - *Block 5* (see
    the [Generate the code](#generate-the-code) section)

![image](../images/nncc_idef0_a1.png)

Figure 3. Decomposition of the top-level function **Compilation**.

### Initialization

At this stage the initialization of all submodules of *nncc* happens.
This procedure spans from command line option processing to the
selection of all required, correctly configured modules. At the parsing
stage the configuration system checks its own consistency. If the set
of command line options is not enough to establish a valid
configuration, environment variables are used. Also, almost all
configuration options can be read from a config file if one is
specified on the command line.
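A sketch of that lookup order (command line first, then environment
variables, then the config file named on the command line) is shown
below; the `NNCC_` variable prefix and all structure names are
hypothetical:

```cpp
#include <cstdlib>
#include <map>
#include <optional>
#include <string>

struct Config {
  std::map<std::string, std::string> cli;   // parsed command line options
  std::map<std::string, std::string> file;  // parsed --config file, if any

  // Resolve one option: CLI wins, then the environment, then the file.
  std::optional<std::string> get(const std::string& key) const {
    if (auto it = cli.find(key); it != cli.end()) return it->second;
    if (const char* env = std::getenv(("NNCC_" + key).c_str())) return env;
    if (auto it = file.find(key); it != file.end()) return it->second;
    return std::nullopt;  // caller falls back to a built-in default
  }
};
```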
### Import NN model

The major function of the *nncc* frontend is to import the specified NN
model. This means that the frontend should recognize the format of the
given NN model, parse all internal structures (load the computation
graph using the framework-specific IR: NN topology, NN ops, weights),
verify their correctness and convert them to the Model IR.

### Apply Optimizations

There are two levels of neural network optimizations in *nncc*.

The first level is High-Level Optimizations; they are applied to the
Model IR, which is output by the NN Import subsystem.

#### High-Level Optimizations

High-Level Optimizations can be divided into two groups:

  - optimizations aimed at reducing the size of the resulting model -
    *size optimizations*
  - optimizations aimed at reducing the inference time of the model -
    *performance optimizations*

These two groups are not mutually exclusive. Some optimization
techniques positively affect both size and performance, while others
might reduce the size of the model at some performance cost.

High-Level Optimizations in this sense are purely
neural-network-specific, as they attempt to improve the model by
manipulating the computation graph and the weights. For example, some
techniques search for unused parts of the computation graph and remove
them, or search for parts of the graph that can be merged together to
gain some performance. Other techniques manipulate the neural network
weights - either reduce their amount or modify their values in a way
that allows for reduced storage consumption.

Currently, High-Level Optimizations are out of the scope of the
project.

#### Low-Level Optimizations

The Low-Level Optimizations are applied by the compiler closer to the
end of the whole compilation process, before executable generation. The
input for this stage of *nncc* is the Coarse-Grained IR, which is
output by the High-Level Optimization subsystem.

### Generate the code

The present architecture allows for several backend solutions,
depending on the specified target. Those solutions can be divided into
3 types:

  - *Interpretation.* At every step, inference can be carried out by
    interpreting the IR produced after that step.
  - *Soft backend.* The resulting program can be generated as source
    code in a high-level programming language (e.g., C/C++) that does
    not depend on any libraries outside of itself, with the exception
    of system libraries (a sketch of such generated code follows this
    section).
  - *Hardware (binary) backend.* This type refers to generating binary
    code that can be executed on the target device. The NN compiler can
    generate code that is either executed solely on the CPU, or takes
    advantage of the GPU when possible, if the corresponding target was
    specified.

Third-party libraries can be incorporated either in the form of source
code or as a compiled binary artefact.
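For the soft backend, the generated artefact could look like the sketch
below: plain C++ depending only on the standard library. The entry
point name, signature and the single-ReLU model are invented for
illustration; real generated code would cover the whole graph:

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical generated entry point: runs inference for a toy model
// whose only operation is ReLU over a flat float buffer.
void model_inference(const float* input, float* output, std::size_t len) {
  for (std::size_t i = 0; i < len; ++i)
    output[i] = std::max(input[i], 0.0f);
}
```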
## Appendix 1. Traceability Matrix

The following table shows the mapping between the SW Requirements
Specification and this SW High-Level Design document.

| Requirement                                      | Description                                                                                                                              | Section                                  |
| ------------------------------------------------ | ---------------------------------------------------------------------------------------------------------------------------------------- | ---------------------------------------- |
| RF-1 (Frontend: Tensorflow Lite)                 | The compiler should support import of NN models in the Tensorflow Lite format (parsing & verification of data scheme v0-v3, 50 NN ops)    | [Import NN model](#import-nn-model)      |
| RF-2 (Frontend: Caffe)                           | The compiler should support import of NN models in the Caffe format (parsing & verification)                                              | [Import NN model](#import-nn-model)      |
| RF-3 (Frontend: Caffe2 (Optional))               | The compiler should support import of NN models in the Caffe2 format (parsing & verification)                                             | [Import NN model](#import-nn-model)      |
| RF-4 (Frontend: lossless import)                 | The frontend should use a lossless approach while converting any NN model to IR                                                           | [Import NN model](#import-nn-model)      |
| RF-5 (Frontend: Inception\_v3)                   | The frontend should successfully import the Inception V3 NN model                                                                         | [Import NN model](#import-nn-model)      |
| RF-6 (Frontend: MobileNet)                       | The frontend should successfully import the MobileNet NN model                                                                            | [Import NN model](#import-nn-model)      |
| RF-7 (Backend: ARM CPU)                          | The compiler should produce an executable for the ARM CPU                                                                                 | [Generate the code](#generate-the-code)  |
| RF-8 (Backend: ARM GPU)                          | The compiler should produce a binary that takes advantage of the GPU when this was specified before compilation                           | [Generate the code](#generate-the-code)  |
| RF-9 (Backend: Artefact type)                    | The compiler should produce an executable as a shared library or as a static library                                                      | [Generate the code](#generate-the-code)  |
| RF-10 (Backend: Inception\_v3)                   | The compiler should produce a valid compiled artefact for the Inception v3 NN model                                                       | [Generate the code](#generate-the-code)  |
| RF-11 (Backend: MobileNet)                       | The compiler should produce a valid compiled artefact for the MobileNet NN model                                                          | [Generate the code](#generate-the-code)  |
| RF-12 (Config: command line)                     | The compiler should get configuration parameters from the command line                                                                    | [Initialization](#initialization)        |
| RF-13 (Config: config file (Optional))           | The compiler should get configuration parameters from a config file                                                                       | [Initialization](#initialization)        |
| RF-14 (Config: environment variable (Optional))  | The compiler should get configuration parameters from environment variables                                                               | [Initialization](#initialization)        |
| RF-15 (Artefact: result)                         | The artefact should provide results comparable to the original NN model for the same input data                                           | [Generate the code](#generate-the-code)  |
| RF-16 (Artefact: input verifications)            | The artefact should verify any input data and check consistency                                                                           | [Generate the code](#generate-the-code)  |
| RF-17 (Artefact: GPU)                            | The artefact should take advantage of the GPU for GPU-enabled operations                                                                  | [Generate the code](#generate-the-code)  |
| RF-18 (Artefact: CPU)                            | The artefact should take advantage of the CPU if this was specified                                                                       | [Generate the code](#generate-the-code)  |

**Design Module of S/W Architecture**

| Requirement                                      | Import NN model | Generate the code | Initialization |
| ------------------------------------------------ | --------------- | ----------------- | -------------- |
| RF-1 (Frontend: Tensorflow Lite)                 | O               |                   |                |
| RF-2 (Frontend: Caffe)                           | O               |                   |                |
| RF-3 (Frontend: Caffe2 (Optional))               | O               |                   |                |
| RF-4 (Frontend: lossless import)                 | O               |                   |                |
| RF-5 (Frontend: Inception\_v3)                   | O               |                   |                |
| RF-6 (Frontend: MobileNet)                       | O               |                   |                |
| RF-7 (Backend: ARM CPU)                          |                 | O                 |                |
| RF-8 (Backend: ARM GPU)                          |                 | O                 |                |
| RF-9 (Backend: Artefact type)                    |                 | O                 |                |
| RF-10 (Backend: Inception\_v3)                   |                 | O                 |                |
| RF-11 (Backend: MobileNet)                       |                 | O                 |                |
| RF-12 (Config: command line)                     |                 |                   | O              |
| RF-13 (Config: config file (Optional))           |                 |                   | O              |
| RF-14 (Config: environment variable (Optional))  |                 |                   | O              |
| RF-15 (Artefact: result)                         |                 | O                 |                |
| RF-16 (Artefact: input verifications)            |                 | O                 |                |
| RF-17 (Artefact: GPU)                            |                 | O                 |                |
| RF-18 (Artefact: CPU)                            |                 | O                 |                |