diff --git a/docs/nncc/project/high_level_design.md b/docs/nncc/project/high_level_design.md
deleted file mode 100644
index a15aaca4a..000000000
--- a/docs/nncc/project/high_level_design.md
+++ /dev/null
@@ -1,457 +0,0 @@
-# SW High Level Design
-
-**Revision history**
-
-| Ver. | Date | Contents | Author | Approver |
-| ---- | ---------- | ----------------- | ----------------- | ------------ |
-| 0.1 | 2018.05.25 | Initial version | Vostokov Sergey | Sung-Jae Lee |
-| 0.2 | 2018.06.21 | SE member review | Alexey Kondrashov | |
-| 1.0 | 2018.06.22 | Final DR1 version | Vostokov Sergey | Sung-Jae Lee |
-
-**Terminology and Abbreviation**
-
-| Terminology | Description |
-| ------------ | ------------------------------------------------------------- |
-| OS | Operating System |
-| OS API | Application interface of OS |
-| HW | Hardware |
-| SW | Software |
-| NN | Neural Network |
-| NN model | Neural network model (Instance of NN built with ML framework) |
-| NN compiler | The compiler for neural network |
-| ML framework | The machine learning framework |
-| TF/TF Lite | Tensorflow/Tensorflow Lite ML framework |
-| IR | Intermediate representation |
-| CI/CI system | Continuous integration system |
-| UI | The user interface |
-| GUI | The graphical user interface |
-| CLI | The command-line interface |
-
-**References**
-
-\[1\] Vostokov Sergey, [SW Requirements Specification](requirements_specification.md)
-
-## Overview
-
-### Scope
-
-The main goal of the project is to develop a compiler for neural
-networks that produces an executable artefact for a specified SW and HW
-platform.
-
-The development scope includes the following components:
-
- - Develop an importer module to parse, verify, and represent an NN
-   model for further optimization and compilation
- - Develop code emitters to produce an executable binary for CPU and GPU
-
-
-**2018 year goals:**
-
- - Support TensorFlow Lite NN model format
- - Support Caffe NN model format
- - Support Caffe2 NN model format (Optional)
- - Support compilation of MobileNet NN
- - Support compilation of Inception v3 NN
- - Support ARM CPU
- - Support ARM GPU (Mali)
- - Support Tizen OS
- - Support SmartMachine OS (Optional)
-
-| Product | Target Model Name | Comment |
-| ------------------- | ------------------------------ | ---------------- |
-| Tizen phone | Tizen TM2 | Reference device |
-| Tizen device | Odroid XU4 | Reference board |
-| SmartMachine target | Microvision mv8890, exynos8890 | Reference device |
-
-Table 1-1. Target Model
-
-### Design Consideration
-
-Deep learning software demands reliability and performance. The common
-historical approach is to develop a SW framework (machine learning
-framework) that computes each step of the neural network inference
-process on the supported hardware. This approach is used in many
-popular solutions like Google Tensorflow/Tensorflow Lite, Caffe/2, etc.
-Traditionally, neural network developers build a computation graph and
-then an appropriate machine learning framework interprets it. The
-latest discoveries in the AI field show that this node-visitor method
-of execution is inefficient. As a result, the industry has worked out a
-second approach: a neural network compiler that executes code more
-efficiently.
-
-This document presents the design of the *nncc*, a neural network
-compiler collection. The design should provide the easiest way to extend
-the functionality of the *nncc* by adding new modules with the following
-features:
-
- - Support neural networks produced by various machine learning
- frameworks;
- - Produce an artefact that takes advantage of various hardware,
-   including specialized processors like NPU;
- - Apply new domain specific optimization techniques over given NN.
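-
-A possible shape of such an extension point is sketched below as a
-frontend (importer) interface in C++. All names here (`Frontend`,
-`FrontendRegistry`, `Graph`) are hypothetical and do not come from the
-actual *nncc* code base.
-
-```cpp
-// A minimal sketch of an extensible frontend interface (hypothetical names).
-#include <memory>
-#include <string>
-#include <utility>
-#include <vector>
-
-struct Graph
-{
-  // internal computation-graph IR (details omitted)
-};
-
-// Every model-format importer implements this interface.
-class Frontend
-{
-public:
-  virtual ~Frontend() = default;
-  // Returns true if the file looks like a model in this frontend's format.
-  virtual bool recognizes(const std::string &model_path) const = 0;
-  // Parses and verifies the model, producing the internal graph IR.
-  virtual std::unique_ptr<Graph> import(const std::string &model_path) const = 0;
-};
-
-// A registry allows new frontends to be added without touching the driver.
-class FrontendRegistry
-{
-public:
-  void add(std::unique_ptr<Frontend> frontend) { _frontends.push_back(std::move(frontend)); }
-
-  const Frontend *find(const std::string &model_path) const
-  {
-    for (const auto &frontend : _frontends)
-      if (frontend->recognizes(model_path))
-        return frontend.get();
-    return nullptr; // no registered frontend recognizes this model
-  }
-
-private:
-  std::vector<std::unique_ptr<Frontend>> _frontends;
-};
-```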
-
-### Constraints
-
-See constraints in SW Requirements Specification.
-
-<table>
-<colgroup>
-<col style="width: 24%" />
-<col style="width: 64%" />
-<col style="width: 10%" />
-</colgroup>
-<thead>
-<tr class="header">
-<th>Item</th>
-<th>Assumptions, Dependencies and the Constraints</th>
-<th>Reference</th>
-</tr>
-</thead>
-<tbody>
-<tr class="odd">
-<td>Tizen SW Platform</td>
-<td><dl>
-<dt>The following items should be provided:</dt>
-<dd><ul>
-<li>Tizen API</li>
-<li>Tizen kernel</li>
-<li>Tizen FW</li>
-<li>Tizen SDK</li>
-<li>Tizen naming convention</li>
-</ul>
-</dd>
-</dl></td>
-<td>- <a href="www.tizen.org" class="uri">www.tizen.org</a> <br>- <a href="wiki.tizen.org" class="uri">wiki.tizen.org</a> <br>- <a href="developer.tizen.org" class="uri">developer.tizen.org</a></td>
-</tr>
-<tr class="even">
-<td>SmartMachine OS Platform</td>
-<td><dl>
-<dt>The following items should be provided:</dt>
-<dd><ul>
-<li>SmartMachine API</li>
-<li>SmartMachine kernel</li>
-<li>SmartMachine FW</li>
-<li>SmartMachine SDK</li>
-<li>SmartMachine naming convention</li>
-</ul>
-</dd>
-</dl></td>
-<td>- <a href="http://suprem.sec.samsung.net/confluence/pages/viewpage.action?pageId=81833987">Platform confluence</a> <br>- <a href="https://github.sec.samsung.net/RS7-SmartMachine">Github</a> <br>- <a href="http://suprem.sec.samsung.net/confluence/display/ASEC/Adaptive+AUTOSAR">Functional Safety confluence</a></td>
-</tr>
-<tr class="odd">
-<td>Host OS</td>
-<td>Linux-based OS (Ubuntu, Archlinux, etc)</td>
-<td>- <a href="https://www.ubuntu.com/">Ubuntu site</a> <br>- <a href="https://www.archlinux.org/">Archlinux site</a></td>
-</tr>
-<tr class="even">
-<td>Tizen target HW</td>
-<td>The reference device should be provided: Tizen TM2</td>
-<td></td>
-</tr>
-<tr class="odd">
-<td>SmartMachine target HW</td>
-<td>The reference device should be provided</td>
-<td></td>
-</tr>
-</tbody>
-</table>
-Table 1-2. Assumptions, Dependencies and the Constraints
-
-## SW System Architecture Design
-
-### Overall Architecture
-
-The picture below presents the result of a high-level analysis of the
-requirements that **nncc** should satisfy. It describes the main
-function **Compilation** of the compiler collection using the IDEF0
-(functional modeling) notation. Full information on the IDEF family of
-modeling languages is available on [Wikipedia:
-IDEF](https://en.wikipedia.org/wiki/IDEF).
-
-![image](../images/nncc_idef0_a0.png)
-
-Figure 1. Top-Level Context Diagram of compilation function.
-
-
-A short explanation of **Figure 1**:
-
-**1. Input entities:**
-
- - *NN Model instance:* It is the main input of *nncc*. The compiler
-    takes from the user the information describing the neural network
-    that should be compiled. In most cases, this NN is produced by a
-    machine learning framework and stored in one or more files. The
-    contents of these files constitute the essence of the neural
-    network. Here it is denoted as an instance of an NN model.
- - *Command line options:* In order to provide the most convenient
-    way to use the compiler, it should be configurable. The current
-    design presents a tool with a Command Line Interface (CLI). Command
-    line options are a symbolic representation of directions
-    instructing the compiler how to set up a working session to get
-    the desired result.
-
-**2. Output:**
-
- - *Target binaries:* Everything that is produced by the compilation
-    operation. In the general case the result may consist of one or
-    more files. Each of them may be one of the following: an
-    executable, a source code file, or a log/verification/error report.
-    For example, when we require the compiler to compile a neural
-    network for execution on a GPU, the output artefact may be
-    OpenCL/C/C++ source code, or a binary containing invocations of the
-    procedures delegating the calculations to the GPU.
-
-**3. Rules and notations:**
-
- - *NN Model specification:* Each machine learning framework has its
-    own architecture design and uses its own format to
-    serialize/deserialize computation graphs which represent neural
-    networks. On a storage device, a model may be saved as one file or
-    several files using a unique markup of binary data. To enable
-    *nncc* to read and process such data, it must first recognize the
-    format of the container. The importer/parser subsystem of *nncc*
-    holds the full knowledge of the NN specifications and is
-    responsible for reading and parsing NN models (see [Import NN
-    model](#import-nn-model)).
- - *High-Level and Low-Level Optimization techniques:* Before
-    deployment, a neural network developer might want to verify their
-    product and optimize it by size and performance. There are many
-    techniques for reducing the overall size of neural network weights
-    and improving the performance of the inference. NN optimization
-    activity can be automated by implementing each technique in the
-    middleend according to its specification (see [Apply
-    Optimizations](#apply-optimizations)).
- - *Target Runtime Environment (TRE):* In the case when the compiler
- produces the binary for execution on a specific SW platform, it
- should take into account the common API of this SW Platform. It
- includes the full public API of a chosen OS available to the 3rd
- party developers.
- - *Target Instruction Set Architecture (Target ISA):* The resulting
-    artefact is always executed on a SW Platform using some specified
-    API. The user may want to generate an artefact that uses OpenBlas,
-    the Arm Compute Library, or something else (if supported by the
-    compiler) to perform calculations. In order to provide such a
-    possibility, *nncc* should be aware of the APIs of the specified
-    3rd party libraries.
- - *Device specifications:* Some of the optimization techniques may
- take into account the technological features of the computing
- device, like the time to perform some specific calculations. Such
- information is very helpful during optimization of the final code
- of the compiled artefact because it may be used to select an
- optimal sequence of command invocations in order to achieve the
- best performance.
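-
-To illustrate how these rules and notations could be fed to the
-compiler, the sketch below aggregates them into a single target
-description. The types, fields, and values are invented for
-illustration and are not part of the actual *nncc* design.
-
-```cpp
-// Hypothetical target description aggregating the "rules and notations"
-// above; illustrative only, not the actual nncc data model.
-#include <cstddef>
-#include <string>
-
-enum class TargetISA { ArmCpu, ArmGpuMali };
-enum class ComputeLibrary { None, OpenBLAS, ArmComputeLibrary };
-
-// Device specification: technological features a low-level optimizer
-// may take into account when scheduling calculations.
-struct DeviceSpec
-{
-  std::string name;         // e.g. "Tizen TM2"
-  std::size_t memory_bytes; // memory available to the compiled artefact
-  double gflops_estimate;   // rough compute throughput of the device
-};
-
-struct TargetDescription
-{
-  std::string runtime_environment; // target SW platform API, e.g. "Tizen"
-  TargetISA isa;                   // instruction set the artefact runs on
-  ComputeLibrary library;          // 3rd-party library used for calculations
-  DeviceSpec device;               // device features used by optimizations
-};
-```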
-
-**4. Mechanism:**
-
- - *Optimizing NN Compiler:* The implemented compiler itself. Since
- *nncc* is dedicated to producing the code for the most efficient
- execution, we may regard the tool as optimizing.
- - *Host OS:* Since the compiler is a tool that works in some SW
- Environment, the main Top-Level SW system is an Operating System.
- In the SW Requirements specification it may be defined as a
- Linux-like OS, for example Ubuntu, Archlinux, etc.
-
-### Composition of Architecture
-
-The compiler consists of three main parts: frontend, middleend, backend.
-Together they form a Neural Network instance processing pipeline.
-Moreover, there is one additional part that is in charge of the compiler
-configuration.
-
-![image](../images/nncc_components.png)
-
-Figure 2. Top-Level Components of the
-*nncc*.
-
-| Layer or Subsystem Name | Description |
-| ----------------------- | ---------------------------------------------------------------------------------------------------------------------------------------- |
-| Frontend | Imports a specified Neural Network, presents it as a computation graph |
-| Middleend | Provides various optimizations over the computation graph; at the end transforms it to internal IR |
-| Backend | Produces the specified artefact as a result of compilation procedure using specified parameters describing the target OS, target HW, etc |
-| Configuration system | Accepts command line options and configures *nncc* according to their contents |
-
-
-The detailed decomposition of the main function **Compilation** is
-presented in diagram A1 below.
-
-### Interface
-
-Similar to any console application, the *nncc* CLI accepts two types of
-options:
-
- - Options that have values, for example, a name of the output executable
- - Options that don't have values (switches) that turn various features on and off
-
-Additionally, options can be general and subsystem-specific.
-
-General options direct the process of the neural network compilation as
-a whole, and also control the utility functions like the verbosity of
-the messages that *nncc* outputs during the compilation process.
-
-Subsystem-specific options control each respective subsystem:
-
- - The frontend subsystem takes options that point to the NN model to
-    compile and specify its format, the version of the format, and so
-    on.
- - The middleend subsystem takes options that either turn on specific
-    optimizations for the NN model, or just point at the desired
-    outcome, for example "target performance efficiency" or "target
-    memory efficiency".
- - The backend subsystem takes options that describe the desired
-    target device or architecture, and so on.
-
-For better usability, high-level options are also supported. A single
-high-level option is mapped to a group of lower-level options,
-similarly to how it is done by conventional compiler drivers such as
-gcc. This way, by choosing a single middleend option such as "target
-performance", *nncc* will automatically select a number of performance
-optimizations by itself.
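-
-The sketch below illustrates one way such an expansion could work; the
-option names are invented for illustration and are not actual *nncc*
-options.
-
-```cpp
-// A sketch of expanding a high-level option into lower-level middleend
-// options (option names are hypothetical).
-#include <map>
-#include <string>
-#include <vector>
-
-using OptionSet = std::vector<std::string>;
-
-OptionSet expandHighLevelOption(const std::string &option)
-{
-  static const std::map<std::string, OptionSet> groups = {
-      {"--target-performance",
-       {"--fuse-operations", "--precompute-constants", "--select-fast-kernels"}},
-      {"--target-memory", {"--quantize-weights", "--share-buffers"}},
-  };
-
-  auto it = groups.find(option);
-  if (it != groups.end())
-    return it->second; // known high-level option: return its expansion
-  return {option};     // otherwise pass the option through unchanged
-}
-```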
-
-## SW System Operation Design
-
-Figure 3 presents a more detailed composition of the main function
-**Compilation**. As shown in the previous section [Composition of
-Architecture](#composition-of-architecture), it is composed of 5
-subfunctions:
-
- - Setup and configure each module - *Block 1* (See
- [Initialization](#initialization) section)
- - Import the specified neural network - *Block 2* (See [Import NN
- model](#import-nn-model) section)
- - Apply High-Level optimizations - *Block 3* (See [Apply
- Optimizations](#apply-optimizations) section)
- - Apply Low-Level optimizations - *Block 4* (See [Apply
- Optimizations](#apply-optimizations) section)
- - Generate the output code for specified target - *Block 5* (See
- [Generate the code](#generate-the-code) section)
-
-![image](../images/nncc_idef0_a1.png)
-
-Figure 3. Decomposition of the Top-Level function **Compilation**.
-
-### Initialization
-
-At this stage the initialization of all submodules of the *nncc*
-happens. This procedure starts with command line option processing and
-ends with the selection of all required and correctly configured
-modules. At the parsing stage the configuration system checks its own
-consistency. If the set of command line options is not enough to
-establish a valid configuration, environment variables will be used.
-Also, almost all configuration options can be read from a config file
-if one is specified on the command line.
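-
-A minimal sketch of this look-up order is shown below, assuming that
-command line options take precedence over environment variables, which
-in turn take precedence over the config file; the `NNCC_` variable
-prefix and the helper names are hypothetical.
-
-```cpp
-// A sketch of layered configuration resolution (hypothetical names).
-#include <cstdlib>
-#include <map>
-#include <optional>
-#include <string>
-
-using KeyValueMap = std::map<std::string, std::string>;
-
-std::optional<std::string> lookup(const KeyValueMap &values, const std::string &key)
-{
-  auto it = values.find(key);
-  if (it == values.end())
-    return std::nullopt;
-  return it->second;
-}
-
-// Resolves one configuration parameter according to the priority order.
-std::optional<std::string> resolve(const std::string &key, const KeyValueMap &cli_options,
-                                   const KeyValueMap &config_file)
-{
-  if (auto value = lookup(cli_options, key))
-    return value; // 1. an explicit command line option wins
-  if (const char *env = std::getenv(("NNCC_" + key).c_str()))
-    return std::string(env); // 2. fall back to an environment variable
-  return lookup(config_file, key); // 3. finally, the config file, if one was given
-}
-```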
-
-### Import NN model
-
-The major function of the *nncc* frontend is to import the specified NN
-model. This means that the frontend should recognize the format of the
-given NN model, parse all internal structures (load the computation
-graph using the framework-specific IR: NN topology, NN ops, weights),
-verify their correctness, and convert them to the Model IR.
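-
-The sketch below outlines these import steps with hypothetical types
-and helper declarations (format detection, parsing, verification,
-conversion); it is not the actual frontend API.
-
-```cpp
-// A sketch of the import flow: detect the container format, load the
-// framework-specific graph, verify it, and convert it to the Model IR.
-// Helper functions are declarations only in this sketch.
-#include <memory>
-#include <stdexcept>
-#include <string>
-
-struct FrameworkGraph { /* framework-specific IR: topology, ops, weights */ };
-struct ModelIR { /* nncc internal representation (details omitted) */ };
-
-enum class ModelFormat { TFLite, Caffe, Caffe2, Unknown };
-
-ModelFormat detectFormat(const std::string &path);                   // inspects file headers
-std::unique_ptr<FrameworkGraph> parse(ModelFormat, const std::string &path);
-void verify(const FrameworkGraph &);                                 // throws on inconsistencies
-std::unique_ptr<ModelIR> convert(const FrameworkGraph &);            // lossless conversion
-
-std::unique_ptr<ModelIR> importModel(const std::string &path)
-{
-  const ModelFormat format = detectFormat(path);
-  if (format == ModelFormat::Unknown)
-    throw std::runtime_error("unsupported model format: " + path);
-
-  auto graph = parse(format, path); // load topology, operations, and weights
-  verify(*graph);                   // check correctness before conversion
-  return convert(*graph);           // produce the Model IR for the middleend
-}
-```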
-
-### Apply Optimizations
-
-There are two levels of neural network optimizations in *nncc*.
-
-The first one is High-Level Optimizations, which are applied to the
-Model IR output by the NN Import subsystem.
-
-#### High-Level Optimizations
-
-High-Level optimizations can be divided into two groups:
-
- - optimizations aimed at reducing the size of the resulting model -
- *size optimizations*
- - optimizations aimed at reducing the inference time of the model -
- *performance optimizations*
-
-These two groups are not mutually exclusive. Some optimization
-techniques positively affect both size and performance, while some of
-them might reduce the size of the model at some performance cost.
-
-High-Level Optimizations in this sense are purely
-neural-network-specific, as they attempt to improve the model by
-manipulating the computation graph and the weights. For example, some
-techniques search for unused parts of the computation graph and remove
-them, or they search for the parts of the graph that can be merged
-together and thus gain some performance. Other techniques manipulate
-the neural network weights - either reducing their amount or modifying
-their values in a way that allows for reduced storage consumption.
-
-Currently, High-Level Optimizations are out of scope of the project.
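-
-Purely as an illustration of the graph-manipulation techniques
-mentioned above (and not as part of the current project scope), the
-sketch below finds the nodes that contribute to the graph outputs, so
-that the remaining unused nodes could be removed; the graph
-representation is invented for this example.
-
-```cpp
-// A sketch of dead-node elimination on a minimal, invented graph IR.
-#include <set>
-#include <vector>
-
-struct Node
-{
-  std::vector<int> inputs; // indices of the nodes this node reads from
-  bool is_output = false;  // true for graph outputs
-};
-
-// Returns the set of nodes reachable backwards from the outputs;
-// nodes outside this set do not affect the results and can be removed.
-std::set<int> liveNodes(const std::vector<Node> &graph)
-{
-  std::set<int> live;
-  std::vector<int> worklist;
-  for (int i = 0; i < static_cast<int>(graph.size()); ++i)
-    if (graph[i].is_output)
-      worklist.push_back(i);
-
-  while (!worklist.empty())
-  {
-    const int node = worklist.back();
-    worklist.pop_back();
-    if (!live.insert(node).second)
-      continue; // already visited
-    for (int pred : graph[node].inputs)
-      worklist.push_back(pred);
-  }
-  return live;
-}
-```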
-
-#### Low-Level Optimization
-
-The Low-Level Optimizations are applied by the compiler closer to the
-end of the whole compilation process, before the executable generation.
-The input for this stage of *nncc* is the Coarse-Grained IR, which is
-output by the High-Level Optimization subsystem.
-
-### Generate the code
-
-The present architecture allows for several backend solutions,
-depending on the specified target. Those solutions can be divided into
-3 types:
-
- - *Interpretation.* At every step inference can be carried out by
-    interpreting the IR produced after that step.
- - *Soft backend.* The resulting program can be generated as source
-    code in a high-level programming language (e.g., C/C++) that does
-    not depend on libraries outside of itself, with the exception of
-    system libraries.
- - *Hardware (Binary) backend.* This type refers to generating binary
-    code that can be executed on the target device. The NN compiler can
-    generate code that is either executed solely on the CPU, or takes
-    advantage of the GPU when possible, if the corresponding target was
-    specified.
-
-Third-party libraries can be incorporated either in the form of source
-code or as a compiled binary artefact.
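-
-As an illustration of the soft backend idea, the sketch below emits
-plain C++ source for a single fully-connected layer; the operation type
-and the emitter name are invented for illustration.
-
-```cpp
-// A sketch of a soft-backend emitter: generate dependency-free C++
-// source implementing y = W * x for one fully-connected layer.
-#include <sstream>
-#include <string>
-
-struct FullyConnectedOp
-{
-  int rows; // number of output elements
-  int cols; // number of input elements
-};
-
-std::string emitFullyConnected(const FullyConnectedOp &op, const std::string &name)
-{
-  std::ostringstream os;
-  os << "void " << name << "(const float *w, const float *x, float *y)\n";
-  os << "{\n";
-  os << "  for (int r = 0; r < " << op.rows << "; ++r)\n";
-  os << "  {\n";
-  os << "    y[r] = 0.0f;\n";
-  os << "    for (int c = 0; c < " << op.cols << "; ++c)\n";
-  os << "      y[r] += w[r * " << op.cols << " + c] * x[c];\n";
-  os << "  }\n";
-  os << "}\n";
-  return os.str();
-}
-```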
-
-## Appendix 1. Traceability Matrix
-
-The following table shows the mapping between the SW Requirements
-Specification and this SW High-Level Design Document.
-
-| Requirement | Description | Section |
-| ----------------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------- | --------------------------------------- |
-| RF-1 (Frontend: Tensorflow Lite) | The compiler should support import of NN model in Tensorflow Lite format (parsing & verification of data scheme v0-v3, 50 NN ops) | [Import NN model](#import-nn-model) |
-| RF-2 (Frontend: Caffe) | The compiler should support import of NN model in Caffe format (parsing & verification) | [Import NN model](#import-nn-model) |
-| RF-3 (Frontend: Caffe2 (Optional)) | The compiler should support import of NN model in Caffe2 format (parsing & verification) | [Import NN model](#import-nn-model) |
-| RF-4 (Frontend: lossless import) | The frontend should use the lossless approach while it is converting any NN model to IR | [Import NN model](#import-nn-model) |
-| RF-5 (Frontend: Inception\_v3)                  | The frontend should successfully import the Inception V3 NN model                                                                  | [Import NN model](#import-nn-model)     |
-| RF-6 (Frontend: MobileNet)                      | The frontend should successfully import the MobileNet NN model                                                                     | [Import NN model](#import-nn-model)     |
-| RF-7 (Backend: ARM CPU) | The compiler should produce executable for ARM CPU | [Generate the code](#generate-the-code) |
-| RF-8 (Backend: ARM GPU)                         | The compiler should produce the binary that takes advantage of the GPU when it was specified before compilation                    | [Generate the code](#generate-the-code) |
-| RF-9 (Backend: Artefact type) | The compiler should produce executable as a shared library or as a static library | [Generate the code](#generate-the-code) |
-| RF-10 (Backend: Inception\_v3) | The compiler should produce the valid compiled artefact for Inception v3 NN model | [Generate the code](#generate-the-code) |
-| RF-11 (Backend: MobileNet) | The compiler should produce the valid compiled artefact for MobileNet NN model | [Generate the code](#generate-the-code) |
-| RF-12 (Config: command line) | The compiler should get configuration parameters from command line | [Initialization](#initialization) |
-| RF-13 (Config: config file (Optional)) | The compiler should get configuration parameters from config file | [Initialization](#initialization) |
-| RF-14 (Config: environment variable (Optional)) | The compiler should get configuration parameters from environment variables | [Initialization](#initialization) |
-| RF-15 (Artefact: result) | The artefact should provide comparable result to the original NN model for the same input data | [Generate the code](#generate-the-code) |
-| RF-16 (Artefact: input verifications) | The artefact should verify any input data and check consistency | [Generate the code](#generate-the-code) |
-| RF-17 (Artefact: GPU) | The artefact should take advantage of the GPU for GPU-enabled operations | [Generate the code](#generate-the-code) |
-| RF-18 (Artefact: CPU) | The artefact should take advantage of CPU if it was specified | [Generate the code](#generate-the-code) |
-
-**Design Module of S/W Architecture**
-
-| Requirement | Import NN model | Generate the code | Initialization |
-| ----------------------------------------------- | --------------- | ----------------- | -------------- |
-| RF-1 (Frontend: Tensorflow Lite) | O | | |
-| RF-2 (Frontend: Caffe) | O | | |
-| RF-3 (Frontend: Caffe2 (Optional)) | O | | |
-| RF-4 (Frontend: lossless import) | O | | |
-| RF-5 (Frontend: Inception\_v3) | O | | |
-| RF-6 (Frontend: MobileNet) | O | | |
-| RF-7 (Backend: ARM CPU) | | O | |
-| RF-8 (Backend: ARM GPU) | | O | |
-| RF-9 (Backend: Artefact type) | | O | |
-| RF-10 (Backend: Inception\_v3) | | O | |
-| RF-11 (Backend: MobileNet) | | O | |
-| RF-12 (Config: command line) | | | O |
-| RF-13 (Config: config file (Optional)) | | | O |
-| RF-14 (Config: environment variable (Optional)) | | | O |
-| RF-15 (Artefact: result) | | O | |
-| RF-16 (Artefact: input verifications) | | O | |
-| RF-17 (Artefact: GPU) | | O | |
-| RF-18 (Artefact: CPU) | | O | |