# Software Requirement Specification

## Background
Artificial intelligence (AI) techniques are becoming popular and are utilized in various products
and services.  While cloud-based AI techniques have been used to perform compute/memory intensive
inference on powerful cloud servers, on-device AI technologies are drawing attention from the
mobile industry because they reduce response time, protect privacy, and provide AI services without
a network connection.  Big mobile players, such as Google, Apple, and Huawei, are investing
research effort in on-device AI technologies and have already announced hardware and software
on-device AI solutions.  Samsung is not leading this trend at the moment, but since the on-device
AI area has only just started and is still in its initial state, there are opportunities to close
the gap with the pioneering companies.  We believe on-device AI will become a key differentiator
for mobile phones, TVs, and other home appliances, and thus developing an on-device AI software
stack is of paramount importance in order to take leadership in on-device AI technology.

Although the vision of on-device AI is promising, enabling it involves unique technical challenges
compared to the traditional cloud-based approach, because on-device AI performs inference solely on
the device without connecting to cloud resources.  Specifically, hardware resources on the device,
such as processor performance, memory capacity, and power budget, are very scarce and limit the
compute capability that complicated neural network (NN) models typically require.  For example, in
one product requirement, a mobile device should consume less than 1.2 W and may use at most 2 W,
and then only for 10 minutes, due to thermal issues.  In addition, an on-device AI software stack
needs to support diverse device environments, since embedded platforms may consist of heterogeneous
compute devices, such as CPU, GPU, DSP, or a neural processing unit (NPU), and use different OS
platforms, such as Tizen, Android, or various embedded Linux distributions.

To tackle the challenges above and to take leadership in on-device AI technology, this project aims
to develop a neural network inference framework specialized and optimized for on-device AI.


## Product Context

This project, _nnfw_, aims to provide a high-performance, on-device neural network (NN) inference
framework that performs inference of a given NN model on processors, such as CPU, GPU, or NPU, of
target platforms such as Tizen and Smart Machine Platform (SMP).

### Expected Value

We expect the following to be possible with _nnfw_:

- Improving user experience by reducing the service response time.
- Providing AI services without a network connection while achieving performance comparable to
  cloud-based services.
- Protecting personal information and company confidential data by limiting data transfer over the
  network.


### Success Criteria

The goals of this project are:

- Support mixed acceleration using CPU and GPU
  + for operator coverage flexibility.
  + for flexible utilization of computing resources on the device.
- Define the Common IR and Runtime APIs and perform successful inference using them
  + so that _nncc_ can be integrated as a frontend and _nnfw_ itself can concentrate on the backend.
- Support user-implemented kernel extensions for custom operators.
- Construct the SW infrastructure needed to support the SR NPU.


### Target

_nnfw_ targets the following platforms and devices:

- Odroid-XU4 running Tizen 5.5 (Primary)
- A variety of Android based mobile phones (Secondary)


### Product Roadmap

- March: Set up milestones, tasks, workgroups, initial code structure, and build/test
  infrastructure.
- May: Tizen M1 release / Execute InceptionV3 with static scheduling by mixing CPU and GPU on
  Odroid-XU4.
- August: Perform inference using NN Package and Runtime API.
- October: Tizen M2 release / Complete the neural network acceleration SW stack integrated with the
  NN Compiler.
- December: Release NPU SDK v1.0 pre-alpha.


## Requirements

### Functional Requirements

_nnfw_ has the following functional requirements:

1. CPU/GPU mixed acceleration
    - Description
      + Run the model using a mixture of CPU and GPU (a conceptual sketch of per-operator backend
        assignment follows this list).
    - Validation
      + Run the InceptionV3 model while selecting CPU or GPU for individual operators.
      + Confirm execution results against ground truth (for example, the result of running the
        whole model on the CPU or GPU alone).
1. Support its own input format
    - Description
      + Define and support its own input format to ensure independence from external file formats.
      + Define and implement the Common IR (see the IR sketch after this list).
    - Validation
      + Read and execute an input model described in the Common IR.
      + Confirm execution results against ground truth (for example, the result of using NNAPI).
1. Support its own API
    - Description
      + Define and implement its own API to replace the current NNAPI.
    - Validation
      + Perform unit tests and integration tests for the individual APIs.
1. Custom operator support
    - Description
      + Define a specification that describes a custom operation, and provide a mechanism for
        writing and installing the kernel implementation needed to run it at runtime (see the
        kernel-registration sketch after this list).
    - Validation
      + Load and execute an input model that contains the custom op together with the kernel
        implementation of the custom op.
      + Confirm execution results against ground truth.
1. Prepare SW infrastructure for NPU support
    - Description
      + The runtime must be able to read and process model information developed for execution on
        the NPU.
    - Validation
      + Read a model developed for the NPU and run it on the NPU.
      + Confirm execution results against ground truth.
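
To make requirement 1 concrete, here is a minimal, self-contained sketch of per-operator backend
assignment in the spirit of the mixed CPU/GPU acceleration described above.  Everything in it
(`Backend`, `Operator`, `run_model`, the toy kernels) is a hypothetical illustration and not the
actual _nnfw_ API; a real run would dispatch to CPU and GPU kernels instead of the stub lambdas
used here.

```cpp
// Conceptual sketch only: a toy dispatcher that assigns each operator to a CPU or GPU
// "backend" and compares the mixed result against a single-backend reference run.
#include <cmath>
#include <cstdio>
#include <functional>
#include <map>
#include <vector>

enum class Backend { CPU, GPU };

struct Operator {
  const char* name;
  std::function<float(float)> cpu_kernel;  // stand-in for a real CPU kernel
  std::function<float(float)> gpu_kernel;  // stand-in for a real GPU kernel
};

// Run the whole model, choosing a backend per operator from `assignment`.
float run_model(const std::vector<Operator>& ops,
                const std::map<size_t, Backend>& assignment, float input) {
  float value = input;
  for (size_t i = 0; i < ops.size(); ++i) {
    Backend b = assignment.count(i) ? assignment.at(i) : Backend::CPU;
    value = (b == Backend::GPU) ? ops[i].gpu_kernel(value) : ops[i].cpu_kernel(value);
  }
  return value;
}

int main() {
  // Two toy "operators" standing in for layers of a model such as InceptionV3.
  std::vector<Operator> ops = {
      {"conv", [](float x) { return 2.0f * x; }, [](float x) { return 2.0f * x; }},
      {"relu", [](float x) { return x > 0 ? x : 0; }, [](float x) { return x > 0 ? x : 0; }},
  };

  // Ground truth: CPU-only run.  Mixed run: operator 0 on GPU, operator 1 on CPU.
  float reference = run_model(ops, {{0, Backend::CPU}, {1, Backend::CPU}}, 1.5f);
  float mixed = run_model(ops, {{0, Backend::GPU}, {1, Backend::CPU}}, 1.5f);

  // Validation step from the requirement: results must match within a tolerance.
  std::printf("match: %s\n", std::fabs(mixed - reference) < 1e-5f ? "yes" : "no");
  return 0;
}
```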

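Requirement 2 asks for a framework-owned input format backed by a Common IR.  The structs below are
a rough sketch of the kind of in-memory graph such an IR could describe; the names and fields are
assumptions made only for illustration and do not reflect the actual _nnfw_ IR definition.

```cpp
// Conceptual sketch only: a hypothetical graph structure a "Common IR" could describe.
#include <cstdint>
#include <string>
#include <vector>

struct Tensor {
  std::string name;
  std::vector<int32_t> shape;   // e.g. {1, 299, 299, 3} for an InceptionV3 input
};

struct Operation {
  std::string type;             // e.g. "Conv2D", "Relu", or a custom op name
  std::vector<int> inputs;      // indices into Graph::tensors
  std::vector<int> outputs;     // indices into Graph::tensors
};

struct Graph {
  std::vector<Tensor> tensors;
  std::vector<Operation> operations;
  std::vector<int> model_inputs;   // tensor indices fed by the caller
  std::vector<int> model_outputs;  // tensor indices returned to the caller
};

int main() {
  // A one-operator graph: input -> Relu -> output.
  Graph g;
  g.tensors = {{"input", {1, 8}}, {"output", {1, 8}}};
  g.operations = {{"Relu", {0}, {1}}};
  g.model_inputs = {0};
  g.model_outputs = {1};
  return 0;
}
```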

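Requirement 4 asks for a way to install user-written kernels for custom operators.  The sketch
below shows one possible registration mechanism; `KernelFn`, `register_custom_kernel`, and the
`MySquare` op are hypothetical names used only to illustrate the idea, not the actual extension
API.

```cpp
// Conceptual sketch only: a minimal registry that lets users install kernels for custom ops.
#include <cstdio>
#include <functional>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

using KernelFn = std::function<std::vector<float>(const std::vector<float>&)>;

// Table of user-installed kernels, keyed by the custom op's name.
std::map<std::string, KernelFn>& kernel_registry() {
  static std::map<std::string, KernelFn> registry;
  return registry;
}

void register_custom_kernel(const std::string& op_name, KernelFn fn) {
  kernel_registry()[op_name] = std::move(fn);
}

// The runtime looks up the installed kernel when it meets a custom op in the model.
std::vector<float> run_custom_op(const std::string& op_name, const std::vector<float>& input) {
  auto it = kernel_registry().find(op_name);
  if (it == kernel_registry().end())
    throw std::runtime_error("no kernel installed for custom op: " + op_name);
  return it->second(input);
}

int main() {
  // The user installs an implementation for a (hypothetical) "MySquare" op.
  register_custom_kernel("MySquare", [](const std::vector<float>& in) {
    std::vector<float> out(in.size());
    for (size_t i = 0; i < in.size(); ++i) out[i] = in[i] * in[i];
    return out;
  });

  // Executing a model that contains "MySquare" now succeeds.
  std::vector<float> result = run_custom_op("MySquare", {1.0f, 2.0f, 3.0f});
  std::printf("%.1f %.1f %.1f\n", result[0], result[1], result[2]);
  return 0;
}
```

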
### Non-Functional Requirements

1. Optimizing mixed acceleration performance
    - Description
      + Ensure that mixed acceleration using CPU and GPU performs at least as well as the average
        of the individual CPU-only and GPU-only accelerations.
    - Validation
      + Measure the inference time of mixed acceleration for the target model.
      + Compare the result to the average of the CPU-only and GPU-only acceleration times for the
        same model (a worked example follows this list).
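
As a worked example of the validation step: if the CPU-only run of the target model takes 120 ms
and the GPU-only run takes 80 ms, the baseline is their average, 100 ms, so the mixed run must
finish in 100 ms or less.  The snippet below performs that check with made-up timings; in practice
the three numbers would be measured on the target device (e.g. InceptionV3 on Odroid-XU4).

```cpp
// Conceptual sketch only: acceptance check for the mixed-acceleration requirement,
// using hypothetical timings in place of real measurements.
#include <cstdio>

int main() {
  double cpu_only_ms = 120.0;  // hypothetical CPU-only inference time
  double gpu_only_ms = 80.0;   // hypothetical GPU-only inference time
  double mixed_ms = 90.0;      // hypothetical CPU+GPU mixed inference time

  double baseline_ms = (cpu_only_ms + gpu_only_ms) / 2.0;  // average of the two single-backend runs
  std::printf("baseline %.1f ms, mixed %.1f ms -> %s\n", baseline_ms, mixed_ms,
              mixed_ms <= baseline_ms ? "requirement met" : "requirement NOT met");
  return 0;
}
```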