This document describes the roadmap of the 2019 NN Runtime (_nnfw_) project.

# Goal

The _nnfw_ project aims to provide a high-performance, on-device neural network (NN) inference
framework that runs a given NN model on processors, such as the CPU, GPU, or NPU, of a target
platform, such as Tizen or Android.

In 2018, we already achieved significant gains by accelerating inference with a single CPU or GPU
back-end. This year, we want to gain further benefits by mixing CPU and GPU back-ends according to
the characteristics of each operation. This gives us a high degree of freedom in operator coverage,
and can deliver better performance than single back-end acceleration.

In addition, we are going to introduce a new compiler on the front-end. The compiler will support a
variety of deep learning frameworks in the relatively resource-rich host PC environment, so that the
runtime running on the target device carries a smaller burden. The compiler and the runtime will
share information through a Common IR, referred to as the NN Package.
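
For illustration, the sketch below shows one possible on-disk layout of an NN Package. The package
format is still being specified, so the directory structure and file names here are assumptions
rather than the final specification.

```
nnpackage/                # one NN Package = one directory
├── mymodel.model         # model serialized in the Common IR
└── metadata/
    └── MANIFEST          # versions and the list of model files
```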

# Architecture

![nnfw_architecture](./fig/nnfw_architecture.png)

The figure above illustrates the overall architecture and scope of _nnfw_, together with its
sibling project _nncc_, for context. In this document, we deal specifically with _nnfw_.

_nnfw_ can be divided into three parts: NN API, NN Runtime, and NN Compute, the last of which is
provided by the platform.

1. NN API
    - Provides a common interface to applications.
    - Last year, the Android NN API was selected for seamless integration with TF Lite: as long as
      our NN runtime exposes the Android NN API, TF Lite can link to it without any modification
      (see the first sketch after this list).
    - We chose the Android NN API expecting standardization and rapid adoption, but the results fell
      far short of that. We could not control its specification, and it grew too slowly to
      accommodate our needs. Therefore, this year we will define our own NN Runtime API. (Once the
      new API is stable, we will provide a migration path from the Android NN API, which will then
      be naturally deprecated.)
1. NN Runtime
    - It already provides significant performance improvements through CPU or GPU acceleration. Now
      we want to add flexibility by providing functions suited to each specific device
      configuration.
    - Mixed back-end acceleration enables various usage scenarios depending on device-specific CPU
      or GPU configurations and usage conditions (see the second sketch after this list).
    - By introducing an interpreter, the runtime will handle dynamic conditions that the compiler
      cannot resolve ahead of time, and it will use memory efficiently through the memory manager.
1. NN Compute
    - Provides compute acceleration libraries, such as ACL, or device drivers for the NPU.
    - This layer is provided by the OS platform, and we will use the libraries or device drivers
      as-is. We may request a specific version from the Platform team, but we do not expect to
      modify them.
    - This year, we will also introduce an extension mechanism on this layer to support custom
      operations.
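
To make the NN API integration concrete, the first sketch below shows how a TF Lite application can
route inference through the Android NN API, using the TF Lite 1.x C++ interface. Since our NN
runtime exposes the Android NN API, the same application code exercises our runtime without
modification. The model path is a placeholder.

```cpp
#include <memory>

#include "tensorflow/lite/interpreter.h"
#include "tensorflow/lite/kernels/register.h"
#include "tensorflow/lite/model.h"

int main() {
  // Load a TF Lite model from disk (the path is a placeholder).
  auto model = tflite::FlatBufferModel::BuildFromFile("model.tflite");
  if (!model) return 1;

  // Build an interpreter with the built-in operator resolver.
  tflite::ops::builtin::BuiltinOpResolver resolver;
  std::unique_ptr<tflite::Interpreter> interpreter;
  tflite::InterpreterBuilder(*model, resolver)(&interpreter);
  if (!interpreter) return 1;

  // Route execution through the Android NN API; any runtime that
  // implements this API, including nnfw, serves the request without
  // changes to the application.
  interpreter->UseNNAPI(true);

  if (interpreter->AllocateTensors() != kTfLiteOk) return 1;
  // ... fill input tensors here ...
  if (interpreter->Invoke() != kTfLiteOk) return 1;
  return 0;
}
```

The second sketch illustrates the kind of mixed back-end control the new NN Runtime API could
offer. Since that API is only being defined this year, every identifier below (`nnfw_session`,
`nnfw_set_available_backends`, and so on) is a hypothetical shape for discussion, not a finalized
interface.

```cpp
// Hypothetical sketch only: the NN Runtime API is still being specified,
// so all names and signatures below are illustrative assumptions.
#include "nnfw.h"

int run_package(const char* package_path) {
  nnfw_session* session = nullptr;
  nnfw_create_session(&session);

  // Load a model delivered as an NN Package (Common IR).
  nnfw_load_model_from_file(session, package_path);

  // Declare which back-ends may be mixed; the execution planner then
  // assigns each operation to the most suitable one.
  nnfw_set_available_backends(session, "cpu;acl_cl");

  nnfw_prepare(session);  // plan execution across the back-ends
  nnfw_run(session);      // run inference
  nnfw_close_session(session);
  return 0;
}
```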

# Deliverables

- On-Device AI SW stack for Tizen
  + Advanced runtime with interpreter, memory manager, and execution planner.
  + Back-end flexibility, such as mixed CPU/GPU acceleration.
  + Well-designed custom op support.
  + Basic infrastructure for NPU support.
- Specification and implementation of Common IR and Runtime API

# Milestones

- [Project Milestones](https://github.sec.samsung.net/orgs/STAR/projects/1)
- [Monthly Milestones](https://github.sec.samsung.net/STAR/nnfw/projects/25)

# Workgroups (WGs)

- We organize WGs around major topics. Each WG works on its own topic by breaking it into small
  tasks/issues, carrying them out within the WG and collaborating with other WGs.
- The WG information can be found [here](workgroups.md).