		DMA Buffer Synchronization Framework
		~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

			    Inki Dae
		<inki dot dae at samsung dot com>
		     <daeinki at gmail dot com>

This document is a guide for device-driver writers describing the DMA buffer
synchronization API. It also describes how to use the API to synchronize
buffer access between DMA and DMA, between CPU and DMA, and between CPU and
CPU.

The DMA buffer synchronization API provides a buffer synchronization
mechanism based on the DMA buffer sharing mechanism[1] and the dma-fence and
reservation frameworks[2]; i.e., buffer access control for CPU and DMA, and
easy-to-use interfaces for device drivers and user applications. This API can
be used by all DMA devices that use system memory as a DMA buffer, especially
on most ARM based SoCs.


Motivation
----------

There are cases in which a userspace process needs this buffer
synchronization framework. The primary one is to enhance GPU rendering
performance when a 3D application draws something into a buffer using the
CPU while another process composes that buffer with its own back buffer
using the GPU.

A 3D application calls glFlush, rather than glFinish, to submit 3D commands
to the GPU driver, because glFinish blocks the caller's task until the
execution of the 3D commands is completed, which leaves both the GPU and
the CPU idle for longer. As a result, 3D rendering performance with
glFinish is considerably lower than with glFlush.

However, the use of glFlush has one issue: the buffer shared with the GPU
could be corrupted when the CPU accesses the buffer just after glFlush,
because the CPU cannot be aware of the completion of the GPU's access to
the buffer.
Of course, an application can be aware of that time using eglWaitGL, but
this function is valid only when all applications use the same 3D context.
For GPU performance, however, applications may use different 3D contexts.

The following summarizes how an application's window is displayed on screen
with an X server:
1. X client requests a window buffer from the X server.
2. X client draws something into the window buffer using the CPU.
3. X client requests a SWAP from the X server.
4. X server notifies the Composite Manager of a damage event.
5. Composite Manager gets the window buffer (front buffer) through
   DRI2GetBuffers.
6. Composite Manager composes the window buffer and its own back buffer
   using the GPU. At this time, eglSwapBuffers is called: internally, 3D
   commands are flushed to the GPU driver.
7. Composite Manager requests a SWAP from the X server.
8. X server performs a DRM page flip. At this time, the window buffer is
   displayed on screen.

HTML5-based web applications have the same issue. The web browser and the
web application are different processes. The web application may draw
something into its own buffer using the CPU, and then the web browser may
compose that buffer with its own back buffer.

Thus, in such cases, a shared buffer could be corrupted while one process
draws into it using the CPU if another process composes it with its own
buffer using the GPU without any synchronization mechanism. That is why we
need a userspace synchronization interface: the fcntl system call.

The last case is the deferred page flip issue: a rendered window buffer may
not be displayed on screen for about 32ms in the worst case, assuming that
the GPU rendering itself completes within 16ms. This can occur when a
pixmap buffer is composited with a window buffer using the GPU just as
vsync starts. At this time, the X server waits for a vblank event to get a
window buffer, so 3D rendering is delayed by up to about 16ms.
As a result, the window buffer is displayed after two vsync periods (about
32ms), which in turn causes slow responsiveness.

The following shows the deferred page flip issue in the worst case:

        |------------ <- vsync signal
        |<------ DRI2GetBuffers
        |
        |
        |
        |------------ <- vsync signal
        |<------ Request gpu rendering
 time   |
        |
        |<------ Request page flip (deferred)
        |------------ <- vsync signal
        |<------ Displayed on screen
        |
        |
        |
        |------------ <- vsync signal

We can enhance responsiveness by letting the X server skip the wait for
vsync when a synchronization mechanism is available: the X server returns a
new buffer to the X client without waiting for vsync, so the X client can
use more CPU time than when waiting for vsync, and the buffer is
synchronized implicitly in the kernel driver, which is transparent to
userspace.


Access types
------------

DMA_BUF_ACCESS_R     - CPU will access a buffer for read.
DMA_BUF_ACCESS_W     - CPU will access a buffer for read or write.
DMA_BUF_ACCESS_DMA_R - DMA will access a buffer for read.
DMA_BUF_ACCESS_DMA_W - DMA will access a buffer for read or write.


Generic userspace interfaces
----------------------------

This framework uses the fcntl system call[3] as the interface exported to
userspace. A process sees a shared buffer object as a file descriptor, so
an fcntl() lock request on that file descriptor means that other processes
cannot access the buffer managed by the dma-buf object, according to the
fcntl request command, until an fcntl unlock request is made.


API set
-------

bool is_dmabuf_sync_supported(void);
   - Check whether dmabuf sync is supported or not.

struct dmabuf_sync *dmabuf_sync_init(const char *name,
					struct dmabuf_sync_priv_ops *ops,
					void *priv);
   - Allocate and initialize a new dmabuf_sync object.

     This function should be called by a DMA driver after its device
     context is created. The created dmabuf_sync object should be stored
     in the driver's context.
     Each DMA driver and task should have one dmabuf_sync object.


void dmabuf_sync_fini(struct dmabuf_sync *sync)
   - Release a given dmabuf_sync object and everything relevant to it.

     This function should be called to release the relevant resources if
     some operation failed after dmabuf_sync_init was called, and after
     dmabuf_sync_signal or dmabuf_sync_signal_all has been called.


int dmabuf_sync_get(struct dmabuf_sync *sync, void *sync_buf,
			unsigned int ctx, unsigned int type)
   - Add a given dmabuf to a dmabuf_sync object.

     This function should be called after dmabuf_sync_init has been
     called. The caller can tie multiple dmabufs into its own dmabuf_sync
     object by calling this function several times.


void dmabuf_sync_put(struct dmabuf_sync *sync, struct dma_buf *dmabuf)
   - Delete the dmabuf_sync_object for a given dmabuf.

     This function should be called to release the dmabuf if some
     operation failed after dmabuf_sync_get was called, or after the DMA
     driver or task has completed its use of the dmabuf.


void dmabuf_sync_put_all(struct dmabuf_sync *sync)
   - Release all dmabuf_sync_object objects of a given dmabuf_sync object.

     This function should be called to release all dmabuf_sync_object
     objects if some operation failed after dmabuf_sync_get was called,
     or after the DMA driver or task has completed its use of all the
     dmabufs.

long dmabuf_sync_wait_all(struct dmabuf_sync *sync)
   - Wait for the completion of DMA or CPU access to all dmabufs.

     The caller should call this function prior to CPU or DMA access to
     the dmabufs so that no other CPU or DMA device can access them.

int dmabuf_sync_wait(struct dma_buf *dmabuf, unsigned int ctx,
			unsigned int access_type)
   - Wait for the completion of DMA or CPU access to a dmabuf.

     The caller should call this function prior to CPU or DMA access to a
     dmabuf so that no other CPU or DMA device can access it.
int dmabuf_sync_signal_all(struct dmabuf_sync *sync)
   - Wake up all threads blocked while trying to access the dmabufs
     registered to a given dmabuf_sync object.

     The caller should call this function after its CPU or DMA access to
     the dmabufs is completed so that other CPU and DMA devices can
     access them.

void dmabuf_sync_signal(struct dma_buf *dmabuf)
   - Wake up all threads blocked while trying to access a given dmabuf.

     The caller should call this function after its CPU or DMA access to
     the dmabuf is completed so that other CPU and DMA devices can access
     it.


Tutorial for device driver
--------------------------

1. Allocate and initialize a dmabuf_sync object:
	struct dmabuf_sync *sync;

	sync = dmabuf_sync_init("test sync", &xxx_sync_ops, context);
	...

2. Add a dmabuf to the dmabuf_sync object when setting up the DMA-buffer
   related registers:
	dmabuf_sync_get(sync, dmabuf, context, DMA_BUF_ACCESS_DMA_R);
	...

3. Add a fence of this driver to all dmabufs added to the dmabuf_sync
   object before DMA or CPU accesses the dmabufs:
	dmabuf_sync_wait_all(sync);
	...

4. Now the DMA of this device can access all the dmabufs.

5. Signal all dmabufs added to the dmabuf_sync object after DMA or CPU
   access to these dmabufs is completed:
	dmabuf_sync_signal_all(sync);

   Then call the following functions to release all resources:
	dmabuf_sync_put_all(sync);
	dmabuf_sync_fini(sync);


Tutorial for user application
-----------------------------
	struct flock filelock;

1. Lock a dmabuf:
	filelock.l_type = F_WRLCK;	/* or F_RDLCK for read access */

	/* Lock the entire region of the dma-buf. */
	filelock.l_whence = SEEK_CUR;
	filelock.l_start = 0;
	filelock.l_len = 0;

	fcntl(dmabuf_fd, F_SETLKW, &filelock);	/* or F_SETLK */
	...
	/* Now the CPU can access the dmabuf. */

2. Unlock a dmabuf:
	filelock.l_type = F_UNLCK;

	fcntl(dmabuf_fd, F_SETLKW, &filelock);	/* or F_SETLK */

   A close(dmabuf_fd) call would also unlock the dma-buf.
For more detail, please refer to [3].


References:
[1] http://lwn.net/Articles/470339/
[2] https://lkml.org/lkml/2014/2/24/824
[3] http://linux.die.net/man/2/fcntl