D-Bus is a system for low-latency, low-overhead, easy-to-use interprocess
communication (IPC).

The focus of this document is an overview of the low-level, native kernel D-Bus
transport called kdbus. Kdbus in the kernel acts similarly to a device driver;
all communication between processes takes place over special character device
nodes in /dev/kdbus/.

For the general D-Bus protocol specification, the payload format, the
marshaling, and the communication semantics, please refer to:
  http://dbus.freedesktop.org/doc/dbus-specification.html

For a kdbus-specific userspace library implementation, please refer to:
  http://cgit.freedesktop.org/systemd/systemd/tree/src/systemd/sd-bus.h
  http://cgit.freedesktop.org/systemd/systemd/tree/src/systemd/sd-memfd.h

Article about D-Bus and kdbus:
  http://lwn.net/Articles/580194/

===============================================================================
Terminology
===============================================================================
  Domain:
    A domain is a named object containing a number of buses. A system
    container which contains its own init system and users usually also
    runs in its own kdbus domain. The /dev/kdbus/domain/<container-name>/
    directory shows up inside the domain as /dev/kdbus/. Every domain
    offers a "control" device node to create new buses or domains.
    Domains have no connection to each other and cannot see or talk to
    each other. Only a process in the initial domain that has the needed
    access rights can see the device nodes inside other domains.

  Bus:
    A bus is a named object inside a domain. Clients exchange messages
    over a bus. Buses have no connection to each other; messages are only
    exchanged on the same bus. The default entry point to a bus, where
    clients establish their connection, is the "bus" device node
    /dev/kdbus/<bus name>/bus.
    Common operating system setups create one "system bus" per system, and one
    "user bus" for every logged-in user. Applications or services can create
    their own private named buses if they want to.

  Endpoint:
    An endpoint provides the device node to talk to a bus. Every bus has
    a default endpoint called "bus". A bus can offer additional endpoints
    with custom names to provide restricted access to the same bus. Custom
    endpoints can carry additional policy, which can be used to give
    sandboxed processes only locked-down, limited, filtered access to a bus.

  Connection:
    A connection to a bus is created by opening an endpoint device node of
    a bus and becoming an active client with the HELLO exchange. Every
    connection has a unique identifier on the bus and can address messages
    to every other connection on the same bus by using the peer's
    connection id as the destination.

  Well-known Name:
    A connection can, in addition to its implicit unique connection id, request
    the ownership of a textual well-known name. Well-known names are written
    in reverse-domain notation, like com.example.service. A connection offering
    a service on a bus is usually reached by its well-known name. The
    relationship between a connection id and a well-known name is analogous to
    that between an IP address and a DNS name associated with that address.

===============================================================================
Device Node Layout
===============================================================================
  /sys/bus/kdbus
  `-- devices
    |-- kdbus!0-system!bus -> ../../../devices/virtual/kdbus/kdbus!0-system!bus
    |-- kdbus!2702-user!bus -> ../../../devices/virtual/kdbus/kdbus!2702-user!bus
    |-- kdbus!2702-user!ep.app -> ../../../devices/virtual/kdbus/kdbus!2702-user!ep.app
    `-- kdbus!control -> ../../../devices/kdbus!control

  /dev/kdbus
  |-- control
  |-- 0-system
  |   |-- bus
  |   `-- ep.apache
  |-- 1000-user
  |   `-- bus
  |-- 2702-user
  |   |-- bus
  |   `-- ep.app
  `-- ns
      |-- fedoracontainer
      |   |-- control
      |   |-- 0-system
      |   |   `-- bus
      |   `-- 1000-user
      |       `-- bus
      `-- mydebiancontainer
          |-- control
          `-- 0-system
              `-- bus

Note:
  The device node subdirectory layout is arranged so that a future version of
  kdbus could be implemented as a filesystem with a separate instance mounted
  for each domain. Any future change needs to keep this in mind. For the same
  reason, the dependency on udev's userspace hookups or sysfs attributes
  should be limited to the absolute minimum.
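
The tree above suggests that an endpoint node is addressed as
/dev/kdbus/<bus name>/<endpoint name> inside the current domain. As a minimal
sketch of that naming scheme only, the helper below builds such a path. The
function name kdbus_node_path() is hypothetical and not part of kdbus; the bus
name is passed in verbatim (for example "0-system").

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper, not part of kdbus: build the device node path
 * "/dev/kdbus/<bus name>/<endpoint name>" for the current domain.
 * Returns 0 on success, -1 if a name contains '/' or the result does
 * not fit into buf.
 */
int kdbus_node_path(char *buf, size_t size,
                    const char *bus, const char *endpoint)
{
    int n;

    assert(buf != NULL && bus != NULL && endpoint != NULL);

    /* A path separator inside a name would escape the expected layout. */
    if (strchr(bus, '/') != NULL || strchr(endpoint, '/') != NULL)
        return -1;

    n = snprintf(buf, size, "/dev/kdbus/%s/%s", bus, endpoint);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

Called with ("2702-user", "ep.app"), the helper yields the custom endpoint
node shown in the tree, /dev/kdbus/2702-user/ep.app.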

===============================================================================
Data Structures
===============================================================================
  +-------------------------------------------------------------------------+
  | Domain (Init Domain)                                                    |
  | /dev/kdbus/control                                                      |
  | +---------------------------------------------------------------------+ |
  | | Bus (System Bus)                                                    | |
  | | ./0-system/control                                                  | |
  | | +-------------------------------+ +-------------------------------+ | |
  | | | Endpoint                      | | Endpoint                      | | |
  | | | ./bus                         | | ./ep.sandbox                  | | |
  | | | +------------+ +------------+ | | +------------+ +------------+ | | |
  | | | | Connection | | Connection | | | | Connection | | Connection | | | |
  | | | | :1.22      | | :1.25      | | | | :1.55      | | :1.81      | | | |
  | | | +------------+ +------------+ | | +------------+ +------------+ | | |
  | | +-------------------------------+ +-------------------------------+ | |
  | +---------------------------------------------------------------------+ |
  |                                                                         |
  | +---------------------------------------------------------------------+ |
  | | Bus (User Bus for UID 2702)                                         | |
  | | /dev/kdbus/2702-user/                                               | |
  | | +-------------------------------+ +-------------------------------+ | |
  | | | Endpoint                      | | Endpoint                      | | |
  | | | /dev/kdbus/2702-user/bus      | | /dev/kdbus/2702-user/ep.app   | | |
  | | | +------------+ +------------+ | | +------------+ +------------+ | | |
  | | | | Connection | | Connection | | | | Connection | | Connection | | | |
  | | | | :1.22      | | :1.25      | | | | :1.55      | | :1.81      | | | |
  | | | +------------+ +------------+ | | +------------+ +------------+ | | |
  | | +-------------------------------+ +-------------------------------+ | |
  | +---------------------------------------------------------------------+ |
  +-------------------------------------------------------------------------+
  | Domain (Container; inside it, fedoracontainer/ becomes /dev/kdbus/)     |
  | /dev/kdbus/domain/fedoracontainer/control                               |
  | +---------------------------------------------------------------------+ |
  | | Bus                                                                 | |
  | | ./0-system/                                                         | |
  | | +---------------------------------+                                 | |
  | | | Endpoint                        |                                 | |
  | | | ./bus                           |                                 | |
  | | | +-------------+ +-------------+ |                                 | |
  | | | | Connection  | | Connection  | |                                 | |
  | | | | :1.22       | | :1.25       | |                                 | |
  | | | +-------------+ +-------------+ |                                 | |
  | | +---------------------------------+                                 | |
  | +---------------------------------------------------------------------+ |
  |                                                                         |
  | +---------------------------------------------------------------------+ |
  | | Bus                                                                 | |
  | | /dev/kdbus/2702-user/                                               | |
  | | +---------------------------------+                                 | |
  | | | Endpoint                        |                                 | |
  | | | /dev/kdbus/2702-user/bus        |                                 | |
  | | | +-------------+ +-------------+ |                                 | |
  | | | | Connection  | | Connection  | |                                 | |
  | | | | :1.22       | | :1.25       | |                                 | |
  | | | +-------------+ +-------------+ |                                 | |
  | | +---------------------------------+                                 | |
  | +---------------------------------------------------------------------+ |
  +-------------------------------------------------------------------------+

===============================================================================
Creation of new Domains and Buses
===============================================================================
The initial kdbus domain is unconditionally created by the kernel module. A
domain contains a "control" device node which allows creating a new bus or
domain. New domains do not have any buses created by default.

Opening the control device node returns a file descriptor which accepts the
ioctls KDBUS_CMD_BUS_MAKE/KDBUS_CMD_NS_MAKE; these specify the name of the new
bus or domain to create. The control file descriptor needs to be kept open
for the entire lifetime of the created bus or domain; closing it will
immediately clean up the entire bus or domain and all its associated
resources and connections. Every control file descriptor can only be used once
to create a new bus or domain; from that point, it is not used for any
further communication until the final close().

===============================================================================
Connection IDs and Well-Known Connection Names
===============================================================================
Connections are identified by their connection id, internally implemented as a
uint64_t counter. The IDs of every newly created bus start at 1, and every new
connection will increment the counter by 1. The ids are not reused.

In higher level tools, the user visible representation of a connection is
defined by the D-Bus protocol specification as ":1.<id>".

Messages with a specific uint64_t destination id are directly delivered to
the connection with the corresponding id. Messages with the special destination
id 0xffffffffffffffff are broadcast messages and are potentially delivered
to all known connections on the bus; however, clients interested in broadcast
messages need to subscribe to the specific messages they are interested in
before any broadcast message reaches them.

Messages synthesized and sent directly by the kernel carry the special
source id 0.

In addition to the unique uint64_t connection id, established connections can
request the ownership of well-known names, under which they can be found and
addressed by other bus clients. A well-known name is associated with one and
only one connection at a time.

Messages can specify the special destination id 0 and carry a well-known name
in the message data. Such a message is delivered to the destination connection
which owns that well-known name.

  +-------------------------------------------------------------------------+
  | +---------------+     +---------------------------+                     |
  | | Connection    |     | Message                   | -----------------+  |
  | | :1.22         | --> | src: 22                   |                  |  |
  | |               |     | dst: 25                   |                  |  |
  | |               |     |                           |                  |  |
  | |               |     |                           |                  |  |
  | |               |     +---------------------------+                  |  |
  | |               |                                                    |  |
  | |               | <--------------------------------------+           |  |
  | +---------------+                                        |           |  |
  |                                                          |           |  |
  | +---------------+     +---------------------------+      |           |  |
  | | Connection    |     | Message                   | -----+           |  |
  | | :1.25         | --> | src: 25                   |                  |  |
  | |               |     | dst: 0xffffffffffffffff   | -------------+   |  |
  | |               |     |                           |              |   |  |
  | |               |     |                           | ---------+   |   |  |
  | |               |     +---------------------------+          |   |   |  |
  | |               |                                            |   |   |  |
  | |               | <--------------------------------------------------+  |
  | +---------------+                                            |   |      |
  |                                                              |   |      |
  | +---------------+     +---------------------------+          |   |      |
  | | Connection    |     | Message                   | --+      |   |      |
  | | :1.55         | --> | src: 55                   |   |      |   |      |
  | |               |     | dst: 0 / org.foo.bar      |   |      |   |      |
  | |               |     |                           |   |      |   |      |
  | |               |     |                           |   |      |   |      |
  | |               |     +---------------------------+   |      |   |      |
  | |               |                                     |      |   |      |
  | |               | <------------------------------------------+   |      |
  | +---------------+                                     |          |      |
  |                                                       |          |      |
  | +---------------+                                     |          |      |
  | | Connection    |                                     |          |      |
  | | :1.81         |                                     |          |      |
  | | org.foo.bar   |                                     |          |      |
  | |               |                                     |          |      |
  | |               |                                     |          |      |
  | |               | <-----------------------------------+          |      |
  | |               |                                                |      |
  | |               | <----------------------------------------------+      |
  | +---------------+                                                       |
  +-------------------------------------------------------------------------+
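
The addressing rules of this section can be summarized in a few lines of C.
This is only a sketch: the special values 0 and 0xffffffffffffffff and the
":1.<id>" notation come from the text above, while the macro, enum and
function names are made up for illustration and are not the kdbus API.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Special ids described in the text; the macro names are illustrative. */
#define DST_ID_NAME      UINT64_C(0)                  /* look up a well-known name */
#define DST_ID_BROADCAST UINT64_C(0xffffffffffffffff) /* broadcast message */
#define SRC_ID_KERNEL    UINT64_C(0)                  /* message sent by the kernel */

enum dst_kind { DST_UNICAST, DST_BROADCAST, DST_WELL_KNOWN_NAME };

/* Decide how a message with the given destination id is routed. */
enum dst_kind classify_destination(uint64_t dst_id)
{
    if (dst_id == DST_ID_BROADCAST)
        return DST_BROADCAST;
    if (dst_id == DST_ID_NAME)
        return DST_WELL_KNOWN_NAME;
    return DST_UNICAST;
}

/*
 * Format a connection id in the user-visible form ":1.<id>".
 * Returns 0 on success, -1 if buf is too small.
 */
int format_unique_name(char *buf, size_t size, uint64_t id)
{
    int n;

    assert(buf != NULL);
    n = snprintf(buf, size, ":1.%" PRIu64, id);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

With these definitions, the three messages in the diagram classify as
unicast (dst: 25), broadcast (dst: 0xffffffffffffffff) and well-known name
(dst: 0 / org.foo.bar).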

===============================================================================
Message Format, Content, Exchange
===============================================================================
Messages consist of a fixed-size header followed directly by a list of
variable-sized data records. The overall message size is specified in the
header of the message. The chain of data records can contain well-defined
message metadata fields, raw data, references to data, or file descriptors.

Messages are passed to the kernel with the ioctl KDBUS_CMD_MSG_SEND. Depending
on the destination address of the message, the kernel delivers the message
to the specific destination connection or to all connections on the same bus.
Messages are always queued in the destination connection.

Messages are received by the client with the ioctl KDBUS_CMD_MSG_RECV. The
endpoint device node of the bus supports poll() to wake up the receiving
process when new messages are queued up to be received.
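
A receiver would typically block in poll() before issuing the receive ioctl.
The sketch below shows only the generic poll() part, which works on any file
descriptor; the function name wait_for_message() is an assumption, and the
KDBUS_CMD_MSG_RECV call itself is left out because its argument layout is not
described in this document.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>

/*
 * Wait until fd has data queued or timeout_ms expires (-1 waits forever).
 * Returns 1 if the descriptor is readable, 0 on timeout, and -1 on error
 * or if only an error/hang-up condition was reported.
 */
int wait_for_message(int fd, int timeout_ms)
{
    struct pollfd p = { .fd = fd, .events = POLLIN };
    int r;

    assert(fd >= 0);

    /* Restart the call if it was interrupted by a signal. */
    do {
        r = poll(&p, 1, timeout_ms);
    } while (r < 0 && errno == EINTR);

    if (r < 0)
        return -1;
    if (r == 0)
        return 0;
    if (p.revents & POLLIN)
        return 1;       /* the caller would now issue KDBUS_CMD_MSG_RECV */

    errno = EIO;
    return -1;
}
```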

  +-------------------------------------------------------------------------+
  | Message                                                                 |
  | +---------------------------------------------------------------------+ |
  | | Header                                                              | |
  | | size: overall message size, including the data records              | |
  | | destination: connection id of the receiver                          | |
  | | source: connection id of the sender (set by kernel)                 | |
  | | payload_type: "DBusVer1" textual identifier stored as uint64_t      | |
  | +---------------------------------------------------------------------+ |
  | +---------------------------------------------------------------------+ |
  | | Data Record                                                         | |
  | | size: overall record size (without padding)                         | |
  | | type: type of data                                                  | |
  | | data: reference to data (address or file descriptor)                | |
  | +---------------------------------------------------------------------+ |
  | +---------------------------------------------------------------------+ |
  | | padding bytes to the next 8 byte alignment                          | |
  | +---------------------------------------------------------------------+ |
  | +---------------------------------------------------------------------+ |
  | | Data Record                                                         | |
  | | size: overall record size (without padding)                         | |
  | | ...                                                                 | |
  | +---------------------------------------------------------------------+ |
  | +---------------------------------------------------------------------+ |
  | | padding bytes to the next 8 byte alignment                          | |
  | +---------------------------------------------------------------------+ |
  | +---------------------------------------------------------------------+ |
  | | Data Record                                                         | |
  | | size: overall record size                                           | |
  | | ...                                                                 | |
  | +---------------------------------------------------------------------+ |
  | +---------------------------------------------------------------------+ |
  | | padding bytes to the next 8 byte alignment                          | |
  | +---------------------------------------------------------------------+ |
  +-------------------------------------------------------------------------+
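
The layout above implies a simple way to walk the chain of data records: read
a record's size, round it up to the next multiple of 8, and continue at that
offset. The following sketch illustrates this under stated assumptions: the
struct record_hdr layout (two uint64_t fields) and all names are hypothetical,
and the real kdbus structures may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Illustrative record header; the exact field layout is an assumption. */
struct record_hdr {
    uint64_t size;  /* header plus data, without trailing padding */
    uint64_t type;  /* type of data */
};

/* Round n up to the next multiple of 8. */
uint64_t align8(uint64_t n)
{
    return (n + 7) & ~UINT64_C(7);
}

/*
 * Count the data records stored in buf[0..len).
 * Returns the number of records, or -1 if a record is malformed.
 */
int count_records(const unsigned char *buf, size_t len)
{
    size_t off = 0;
    int count = 0;

    assert(buf != NULL || len == 0);

    while (off < len) {
        struct record_hdr hdr;

        if (len - off < sizeof(hdr))
            return -1;                      /* truncated header */
        memcpy(&hdr, buf + off, sizeof(hdr));
        if (hdr.size < sizeof(hdr) || hdr.size > len - off)
            return -1;                      /* size out of bounds */

        /* The next record starts at the next 8-byte boundary. */
        off += (size_t)align8(hdr.size);
        count++;
    }
    return count;
}
```

A record with size 19 therefore occupies 24 bytes in the message, and the
following record starts at offset 24.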

===============================================================================
Passing of Payload Data
===============================================================================
When connecting to the bus, receivers request a memory pool of a given size,
large enough to carry the entire backlog of data enqueued for the connection.
The pool is internally backed by a shared memory file which can be mmap()ed by
the receiver.

KDBUS_MSG_PAYLOAD_VEC:
Messages are directly copied by the sending process into the receiver's pool.
That way, two peers can exchange data with effectively a single copy from
one process to another; the kernel will not buffer the data anywhere else.

KDBUS_MSG_PAYLOAD_MEMFD:
Messages can reference kdbus_memfd special files which contain the data.
Kdbus_memfd files have special semantics which allow the content of the file
to be sealed; sealing prevents all write access to the file content.
Only sealed kdbus_memfd files are accepted as payload data, which enforces
reliable passing of data; the receiver can assume that neither the sender nor
anyone else can alter the content after the message is sent.

Apart from the sender filling in the content of the kdbus_memfd file, the
data is passed zero-copy from one process to another, read-only and shared
between the peers.

The sealing of a kdbus_memfd can be removed again by the sender or the
receiver, as soon as the kdbus_memfd is not shared anymore.

===============================================================================
Broadcast Messages
===============================================================================
A message addressed to the connection ID 0xffffffffffffffff is a broadcast
message, delivered to all connected peers which installed a rule to match
certain properties of the message. Without any rules installed in the
connection, no broadcast message will be delivered to the connection.

Matches are implemented as bloom filters. The sender adds certain properties of
the message as elements to a bloom filter bit field, and sends that along with
the broadcast message.

The connection adds the message properties it is interested in as elements to
a bloom mask bit field, and uploads the mask to the match rules of the
connection.

The kernel will match the broadcast message's bloom filter against the
connection's bloom mask and decide if the message should be delivered to
the connection.

The kernel has no notion of any specific properties of the message; all it
sees are the bit fields of the bloom filter and mask to match against. The
use of bloom filters allows simple and efficient matching without exposing
any message properties or internals to the kernel side. Clients need to deal
with the fact that they might receive broadcasts which they did not subscribe
to, since the bloom filter might allow false positives to pass.
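
The matching step can be pictured as a subset test on bit fields: a message
passes if every bit set in the connection's mask is also set in the message's
filter. The sketch below is one plausible reading of that description, not the
kernel's code. The bit field size (BLOOM_WORDS), the number of hash functions
and the FNV-1a hash are illustrative assumptions; in practice sender and
receiver must agree on these parameters in userspace.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define BLOOM_WORDS 8   /* illustrative size: 8 * 64 = 512 bits */

/* Illustrative FNV-1a string hash, varied by a seed per hash function. */
static uint64_t fnv1a(const char *s, uint64_t seed)
{
    uint64_t h = UINT64_C(14695981039346656037) ^ seed;

    while (*s != '\0') {
        h ^= (unsigned char)*s++;
        h *= UINT64_C(1099511628211);
    }
    return h;
}

/* Set the bits for one element; used to build both filters and masks. */
void bloom_add(uint64_t bits[BLOOM_WORDS], const char *element)
{
    assert(bits != NULL && element != NULL);

    for (uint64_t k = 0; k < 2; k++) {      /* two hash functions */
        uint64_t bit = fnv1a(element, k) % (BLOOM_WORDS * 64);

        bits[bit / 64] |= UINT64_C(1) << (bit % 64);
    }
}

/* True if every bit of the mask is also set in the filter. */
bool bloom_match(const uint64_t filter[BLOOM_WORDS],
                 const uint64_t mask[BLOOM_WORDS])
{
    for (size_t i = 0; i < BLOOM_WORDS; i++)
        if ((filter[i] & mask[i]) != mask[i])
            return false;
    return true;
}
```

A mask built from an element that the sender also added to the filter always
matches. A mask for an element the sender did not add usually fails, but it
can pass if its bits happen to collide with other elements; this is the
false-positive case clients have to tolerate.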

To allow the future extension of the set of elements in the bloom filter, the
filter specifies a "generation" number. A later generation must always contain
all elements of the set of the previous generation, but can add new elements
to the set. The match rule's mask can carry an array with all previous
generations of masks individually stored. When the filter and mask are matched
by the kernel, the mask with the closest matching "generation" is selected
as the index into the mask array.
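
One way to read "closest matching generation" is: use the mask stored for the
filter's generation if the receiver provided one, otherwise fall back to the
newest mask the receiver knows. The sketch below encodes that interpretation.
It assumes that generations are numbered from 0 and that the masks are stored
back to back in generation order; both assumptions, and the function name
select_mask(), are illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * masks holds n_generations masks back to back, n_words 64-bit words each;
 * entry g is the mask built for generation g (assumed layout).
 * Returns a pointer to the mask to compare against the filter.
 */
const uint64_t *select_mask(const uint64_t *masks, size_t n_generations,
                            size_t n_words, uint64_t filter_generation)
{
    size_t idx;

    assert(masks != NULL && n_generations > 0);

    /* Clamp to the newest generation the receiver has uploaded. */
    idx = filter_generation < n_generations ? (size_t)filter_generation
                                            : n_generations - 1;
    return masks + idx * n_words;
}
```

Because a later generation contains all elements of the earlier ones, a
receiver that only knows older generations can still match a newer filter on
the elements both sides understand.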