D-Bus is a system for powerful, easy to use interprocess communication (IPC). The focus of this document is an overview of the low-level, native kernel D-Bus transport called kdbus. Kdbus in the kernel acts similar to a device driver, all communication between processes take place over special character device nodes in /dev/kdbus/. For the general D-Bus protocol specification, the payload format, the marshaling, and the communication semantics, please refer to: http://dbus.freedesktop.org/doc/dbus-specification.html For a kdbus specific userspace library implementation please refer to: http://cgit.freedesktop.org/systemd/systemd/tree/src/systemd/sd-bus.h http://cgit.freedesktop.org/systemd/systemd/tree/src/systemd/sd-memfd.h Article about D-Bus and kdbus: http://lwn.net/Articles/580194/ =============================================================================== Terminology =============================================================================== Domain: A domain is a named object containing a number of buses. A system container that contains its own init system and users usually also runs in its own kdbus domain. The /dev/kdbus/domain// directory shows up inside the domain as /dev/kdbus/. Every domain offers its own "control" device node to create new buses or new sub-domains. Domains have no connection to each other and cannot see nor talk to each other. Bus: A bus is a named object inside a domain. Clients exchange messages over a bus. Multiple buses themselves have no connection to each other; messages can only be exchanged on the same bus. The default entry point to a bus, where clients establish the connection to, is the "bus" device node /dev/kdbus//bus. Common operating system setups create one "system bus" per system, and one "user bus" for every logged-in user. Applications or services may create their own private named buses. Endpoint: An endpoint provides the device node to talk to a bus. Opening an endpoint creates a new connection to the bus to which the endpoint belongs. Every bus has a default endpoint called "bus". A bus can optionally offer additional endpoints with custom names to provide a restricted access to the same bus. Custom endpoints carry additional policy which can be used to give sandboxed processes only a locked-down, limited, filtered access to the same bus. Connection: A connection to a bus is created by opening an endpoint device node of a bus and becoming an active client with the HELLO exchange. Every connected client connection has a unique identifier on the bus and can address messages to every other connection on the same bus by using the peer's connection id as the destination. Well-known Name: A connection can, in addition to its implicit unique connection id, request the ownership of a textual well-known name. Well-known names are noted in reverse-domain notation, such as com.example.service1. Connections offering a service on a bus are usually reached by its well-known name. The analogy of connection id and well-known name is an IP address and a DNS name associated with that address. Message: Connections can exchange messages with other connections by addressing the peers with their connection id or well-known name. A message consists of a message header with kernel-specific information on how to route the message, and the message payload, which is a logical byte stream of arbitrary size. Messages can carry additional file descriptors to be passed from one connection to another. Every connection can specify which set of metadata the kernel should attach to the message when it is delivered to the receiving connection. Metadata contains information like: system timestamps, uid, gid, tid, proc-starttime, well-known-names, process comm, process exe, process argv, cgroup, capabilities, seclabel, audit session and loginuid and the connection's human-readable name. Broadcast and Match: Broadcast messages are potentially sent to all connections of a bus. By default, the connections will not actually receive any of the sent broadcast messages; only after installing a match for specific message properties, a broadcast message passes this filter. Policy: Buses and custom endpoints can upload a set of policy rules, defining who can see, talk to, or register a well-know name on the bus. The policy is uploaded by the owner of the bus or the custom endpoint. Access rules to allow who can see a name on the bus are only checked on custom endpoints. Policies may be defined with names that end with '.*'. When matching a well-known name against such a wildcard entry, the last part of the name is ignored and checked against the wildcard name without the trailing '.*'. =============================================================================== Device Node Layout =============================================================================== /sys/bus/kdbus `-- devices |-- kdbus!0-system!bus -> ../../../devices/virtual/kdbus/kdbus!0-system!bus |-- kdbus!2702-user!bus -> ../../../devices/virtual/kdbus/kdbus!2702-user!bus |-- kdbus!2702-user!ep.app -> ../../../devices/virtual/kdbus/kdbus!2702-user!ep.app `-- kdbus!control -> ../../../devices/kdbus!control /dev/kdbus |-- control |-- 0-system | |-- bus | `-- ep.apache |-- 1000-user | `-- bus |-- 2702-user | |-- bus | `-- ep.app `-- domain |-- fedoracontainer | |-- control | |-- 0-system | | `-- bus | `-- 1000-user | `-- bus `-- mydebiancontainer |-- control `-- 0-system `-- bus Note: The device node subdirectory layout is arranged that a future version of kdbus could be implemented as a filesystem with a separate instance mounted for each domain. For any future changes, this always needs to be kept in mind. Also the dependency on udev's userspace hookups or sysfs attribute use should be limited to the absolute minimum for the same reason. =============================================================================== Data Structures =============================================================================== +-------------------------------------------------------------------------+ | Domain (Init Domain) | | /dev/kdbus/control | | +---------------------------------------------------------------------+ | | | Bus (System Bus) | | | | /dev/kdbus/0-system/ | | | | +-------------------------------+ +-------------------------------+ | | | | | Endpoint | | Endpoint | | | | | | /dev/kdbus/0-system/bus | | /dev/kdbus/0-system/ep.app | | | | | +-------------------------------+ +-------------------------------+ | | | | +--------------+ +--------------+ +--------------+ +--------------+ | | | | | Connection | | Connection | | Connection | | Connection | | | | | | :1.22 | | :1.25 | | :1.55 | | :1:81 | | | | | +--------------+ +--------------+ +--------------+ +--------------+ | | | +---------------------------------------------------------------------+ | | | | +---------------------------------------------------------------------+ | | | Bus (User Bus for UID 2702) | | | | /dev/kdbus/2702-user/ | | | | +-------------------------------+ +-------------------------------+ | | | | | Endpoint | | Endpoint | | | | | | /dev/kdbus/2702-user/bus | | /dev/kdbus/2702-user/ep.app | | | | | +-------------------------------+ +-------------------------------+ | | | | +--------------+ +--------------+ +--------------+ +--------------+ | | | | | Connection | | Connection | | Connection | | Connection | | | | | | :1.22 | | :1.25 | | :1.55 | | :1:81 | | | | | +--------------+ +--------------+ +-------------------------------+ | | | +---------------------------------------------------------------------+ | | | | +---------------------------------------------------------------------+ | | | Domain (Container; inside it, fedoracontainer/ becomes /dev/kdbus/) | | | | /dev/kdbus/domain/fedoracontainer/control | | | | +-----------------------------------------------------------------+ | | | | | Bus (System Bus of "fedoracontainer") | | | | | | /dev/kdbus/domain/fedoracontainer/0-system/ | | | | | | +-----------------------------+ | | | | | | | Endpoint | | | | | | | | /dev/.../0-system/bus | | | | | | | +-----------------------------+ | | | | | | +-------------+ +-------------+ | | | | | | | Connection | | Connection | | | | | | | | :1.22 | | :1.25 | | | | | | | +-------------+ +-------------+ | | | | | +-----------------------------------------------------------------+ | | | | | | | | +-----------------------------------------------------------------+ | | | | | Bus (User Bus for UID 270 of "fedoracontainer") | | | | | | /dev/kdbus/domain/fedoracontainer/2702-user/ | | | | | | +-----------------------------+ | | | | | | | Endpoint | | | | | | | | /dev/.../2702-user/bus | | | | | | | +-----------------------------+ | | | | | | +-------------+ +-------------+ | | | | | | | Connection | | Connection | | | | | | | | :1.22 | | :1.25 | | | | | | | +-------------+ +-------------+ | | | | | +-----------------------------------------------------------------+ | | | +---------------------------------------------------------------------+ | +-------------------------------------------------------------------------+ =============================================================================== Creation of new Domains and Buses =============================================================================== The initial kdbus domain is unconditionally created by the kernel module. A domain contains a "control" device node which allows to create a new bus or domain. New domains do not have any buses created by default. Opening the control device node returns a file descriptor, it accepts the ioctls KDBUS_CMD_BUS_MAKE/KDBUS_CMD_DOMAIN_MAKE which specify the name of the new bus or domain to create. The control file descriptor needs to be kept open for the entire life-time of the created bus or domain, closing it will immediately cleanup the entire bus or domain and all its associated resources and connections. Every control file descriptor can only be used once to create a new bus or domain; from that point, it is not used for any further communication until the final close(). =============================================================================== Connection IDs and Well-Known Connection Names =============================================================================== Connections are identified by their connection id, internally implemented as a uint64_t counter. The IDs of every newly created bus start at 1, and every new connection will increment the counter by 1. The ids are not reused. In higher level tools, the user visible representation of a connection is defined by the D-Bus protocol specification as ":1.". Messages with a specific uint64_t destination id are directly delivered to the connection with the corresponding id. Messages with the special destination id 0xffffffffffffffff are broadcast messages and are potentially delivered to all known connections on the bus; clients interested in broadcast messages need to subscribe to the specific messages they are interested though, before any broadcast message reaches them. Messages synthesized and sent directly by the kernel, will carry the special source id 0. In addition to the unique uint64_t connection id, established connections can request the ownership of well-known names, under which they can be found and addressed by other bus clients. A well-known name is associated with one and only one connection at a time. Messages can specify the special destination id 0 and carry a well-known name in the message data. Such a message is delivered to the destination connection which owns that well-known name. +-------------------------------------------------------------------------+ | +---------------+ +---------------------------+ | | | Connection | | Message | -----------------+ | | | :1.22 | --> | src: 22 | | | | | | | dst: 25 | | | | | | | | | | | | | | | | | | | | +---------------------------+ | | | | | | | | | | <--------------------------------------+ | | | +---------------+ | | | | | | | | +---------------+ +---------------------------+ | | | | | Connection | | Message | -----+ | | | | :1.25 | --> | src: 25 | | | | | | | dst: 0xffffffffffffffff | -------------+ | | | | | | | | | | | | | | | ---------+ | | | | | | +---------------------------+ | | | | | | | | | | | | | | <--------------------------------------------------+ | | +---------------+ | | | | | | | | +---------------+ +---------------------------+ | | | | | Connection | | Message | --+ | | | | | :1.55 | --> | src: 55 | | | | | | | | | dst: 0 / org.foo.bar | | | | | | | | | | | | | | | | | | | | | | | | | | +---------------------------+ | | | | | | | | | | | | | | <------------------------------------------+ | | | +---------------+ | | | | | | | | +---------------+ | | | | | Connection | | | | | | :1.81 | | | | | | org.foo.bar | | | | | | | | | | | | | | | | | | | <-----------------------------------+ | | | | | | | | | | <----------------------------------------------+ | | +---------------+ | +-------------------------------------------------------------------------+ =============================================================================== Message Format, Content, Exchange =============================================================================== Messages consist of fixed-size header followed directly by a list of variable-sized data records. The overall message size is specified in the header of the message. The chain of data records can contain well-defined message metadata fields, raw data, references to data, or file descriptors. Messages are passed to the kernel with the ioctl KDBUS_CMD_MSG_SEND. Depending on the the destination address of the message, the kernel delivers the message to the specific destination connection or to all connections on the same bus. Messages are always queued in the destination connection. Messages are received by the client with the ioctl KDBUS_CMD_MSG_RECV. The endpoint device node of the bus supports poll() to wake up the receiving process when new messages are queued up to be received. +-------------------------------------------------------------------------+ | Message | | +---------------------------------------------------------------------+ | | | Header | | | | size: overall message size, including the data records | | | | destination: connection id of the receiver | | | | source: connection id of the sender (set by kernel) | | | | payload_type: "DBusDBus" textual identifier stored as uint64_t | | | +---------------------------------------------------------------------+ | | +---------------------------------------------------------------------+ | | | Data Record | | | | size: overall record size (without padding) | | | | type: type of data | | | | data: reference to data (address or file descriptor) | | | +---------------------------------------------------------------------+ | | +---------------------------------------------------------------------+ | | | padding bytes to the next 8 byte alignment | | | +---------------------------------------------------------------------+ | | +---------------------------------------------------------------------+ | | | Data Record | | | | size: overall record size (without padding) | | | | ... | | | +---------------------------------------------------------------------+ | | +---------------------------------------------------------------------+ | | | padding bytes to the next 8 byte alignment | | | +---------------------------------------------------------------------+ | | +---------------------------------------------------------------------+ | | | Data Record | | | | size: overall record size | | | | ... | | | +---------------------------------------------------------------------+ | | +---------------------------------------------------------------------+ | | | padding bytes to the next 8 byte alignment | | | +---------------------------------------------------------------------+ | +-------------------------------------------------------------------------+ =============================================================================== Passing of Payload Data =============================================================================== When connecting to the bus, receivers request a memory pool of a given size, large enough to carry all backlog of data enqueued for the connection. The pool is internally backed by a shared memory file which can be mmap()ed by the receiver. KDBUS_MSG_PAYLOAD_VEC: Messages are directly copied by the sending process into the receiver's pool, that way two peers can exchange data by effectively doing a single-copy from one process to another, the kernel will not buffer the data anywhere else. KDBUS_MSG_PAYLOAD_MEMFD: Messages can reference kdbus_memfd special files which contain the data. Kdbus_memfd files have special semantics, which allow the sealing of the content of the file, sealing prevents all writable access to the file content. Only sealed kdbus_memfd files are accepted as payload data, which enforces reliable passing of data; the receiver can assume that the sender and nobody else can alter the content after the message is sent. Apart from the sender filling-in the content into the kdbus_memfd file, the data will be passed as zero-copy from one process to another, read-only, shared between the peers. The sealing of a kdbus_memfd can be removed again by the sender or the receiver, as soon as the kdbus_memfd is not shared anymore. =============================================================================== Broadcast Message Matching =============================================================================== A message addressed at the connection ID 0 is a broadcast message, delivered to all connected peers which installed a rule to match certain properties of the message. Without any rules installed in the connection, no broadcast message will be delivered to the connection. Matches are implemented as bloom filters. The sender adds certain properties of the message as elements to a bloom filter bit field, and sends that along with the broadcast message. The connection adds the message properties it is interested as elements to a bloom mask bit field, and uploads the mask to the match rules of the connection. The kernel will match the broadcast message's bloom filter against the connections bloom mask and decide if the message should be delivered to the connection. The kernel has no notion of any specific properties of the message, all it sees are the bit fields of the bloom filter and mask to match against. The use of bloom filters allows simple and efficient matching, without exposing any message properties or internals to the kernel side. Clients need to deal with the fact that they might receive broadcasts which they did not subscribe to, the bloom filter might allow false-positives to pass the filter. To allow the future extension of the set of elements in the bloom filter, the filter specifies a "generation" number. A later generation must always contain all elements of the set of the previous generation, but can add new elements to the set. The match rules mask can carry an array with all previous generations of masks individually stored. When the filter and mask are matched by the kernel, the mask with the closest matching "generation" is selected as the index into the mask array.