summaryrefslogtreecommitdiff
path: root/Documentation
diff options
context:
space:
mode:
authorLinus Torvalds <torvalds@linux-foundation.org>2018-04-02 21:22:12 -0700
committerLinus Torvalds <torvalds@linux-foundation.org>2018-04-02 21:22:12 -0700
commit642e7fd23353e22290e3d51719fcb658dc252342 (patch)
tree93688d5ff15836d8e5b0e097748f7fabb13a303a /Documentation
parent21035965f60b0502fc6537b232839389bb4ce664 (diff)
parentc9a211951c7c79cfb5de888d7d9550872868b086 (diff)
downloadlinux-riscv-642e7fd23353e22290e3d51719fcb658dc252342.tar.gz
linux-riscv-642e7fd23353e22290e3d51719fcb658dc252342.tar.bz2
linux-riscv-642e7fd23353e22290e3d51719fcb658dc252342.zip
Merge branch 'syscalls-next' of git://git.kernel.org/pub/scm/linux/kernel/git/brodo/linux
Pull removal of in-kernel calls to syscalls from Dominik Brodowski: "System calls are interaction points between userspace and the kernel. Therefore, system call functions such as sys_xyzzy() or compat_sys_xyzzy() should only be called from userspace via the syscall table, but not from elsewhere in the kernel. At least on 64-bit x86, it will likely be a hard requirement from v4.17 onwards to not call system call functions in the kernel: It is better to use use a different calling convention for system calls there, where struct pt_regs is decoded on-the-fly in a syscall wrapper which then hands processing over to the actual syscall function. This means that only those parameters which are actually needed for a specific syscall are passed on during syscall entry, instead of filling in six CPU registers with random user space content all the time (which may cause serious trouble down the call chain). Those x86-specific patches will be pushed through the x86 tree in the near future. Moreover, rules on how data may be accessed may differ between kernel data and user data. This is another reason why calling sys_xyzzy() is generally a bad idea, and -- at most -- acceptable in arch-specific code. This patchset removes all in-kernel calls to syscall functions in the kernel with the exception of arch/. On top of this, it cleans up the three places where many syscalls are referenced or prototyped, namely kernel/sys_ni.c, include/linux/syscalls.h and include/linux/compat.h" * 'syscalls-next' of git://git.kernel.org/pub/scm/linux/kernel/git/brodo/linux: (109 commits) bpf: whitelist all syscalls for error injection kernel/sys_ni: remove {sys_,sys_compat} from cond_syscall definitions kernel/sys_ni: sort cond_syscall() entries syscalls/x86: auto-create compat_sys_*() prototypes syscalls: sort syscall prototypes in include/linux/compat.h net: remove compat_sys_*() prototypes from net/compat.h syscalls: sort syscall prototypes in include/linux/syscalls.h kexec: move sys_kexec_load() prototype to syscalls.h x86/sigreturn: use SYSCALL_DEFINE0 x86: fix sys_sigreturn() return type to be long, not unsigned long x86/ioport: add ksys_ioperm() helper; remove in-kernel calls to sys_ioperm() mm: add ksys_readahead() helper; remove in-kernel calls to sys_readahead() mm: add ksys_mmap_pgoff() helper; remove in-kernel calls to sys_mmap_pgoff() mm: add ksys_fadvise64_64() helper; remove in-kernel call to sys_fadvise64_64() fs: add ksys_fallocate() wrapper; remove in-kernel calls to sys_fallocate() fs: add ksys_p{read,write}64() helpers; remove in-kernel calls to syscalls fs: add ksys_truncate() wrapper; remove in-kernel calls to sys_truncate() fs: add ksys_sync_file_range helper(); remove in-kernel calls to syscall kernel: add ksys_setsid() helper; remove in-kernel call to sys_setsid() kernel: add ksys_unshare() helper; remove in-kernel calls to sys_unshare() ...
Diffstat (limited to 'Documentation')
-rw-r--r--Documentation/process/adding-syscalls.rst34
1 files changed, 33 insertions, 1 deletions
diff --git a/Documentation/process/adding-syscalls.rst b/Documentation/process/adding-syscalls.rst
index 8cc25a06f353..314c8bf6f2a2 100644
--- a/Documentation/process/adding-syscalls.rst
+++ b/Documentation/process/adding-syscalls.rst
@@ -222,7 +222,7 @@ your new syscall number may get adjusted to resolve conflicts.
The file ``kernel/sys_ni.c`` provides a fallback stub implementation of each
system call, returning ``-ENOSYS``. Add your new system call here too::
- cond_syscall(sys_xyzzy);
+ COND_SYSCALL(xyzzy);
Your new kernel functionality, and the system call that controls it, should
normally be optional, so add a ``CONFIG`` option (typically to
@@ -487,6 +487,38 @@ patchset, for the convenience of reviewers.
The man page should be cc'ed to linux-man@vger.kernel.org
For more details, see https://www.kernel.org/doc/man-pages/patches.html
+
+Do not call System Calls in the Kernel
+--------------------------------------
+
+System calls are, as stated above, interaction points between userspace and
+the kernel. Therefore, system call functions such as ``sys_xyzzy()`` or
+``compat_sys_xyzzy()`` should only be called from userspace via the syscall
+table, but not from elsewhere in the kernel. If the syscall functionality is
+useful to be used within the kernel, needs to be shared between an old and a
+new syscall, or needs to be shared between a syscall and its compatibility
+variant, it should be implemented by means of a "helper" function (such as
+``kern_xyzzy()``). This kernel function may then be called within the
+syscall stub (``sys_xyzzy()``), the compatibility syscall stub
+(``compat_sys_xyzzy()``), and/or other kernel code.
+
+At least on 64-bit x86, it will be a hard requirement from v4.17 onwards to not
+call system call functions in the kernel. It uses a different calling
+convention for system calls where ``struct pt_regs`` is decoded on-the-fly in a
+syscall wrapper which then hands processing over to the actual syscall function.
+This means that only those parameters which are actually needed for a specific
+syscall are passed on during syscall entry, instead of filling in six CPU
+registers with random user space content all the time (which may cause serious
+trouble down the call chain).
+
+Moreover, rules on how data may be accessed may differ between kernel data and
+user data. This is another reason why calling ``sys_xyzzy()`` is generally a
+bad idea.
+
+Exceptions to this rule are only allowed in architecture-specific overrides,
+architecture-specific compatibility wrappers, or other code in arch/.
+
+
References and Sources
----------------------