summaryrefslogtreecommitdiff
path: root/ipc/msg.c
AgeCommit message (Collapse)AuthorFilesLines
2014-03-23ipc: Fix 2 bugs in msgrcv() MSG_COPY implementationMichael Kerrisk1-0/+2
commit 4f87dac386cc43d5525da7a939d4b4e7edbea22c upstream. While testing and documenting the msgrcv() MSG_COPY flag that Stanislav Kinsbursky added in commit 4a674f34ba04 ("ipc: introduce message queue copy feature" => kernel 3.8), I discovered a couple of bugs in the implementation. The two bugs concern MSG_COPY interactions with other msgrcv() flags, namely: (A) MSG_COPY + MSG_EXCEPT (B) MSG_COPY + !IPC_NOWAIT The bugs are distinct (and the fix for the first one is obvious), however my fix for both is a single-line patch, which is why I'm combining them in a single mail, rather than writing two mails+patches. ===== (A) MSG_COPY + MSG_EXCEPT ===== With the addition of the MSG_COPY flag, there are now two msgrcv() flags--MSG_COPY and MSG_EXCEPT--that modify the meaning of the 'msgtyp' argument in unrelated ways. Specifying both in the same call is a logical error that is currently permitted, with the effect that MSG_COPY has priority and MSG_EXCEPT is ignored. The call should give an error if both flags are specified. The patch below implements that behavior. ===== (B) (B) MSG_COPY + !IPC_NOWAIT ===== The test code that was submitted in commit 3a665531a3b7 ("selftests: IPC message queue copy feature test") shows MSG_COPY being used in conjunction with IPC_NOWAIT. In other words, if there is no message at the position 'msgtyp'. return immediately with the error in ENOMSG. What was not (fully) tested is the behavior if MSG_COPY is specified *without* IPC_NOWAIT, and there is an odd behavior. If the queue contains less than 'msgtyp' messages, then the call blocks until the next message is written to the queue. At that point, the msgrcv() call returns a copy of the newly added message, regardless of whether that message is at the ordinal position 'msgtyp'. This is clearly bogus, and problematic for applications that might want to make use of the MSG_COPY flag. I considered the following possible solutions to this problem: (1) Force the call to block until a message *does* appear at the position 'msgtyp'. (2) If the MSG_COPY flag is specified, the kernel should implicitly add IPC_NOWAIT, so that the call fails with ENOMSG for this case. (3) If the MSG_COPY flag is specified, but IPC_NOWAIT is not, generate an error (probably, EINVAL is the right one). I do not know if any application would really want to have the functionality of solution (1), especially since an application can determine in advance the number of messages in the queue using msgctl() IPC_STAT. Obviously, this solution would be the most work to implement. Solution (2) would have the effect of silently fixing any applications that tried to employ broken behavior. However, it would mean that if we later decided to implement solution (1), then user-space could not easily detect what the kernel supports (but, since I'm somewhat doubtful that solution (1) is needed, I'm not sure that this is much of a problem). Solution (3) would have the effect of informing broken applications that they are doing something broken. The downside is that this would cause a ABI breakage for any applications that are currently employing the broken behavior. However: a) Those applications are almost certainly not getting the results they expect. b) Possibly, those applications don't even exist, because MSG_COPY is currently hidden behind CONFIG_CHECKPOINT_RESTORE. The upside of solution (3) is that if we later decided to implement solution (1), user-space could determine what the kernel supports, via the error return. In my view, solution (3) is mildly preferable to solution (2), and solution (1) could still be done later if anyone really cares. The patch below implements solution (3). PS. For anyone out there still listening, it's the usual story: documenting an API (and the thinking about, and the testing of the API, that documentation entails) is the one of the single best ways of finding bugs in the API, as I've learned from a lot of experience. Best to do that documentation before releasing the API. Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com> Acked-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Cc: Stanislav Kinsbursky <skinsbursky@parallels.com> Cc: Serge Hallyn <serge.hallyn@canonical.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-10-18ipc,msg: prevent race with rmid in msgsnd,msgrcvDavidlohr Bueso1-0/+13
commit 4271b05a227dc6175b66c3d9941aeab09048aeb2 upstream. This fixes a race in both msgrcv() and msgsnd() between finding the msg and actually dealing with the queue, as another thread can delete shmid underneath us if we are preempted before acquiring the kern_ipc_perm.lock. Manfred illustrates this nicely: Assume a preemptible kernel that is preempted just after msq = msq_obtain_object_check(ns, msqid) in do_msgrcv(). The only lock that is held is rcu_read_lock(). Now the other thread processes IPC_RMID. When the first task is resumed, then it will happily wait for messages on a deleted queue. Fix this by checking for if the queue has been deleted after taking the lock. Signed-off-by: Davidlohr Bueso <davidlohr@hp.com> Reported-by: Manfred Spraul <manfred@colorfullife.com> Cc: Rik van Riel <riel@redhat.com> Cc: Mike Galbraith <efault@gmx.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-10-18ipc: fix race with LSMsDavidlohr Bueso1-6/+13
commit 53dad6d3a8e5ac1af8bacc6ac2134ae1a8b085f1 upstream. Currently, IPC mechanisms do security and auditing related checks under RCU. However, since security modules can free the security structure, for example, through selinux_[sem,msg_queue,shm]_free_security(), we can race if the structure is freed before other tasks are done with it, creating a use-after-free condition. Manfred illustrates this nicely, for instance with shared mem and selinux: -> do_shmat calls rcu_read_lock() -> do_shmat calls shm_object_check(). Checks that the object is still valid - but doesn't acquire any locks. Then it returns. -> do_shmat calls security_shm_shmat (e.g. selinux_shm_shmat) -> selinux_shm_shmat calls ipc_has_perm() -> ipc_has_perm accesses ipc_perms->security shm_close() -> shm_close acquires rw_mutex & shm_lock -> shm_close calls shm_destroy -> shm_destroy calls security_shm_free (e.g. selinux_shm_free_security) -> selinux_shm_free_security calls ipc_free_security(&shp->shm_perm) -> ipc_free_security calls kfree(ipc_perms->security) This patch delays the freeing of the security structures after all RCU readers are done. Furthermore it aligns the security life cycle with that of the rest of IPC - freeing them based on the reference counter. For situations where we need not free security, the current behavior is kept. Linus states: "... the old behavior was suspect for another reason too: having the security blob go away from under a user sounds like it could cause various other problems anyway, so I think the old code was at least _prone_ to bugs even if it didn't have catastrophic behavior." I have tested this patch with IPC testcases from LTP on both my quad-core laptop and on a 64 core NUMA server. In both cases selinux is enabled, and tests pass for both voluntary and forced preemption models. While the mentioned races are theoretical (at least no one as reported them), I wanted to make sure that this new logic doesn't break anything we weren't aware of. Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Davidlohr Bueso <davidlohr@hp.com> Acked-by: Manfred Spraul <manfred@colorfullife.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-10-18ipc,msg: drop msg_unlockDavidlohr Bueso1-3/+2
commit 4718787d1f626f45ddb239912bc07266b9880044 upstream. There is only one user left, drop this function and just call ipc_unlock_object() and rcu_read_unlock(). Signed-off-by: Davidlohr Bueso <davidlohr.bueso@hp.com> Tested-by: Sedat Dilek <sedat.dilek@gmail.com> Cc: Rik van Riel <riel@redhat.com> Cc: Manfred Spraul <manfred@colorfullife.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-10-18ipc: rename ids->rw_mutexDavidlohr Bueso1-10/+10
commit d9a605e40b1376eb02b067d7690580255a0df68f upstream. Since in some situations the lock can be shared for readers, we shouldn't be calling it a mutex, rename it to rwsem. Signed-off-by: Davidlohr Bueso <davidlohr.bueso@hp.com> Tested-by: Sedat Dilek <sedat.dilek@gmail.com> Cc: Rik van Riel <riel@redhat.com> Cc: Manfred Spraul <manfred@colorfullife.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-10-18ipc/msg.c: Fix lost wakeup in msgsnd().Manfred Spraul1-7/+5
commit bebcb928c820d0ee83aca4b192adc195e43e66a2 upstream. The check if the queue is full and adding current to the wait queue of pending msgsnd() operations (ss_add()) must be atomic. Otherwise: - the thread that performs msgsnd() finds a full queue and decides to sleep. - the thread that performs msgrcv() first reads all messages from the queue and then sleeps, because the queue is empty. - the msgrcv() calls do not perform any wakeups, because the msgsnd() task has not yet called ss_add(). - then the msgsnd()-thread first calls ss_add() and then sleeps. Net result: msgsnd() and msgrcv() both sleep forever. Observed with msgctl08 from ltp with a preemptible kernel. Fix: Call ipc_lock_object() before performing the check. The patch also moves security_msg_queue_msgsnd() under ipc_lock_object: - msgctl(IPC_SET) explicitely mentions that it tries to expunge any pending operations that are not allowed anymore with the new permissions. If security_msg_queue_msgsnd() is called without locks, then there might be races. - it makes the patch much simpler. Reported-and-tested-by: Vineet Gupta <Vineet.Gupta1@synopsys.com> Acked-by: Rik van Riel <riel@redhat.com> Signed-off-by: Manfred Spraul <manfred@colorfullife.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-10-18ipc: remove unused functionsDavidlohr Bueso1-25/+0
commit 9ad66ae65fc8d3e7e3344310fb0aa835910264fe upstream. We can now drop the msg_lock and msg_lock_check functions along with a bogus comment introduced previously in semctl_down. Signed-off-by: Davidlohr Bueso <davidlohr.bueso@hp.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-10-18ipc,msg: shorten critical region in msgrcvDavidlohr Bueso1-26/+32
commit 41a0d523d0f626e9da0dc01de47f1b89058033cf upstream. do_msgrcv() is the last msg queue function that abuses the ipc lock Take it only when needed when actually updating msq. Signed-off-by: Davidlohr Bueso <davidlohr.bueso@hp.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Rik van Riel <riel@redhat.com> Tested-by: Sedat Dilek <sedat.dilek@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-10-18ipc,msg: shorten critical region in msgsndDavidlohr Bueso1-13/+24
commit 3dd1f784ed6603d7ab1043e51e6371235edf2313 upstream. do_msgsnd() is another function that does too many things with the ipc object lock acquired. Take it only when needed when actually updating msq. Signed-off-by: Davidlohr Bueso <davidlohr.bueso@hp.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-10-18ipc,msg: make msgctl_nolock locklessDavidlohr Bueso1-10/+17
commit ac0ba20ea6f2201a1589d6dc26ad1a4f0f967bb8 upstream. While the INFO cmd doesn't take the ipc lock, the STAT commands do acquire it unnecessarily. We can do the permissions and security checks only holding the rcu lock. This function now mimics semctl_nolock(). Signed-off-by: Davidlohr Bueso <davidlohr.bueso@hp.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-10-18ipc,msg: introduce lockless functions to obtain the ipc objectDavidlohr Bueso1-0/+21
commit a5001a0d9768568de5d613c3b3a5b9c7721299da upstream. Add msq_obtain_object() and msq_obtain_object_check(), which will allow us to get the ipc object without acquiring the lock. Just as with semaphores, these functions are basically wrappers around ipc_obtain_object*(). Signed-off-by: Davidlohr Bueso <davidlohr.bueso@hp.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-10-18ipc,msg: introduce msgctl_nolockDavidlohr Bueso1-15/+34
commit 2cafed30f150f7314f98717b372df8173516cae0 upstream. Similar to semctl, when calling msgctl, the *_INFO and *_STAT commands can be performed without acquiring the ipc object. Add a msgctl_nolock() function and move the logic of *_INFO and *_STAT out of msgctl(). This change still takes the lock and it will be properly lockless in the next patch Signed-off-by: Davidlohr Bueso <davidlohr.bueso@hp.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-10-18ipc,msg: shorten critical region in msgctl_downDavidlohr Bueso1-5/+7
commit 15724ecb7e9bab35fc694c666ad563adba820cc3 upstream. Instead of holding the ipc lock for the entire function, use the ipcctl_pre_down_nolock and only acquire the lock for specific commands: RMID and SET. Signed-off-by: Davidlohr Bueso <davidlohr.bueso@hp.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-10-18ipc: move locking out of ipcctl_pre_down_nolockDavidlohr Bueso1-7/+17
commit 7b4cc5d8411bd4e9d61d8714f53859740cf830c2 upstream. This function currently acquires both the rw_mutex and the rcu lock on successful lookups, leaving the callers to explicitly unlock them, creating another two level locking situation. Make the callers (including those that still use ipcctl_pre_down()) explicitly lock and unlock the rwsem and rcu lock. Signed-off-by: Davidlohr Bueso <davidlohr.bueso@hp.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-10-18ipc: close open coded spin lock callsDavidlohr Bueso1-1/+1
commit cf9d5d78d05bca96df7618dfc3a5ee4414dcae58 upstream. Signed-off-by: Davidlohr Bueso <davidlohr.bueso@hp.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-10-18ipc: move rcu lock out of ipc_addidDavidlohr Bueso1-4/+3
commit dbfcd91f06f0e2d5564b2fd184e9c2a43675f9ab upstream. This patchset continues the work that began in the sysv ipc semaphore scaling series, see https://lkml.org/lkml/2013/3/20/546 Just like semaphores used to be, sysv shared memory and msg queues also abuse the ipc lock, unnecessarily holding it for operations such as permission and security checks. This patchset mostly deals with mqueues, and while shared mem can be done in a very similar way, I want to get these patches out in the open first. It also does some pending cleanups, mostly focused on the two level locking we have in ipc code, taking care of ipc_addid() and ipcctl_pre_down_nolock() - yes there are still functions that need to be updated as well. This patch: Make all callers explicitly take and release the RCU read lock. This addresses the two level locking seen in newary(), newseg() and newqueue(). For the last two, explicitly unlock the ipc object and the rcu lock, instead of calling the custom shm_unlock and msg_unlock functions. The next patch will deal with the open coded locking for ->perm.lock Signed-off-by: Davidlohr Bueso <davidlohr.bueso@hp.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-09-07IPC: bugfix for msgrcv with msgtyp < 0Svenning Soerensen1-2/+3
commit 368ae537e056acd3f751fa276f48423f06803922 upstream. According to 'man msgrcv': "If msgtyp is less than 0, the first message of the lowest type that is less than or equal to the absolute value of msgtyp shall be received." Bug: The kernel only returns a message if its type is 1; other messages with type < abs(msgtype) will never get returned. Fix: After having traversed the list to find the first message with the lowest type, we need to actually return that message. This regression was introduced by commit daaf74cf0867 ("ipc: refactor msg list search into separate function") Signed-off-by: Svenning Soerensen <sss@secomea.dk> Reviewed-by: Peter Hurley <peter@hurleysoftware.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-05-01ipc/msg.c: use list_for_each_entry_[safe] for list traversingNikola Pajkovsky1-27/+8
The ipc/msg.c code does its list operations by hand and it open-codes the accesses, instead of using for_each_entry_[safe]. Signed-off-by: Nikola Pajkovsky <npajkovs@redhat.com> Cc: Stanislav Kinsbursky <skinsbursky@parallels.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Peter Hurley <peter@hurleysoftware.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-05-01ipc,sem: fine grained locking for semtimedopRik van Riel1-1/+6
Introduce finer grained locking for semtimedop, to handle the common case of a program wanting to manipulate one semaphore from an array with multiple semaphores. If the call is a semop manipulating just one semaphore in an array with multiple semaphores, only take the lock for that semaphore itself. If the call needs to manipulate multiple semaphores, or another caller is in a transaction that manipulates multiple semaphores, the sem_array lock is taken, as well as all the locks for the individual semaphores. On a 24 CPU system, performance numbers with the semop-multi test with N threads and N semaphores, look like this: vanilla Davidlohr's Davidlohr's + Davidlohr's + threads patches rwlock patches v3 patches 10 610652 726325 1783589 2142206 20 341570 365699 1520453 1977878 30 288102 307037 1498167 2037995 40 290714 305955 1612665 2256484 50 288620 312890 1733453 2650292 60 289987 306043 1649360 2388008 70 291298 306347 1723167 2717486 80 290948 305662 1729545 2763582 90 290996 306680 1736021 2757524 100 292243 306700 1773700 3059159 [davidlohr.bueso@hp.com: do not call sem_lock when bogus sma] [davidlohr.bueso@hp.com: make refcounter atomic] Signed-off-by: Rik van Riel <riel@redhat.com> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Acked-by: Davidlohr Bueso <davidlohr.bueso@hp.com> Cc: Chegu Vinod <chegu_vinod@hp.com> Cc: Jason Low <jason.low2@hp.com> Reviewed-by: Michel Lespinasse <walken@google.com> Cc: Peter Hurley <peter@hurleysoftware.com> Cc: Stanislav Kinsbursky <skinsbursky@parallels.com> Tested-by: Emmanuel Benisty <benisty.e@gmail.com> Tested-by: Sedat Dilek <sedat.dilek@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-05-01ipc: refactor msg list search into separate functionPeter Hurley1-22/+26
[fengguang.wu@intel.com: find_msg can be static] Signed-off-by: Peter Hurley <peter@hurleysoftware.com> Cc: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-05-01ipc: simplify msg list searchPeter Hurley1-6/+2
Signed-off-by: Peter Hurley <peter@hurleysoftware.com> Acked-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-05-01ipc: implement MSG_COPY as a new receive modePeter Hurley1-13/+9
Teach the helper routines about MSG_COPY so that msgtyp is preserved as the message number to copy. The security functions affected by this change were audited and no additional changes are necessary. Signed-off-by: Peter Hurley <peter@hurleysoftware.com> Acked-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-05-01ipc: remove msg handling from queue scanPeter Hurley1-10/+4
In preparation for refactoring the queue scan into a separate function, relocate msg copying. Signed-off-by: Peter Hurley <peter@hurleysoftware.com> Acked-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-02ipc: set msg back to -EAGAIN if copy wasn't performedStanislav Kinsbursky1-0/+1
Make sure that msg pointer is set back to error value in case of MSG_COPY flag is set and desired message to copy wasn't found. This garantees that msg is either a error pointer or a copy address. Otherwise the last message in queue will be freed without unlinking from the queue (which leads to memory corruption) and the dummy allocated copy won't be released. Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-03-08ipc: don't allocate a copy larger than maxPeter Hurley1-2/+4
When MSG_COPY is set, a duplicate message must be allocated for the copy before locking the queue. However, the copy could not be larger than was sent which is limited to msg_ctlmax. Signed-off-by: Peter Hurley <peter@hurleysoftware.com> Acked-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-01-04ipc: add more comments to message copying related codeStanislav Kinsbursky1-0/+8
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: James Morris <jmorris@namei.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-01-04ipc: simplify message copyingStanislav Kinsbursky1-16/+9
Remove the redundant and confusing fill_copy(). Also add copy_msg() check for error. In this case exit from the function have to be done instead of break, because further code interprets any error as EAGAIN. Also define copy_msg() for the case when CONFIG_CHECKPOINT_RESTORE is disabled. Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: James Morris <jmorris@namei.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-01-04ipc: convert prepare_copy() from macro to functionStanislav Kinsbursky1-2/+9
This code works if CONFIG_CHECKPOINT_RESTORE is disabled. [akpm@linux-foundation.org: remove __maybe_unused] Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: James Morris <jmorris@namei.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-01-04ipc: simplify free_copy() callStanislav Kinsbursky1-6/+8
Passing and checking of msgflg to free_copy() is redundant. This patch sets copy to NULL on declaration instead and checks for non-NULL in free_copy(). Note: in case of copy allocation failure, error is returned immediately. So no need to check for IS_ERR() in free_copy(). Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: James Morris <jmorris@namei.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-01-04ipc: introduce message queue copy featureStanislav Kinsbursky1-2/+62
This patch is required for checkpoint/restore in userspace. c/r requires some way to get all pending IPC messages without deleting them from the queue (checkpoint can fail and in this case tasks will be resumed, so queue have to be valid). To achive this, new operation flag MSG_COPY for sys_msgrcv() system call was introduced. If this flag was specified, then mtype is interpreted as number of the message to copy. If MSG_COPY is set, then kernel will allocate dummy message with passed size, and then use new copy_msg() helper function to copy desired message (instead of unlinking it from the queue). Notes: 1) Return -ENOSYS if MSG_COPY is specified, but CONFIG_CHECKPOINT_RESTORE is not set. Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Cc: Serge Hallyn <serge.hallyn@canonical.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-01-04ipc: message queue receive cleanupStanislav Kinsbursky1-21/+23
Move all message related manipulation into one function msg_fill(). Actually, two functions because of the compat one. [akpm@linux-foundation.org: checkpatch fixes] Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Cc: Serge Hallyn <serge.hallyn@canonical.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-01-04ipc: remove forced assignment of selected messageStanislav Kinsbursky1-4/+1
This is a cleanup patch. The assignment is redundant. Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Cc: Serge Hallyn <serge.hallyn@canonical.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-09-06userns: Convert ipc to use kuid and kgid where appropriateEric W. Biederman1-5/+9
- Store the ipc owner and creator with a kuid - Store the ipc group and the crators group with a kgid. - Add error handling to ipc_update_perms, allowing it to fail if the uids and gids can not be converted to kuids or kgids. - Modify the proc files to display the ipc creator and owner in the user namespace of the opener of the proc file. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2011-03-31Fix common misspellingsLucas De Marchi1-2/+2
Fixes generated by 'codespell' and manually reviewed. Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>
2011-03-23userns: user namespaces: convert several capable() callsSerge E. Hallyn1-4/+4
CAP_IPC_OWNER and CAP_IPC_LOCK can be checked against current_user_ns(), because the resource comes from current's own ipc namespace. setuid/setgid are to uids in own namespace, so again checks can be against current_user_ns(). Changelog: Jan 11: Use task_ns_capable() in place of sched_capable(). Jan 11: Use nsown_capable() as suggested by Bastian Blank. Jan 11: Clarify (hopefully) some logic in futex and sched.c Feb 15: use ns_capable for ipc, not nsown_capable Feb 23: let copy_ipcs handle setting ipc_ns->user_ns Feb 23: pass ns down rather than taking it from current [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Serge E. Hallyn <serge.hallyn@canonical.com> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Daniel Lezcano <daniel.lezcano@free.fr> Acked-by: David Howells <dhowells@redhat.com> Cc: James Morris <jmorris@namei.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-25kernel-wide: replace USHORT_MAX, SHORT_MAX and SHORT_MIN with USHRT_MAX, ↵Alexey Dobriyan1-6/+6
SHRT_MAX and SHRT_MIN - C99 knows about USHRT_MAX/SHRT_MAX/SHRT_MIN, not USHORT_MAX/SHORT_MAX/SHORT_MIN. - Make SHRT_MIN of type s16, not int, for consistency. [akpm@linux-foundation.org: fix drivers/dma/timb_dma.c] [akpm@linux-foundation.org: fix security/keys/keyring.c] Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Acked-by: WANG Cong <xiyou.wangcong@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-03-30include cleanup: Update gfp.h and slab.h includes to prepare for breaking ↵Tejun Heo1-1/+0
implicit slab.h inclusion from percpu.h percpu.h is included by sched.h and module.h and thus ends up being included when building most .c files. percpu.h includes slab.h which in turn includes gfp.h making everything defined by the two files universally available and complicating inclusion dependencies. percpu.h -> slab.h dependency is about to be removed. Prepare for this change by updating users of gfp and slab facilities include those headers directly instead of assuming availability. As this conversion needs to touch large number of source files, the following script is used as the basis of conversion. http://userweb.kernel.org/~tj/misc/slabh-sweep.py The script does the followings. * Scan files for gfp and slab usages and update includes such that only the necessary includes are there. ie. if only gfp is used, gfp.h, if slab is used, slab.h. * When the script inserts a new include, it looks at the include blocks and try to put the new include such that its order conforms to its surrounding. It's put in the include block which contains core kernel includes, in the same order that the rest are ordered - alphabetical, Christmas tree, rev-Xmas-tree or at the end if there doesn't seem to be any matching order. * If the script can't find a place to put a new include (mostly because the file doesn't have fitting include block), it prints out an error message indicating which .h file needs to be added to the file. The conversion was done in the following steps. 1. The initial automatic conversion of all .c files updated slightly over 4000 files, deleting around 700 includes and adding ~480 gfp.h and ~3000 slab.h inclusions. The script emitted errors for ~400 files. 2. Each error was manually checked. Some didn't need the inclusion, some needed manual addition while adding it to implementation .h or embedding .c file was more appropriate for others. This step added inclusions to around 150 files. 3. The script was run again and the output was compared to the edits from #2 to make sure no file was left behind. 4. Several build tests were done and a couple of problems were fixed. e.g. lib/decompress_*.c used malloc/free() wrappers around slab APIs requiring slab.h to be added manually. 5. The script was run on all .h files but without automatically editing them as sprinkling gfp.h and slab.h inclusions around .h files could easily lead to inclusion dependency hell. Most gfp.h inclusion directives were ignored as stuff from gfp.h was usually wildly available and often used in preprocessor macros. Each slab.h inclusion directive was examined and added manually as necessary. 6. percpu.h was updated not to include slab.h. 7. Build test were done on the following configurations and failures were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my distributed build env didn't work with gcov compiles) and a few more options had to be turned off depending on archs to make things build (like ipr on powerpc/64 which failed due to missing writeq). * x86 and x86_64 UP and SMP allmodconfig and a custom test config. * powerpc and powerpc64 SMP allmodconfig * sparc and sparc64 SMP allmodconfig * ia64 SMP allmodconfig * s390 SMP allmodconfig * alpha SMP allmodconfig * um on x86_64 SMP allmodconfig 8. percpu.h modifications were reverted so that it could be applied as a separate patch and serve as bisection point. Given the fact that I had only a couple of failures from tests on step 6, I'm fairly confident about the coverage of this conversion patch. If there is a breakage, it's likely to be something in one of the arch headers which should be easily discoverable easily on most builds of the specific arch. Signed-off-by: Tejun Heo <tj@kernel.org> Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
2009-12-16ipc ns: fix memory leak (idr)Serge E. Hallyn1-0/+1
We have apparently had a memory leak since 7ca7e564e049d8b350ec9d958ff25eaa24226352 "ipc: store ipcs into IDRs" in 2007. The idr of which 3 exist for each ipc namespace is never freed. This patch simply frees them when the ipcns is freed. I don't believe any idr_remove() are done from rcu (and could therefore be delayed until after this idr_destroy()), so the patch should be safe. Some quick testing showed no harm, and the memory leak fixed. Caught by kmemleak. Signed-off-by: Serge E. Hallyn <serue@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-04ipc: fix unused variable warningFelipe Contreras1-1/+1
Commit a0d092f introduced the following warning: ipc/msg.c: In function ?msgctl_down?: ipc/msg.c:415: warning: ?msqid64? may be used uninitialized in this function The gcc warning in this case is actually bogus, as msqid64 is touched only iff cmd == IPC_SET, and in such case, copy_msqid_from_user() initializes it properly. Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2009-01-14[CVE-2009-0029] System call wrappers part 24Heiko Carstens1-6/+6
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
2008-06-06ipc: only output msgmni value at boot timeNadia Derbey1-7/+6
When posting: [PATCH 1/8] Scaling msgmni to the amount of lowmem (see http://lkml.org/lkml/2008/2/11/171), I have added a KERN_INFO message that is output each time msgmni is recomputed. In http://lkml.org/lkml/2008/4/29/575 Tony Luck complained that this message references an ipc namespace address that is useless. I first thought of using an audit_log instead of a printk, as suggested by Serge Hallyn. But unfortunately, we do not have any other information than the namespace address to provide here too. So I chose to move the message and output it only at boot time, removing the reference to the namespace. Signed-off-by: Nadia Derbey <Nadia.Derbey@bull.net> Cc: Pierre Peiffer <peifferp@gmail.com> Cc: Manfred Spraul <manfred@colorfullife.com> Acked-by: Tony Luck <tony.luck@intel.com> Cc: "Serge E. Hallyn" <serue@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29ipc: add definitions of USHORT_MAX and othersZhang, Yanmin1-6/+6
Add definitions of USHORT_MAX and others into kernel. ipc uses it and slub implementation might also use it. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Zhang Yanmin <yanmin.zhang@intel.com> Reviewed-by: Christoph Lameter <clameter@sgi.com> Cc: Nadia Derbey <Nadia.Derbey@bull.net> Cc: "Pierre Peiffer" <peifferp@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29IPC: consolidate all xxxctl_down() functionsPierre Peiffer1-43/+5
semctl_down(), msgctl_down() and shmctl_down() are used to handle the same set of commands for each kind of IPC. They all start to do the same job (they retrieve the ipc and do some permission checks) before handling the commands on their own. This patch proposes to consolidate this by moving these same pieces of code into one common function called ipcctl_pre_down(). It simplifies a little these xxxctl_down() functions and increases a little the maintainability. Signed-off-by: Pierre Peiffer <pierre.peiffer@bull.net> Acked-by: Serge Hallyn <serue@us.ibm.com> Cc: Nadia Derbey <Nadia.Derbey@bull.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29IPC: introduce ipc_update_perm()Pierre Peiffer1-4/+1
The IPC_SET command performs the same permission setting for all IPCs. This patch introduces a common ipc_update_perm() function to update these permissions and makes use of it for all IPCs. Signed-off-by: Pierre Peiffer <pierre.peiffer@bull.net> Acked-by: Serge Hallyn <serue@us.ibm.com> Cc: Nadia Derbey <Nadia.Derbey@bull.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29IPC: get rid of the use *_setbuf structure.Pierre Peiffer1-33/+18
All IPCs make use of an intermetiate *_setbuf structure to handle the IPC_SET command. This is not really needed and, moreover, it complicates a little bit the code. This patch gets rid of the use of it and uses directly the semid64_ds/ msgid64_ds/shmid64_ds structure. In addition of removing one struture declaration, it also simplifies and improves a little bit the common 64-bits path. Signed-off-by: Pierre Peiffer <pierre.peiffer@bull.net> Acked-by: Serge Hallyn <serue@us.ibm.com> Cc: Nadia Derbey <Nadia.Derbey@bull.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29IPC/message queues: introduce msgctl_downPierre Peiffer1-73/+89
Currently, sys_msgctl is not easy to read. This patch tries to improve that by introducing the msgctl_down function to handle all commands requiring the rwmutex to be taken in write mode (ie IPC_SET and IPC_RMID for now). It is the equivalent function of semctl_down for message queues. This greatly changes the readability of sys_msgctl and also harmonizes the way these commands are handled among all IPCs. Signed-off-by: Pierre Peiffer <pierre.peiffer@bull.net> Acked-by: Serge Hallyn <serue@us.ibm.com> Cc: Nadia Derbey <Nadia.Derbey@bull.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29ipc: recompute msgmni on memory add / removeNadia Derbey1-1/+1
Introduce the registration of a callback routine that recomputes msg_ctlmni upon memory add / remove. A single notifier block is registered in the hotplug memory chain for all the ipc namespaces. Since the ipc namespaces are not linked together, they have their own notification chain: one notifier_block is defined per ipc namespace. Each time an ipc namespace is created (removed) it registers (unregisters) its notifier block in (from) the ipcns chain. The callback routine registered in the memory chain invokes the ipcns notifier chain with the IPCNS_LOWMEM event. Each callback routine registered in the ipcns namespace, in turn, recomputes msgmni for the owning namespace. Signed-off-by: Nadia Derbey <Nadia.Derbey@bull.net> Cc: Yasunori Goto <y-goto@jp.fujitsu.com> Cc: Matt Helsley <matthltc@us.ibm.com> Cc: Mingming Cao <cmm@us.ibm.com> Cc: Pierre Peiffer <pierre.peiffer@bull.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29ipc: scale msgmni to the number of ipc namespacesNadia Derbey1-3/+7
Since all the namespaces see the same amount of memory (the total one) this patch introduces a new variable that counts the ipc namespaces and divides msg_ctlmni by this counter. Signed-off-by: Nadia Derbey <Nadia.Derbey@bull.net> Cc: Yasunori Goto <y-goto@jp.fujitsu.com> Cc: Matt Helsley <matthltc@us.ibm.com> Cc: Mingming Cao <cmm@us.ibm.com> Cc: Pierre Peiffer <pierre.peiffer@bull.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29ipc: scale msgmni to the amount of lowmemNadia Derbey1-1/+36
On large systems we'd like to allow a larger number of message queues. In some cases up to 32K. However simply setting MSGMNI to a larger value may cause problems for smaller systems. The first patch of this series introduces a default maximum number of message queue ids that scales with the amount of lowmem. Since msgmni is per namespace and there is no amount of memory dedicated to each namespace so far, the second patch of this series scales msgmni to the number of ipc namespaces too. Since msgmni depends on the amount of memory, it becomes necessary to recompute it upon memory add/remove. In the 4th patch, memory hotplug management is added: a notifier block is registered into the memory hotplug notifier chain for the ipc subsystem. Since the ipc namespaces are not linked together, they have their own notification chain: one notifier_block is defined per ipc namespace. Each time an ipc namespace is created (removed) it registers (unregisters) its notifier block in (from) the ipcns chain. The callback routine registered in the memory chain invokes the ipcns notifier chain with the IPCNS_MEMCHANGE event. Each callback routine registered in the ipcns namespace, in turn, recomputes msgmni for the owning namespace. The 5th patch makes it possible to keep the memory hotplug notifier chain's lock for a lesser amount of time: instead of directly notifying the ipcns notifier chain upon memory add/remove, a work item is added to the global workqueue. When activated, this work item is the one who notifies the ipcns notifier chain. Since msgmni depends on the number of ipc namespaces, it becomes necessary to recompute it upon ipc namespace creation / removal. The 6th patch uses the ipc namespace notifier chain for that purpose: that chain is notified each time an ipc namespace is created or removed. This makes it possible to recompute msgmni for all the namespaces each time one of them is created or removed. When msgmni is explicitely set from userspace, we should avoid recomputing it upon memory add/remove or ipcns creation/removal. This is what the 7th patch does: it simply unregisters the ipcns callback routine as soon as msgmni has been changed from procfs or sysctl(). Even if msgmni is set by hand, it should be possible to make it back automatically recomputed upon memory add/remove or ipcns creation/removal. This what is achieved in patch 8: if set to a negative value, msgmni is added back to the ipcns notifier chain, making it automatically recomputed again. This patch: Compute msg_ctlmni to make it scale with the amount of lowmem. msg_ctlmni is now set to make the message queues occupy 1/32 of the available lowmem. Some cleaning has also been done for the MSGPOOL constant: the msgctl man page says it's not used, but it also defines it as a size in bytes (the code expresses it in Kbytes). Signed-off-by: Nadia Derbey <Nadia.Derbey@bull.net> Cc: Yasunori Goto <y-goto@jp.fujitsu.com> Cc: Matt Helsley <matthltc@us.ibm.com> Cc: Mingming Cao <cmm@us.ibm.com> Cc: Pierre Peiffer <pierre.peiffer@bull.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29IPC: use ipc_buildid() directly from ipc_addid()Pierre Peiffer1-2/+0
By continuing to consolidate a little the IPC code, each id can be built directly in ipc_addid() instead of having it built from each callers of ipc_addid() And I also remove shm_addid() in order to have, as much as possible, the same code for shm/sem/msg. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Pierre Peiffer <pierre.peiffer@bull.net> Cc: Nadia Derbey <Nadia.Derbey@bull.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>