path: root/fs
Age | Commit message | Author | Files | Lines
2022-11-21 | fs: dlm: parallelize lowcomms socket handling | Alexander Aring | 3 | -484/+586
This patch reworks lowcomms handling. The main goal is to let recvmsg() and sendpage() run in parallel, in two senses: 1. per connection, and 2. so that recvmsg() and sendpage() do not block each other. Currently recvmsg() and sendpage() cannot run in parallel because the two workqueues "dlm_recv" and "dlm_send" are ordered workqueues, which means only one work item can be executed at a time. The number of queued items grows with the number of nodes in the cluster. The two current workqueues for sending and receiving can also block each other if the same connection is handled at the same time in the dlm_recv and dlm_send workqueues, because of a per-connection mutex for the socket handling. To make it more parallel we introduce one "dlm_io" workqueue which is not an ordered workqueue; the number of workers is not limited. Using per-connection SEND/RECV pending flags, we schedule workers ordered per connection and per send and receive task. To get rid of the mutex that blocks workers of the same connection from doing socket handling, we switched to a semaphore which treats socket operations as read locks and sock releases as write operations, to prevent sock_release() being called while the socket is being used. There might be further optimization in removing the semaphore and replacing it with another synchronization mechanism; however, due to other circumstances, e.g. the othercon behaviour, making this change seems complicated. I added comments about removing the othercon handling and moving to a different synchronization mechanism once that is done. We need to do that at the next dlm major version upgrade because it is not backwards compatible with the current connect mechanism. The processing of dlm messages still needs to be handled by an ordered workqueue. A dlm_process ordered workqueue was introduced which gets filled by the receive worker. This is probably the next bottleneck of DLM, but the application can't currently parse dlm messages in parallel. A comment was introduced about lifting the workqueue context of dlm processing into a non-sleepable softirq to get message processing done fast. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
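The per-connection SEND/RECV pending flags mentioned in the commit above can be illustrated with a small userspace sketch. It is not the kernel code: the names `struct conn`, `CONN_SEND_PENDING`, `CONN_RECV_PENDING`, `conn_mark_pending()` and `conn_clear_pending()` are hypothetical, and C11 atomics stand in for the kernel's bit operations. The idea shown is that only the caller that flips a bit from 0 to 1 queues the work item, so each connection has at most one send and one receive worker in flight even on an unordered workqueue.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-connection pending bits; names are illustrative only. */
#define CONN_SEND_PENDING 1u
#define CONN_RECV_PENDING 2u

struct conn {
    atomic_uint flags;
};

/*
 * Mark one direction as pending. Returns true only for the caller that
 * flipped the bit from 0 to 1, i.e. the one that should queue the work.
 */
bool conn_mark_pending(struct conn *c, unsigned int bit)
{
    assert(bit == CONN_SEND_PENDING || bit == CONN_RECV_PENDING);
    unsigned int old = atomic_fetch_or(&c->flags, bit);
    return (old & bit) == 0;
}

/* Called by the worker once it is done, so the work can be queued again. */
void conn_clear_pending(struct conn *c, unsigned int bit)
{
    atomic_fetch_and(&c->flags, ~bit);
}
```

Send and receive use separate bits, so a pending send never prevents a receive worker from being queued for the same connection.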
2022-11-21 | fs: dlm: don't init error value | Alexander Aring | 1 | -1/+1
This patch removes an initialization of an error value to -EINVAL which is not necessary. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2022-11-21 | fs: dlm: use saved sk_error_report() | Alexander Aring | 1 | -5/+1
This patch changes how the original sk_error_report() is called: it is no longer put on the stack and called later. If listen_sock.sk_error_report() is NULL at this moment, it indicates a bug in our implementation. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2022-11-21 | fs: dlm: use sock2con without checking null | Alexander Aring | 1 | -13/+4
This patch removes null checks on private data for sockets. If we hit a null dereference there, we have a bug in our implementation, because such a callback should not occur in this state. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2022-11-21 | fs: dlm: remove dlm_node_addrs lookup list | Alexander Aring | 1 | -154/+136
This patch merges the dlm_node_addrs lookup list into the connection structure. It is a per-node mapping to some configuration set up by configfs. We don't need two lookup structures. The connection hash now has the same lifetime as the dlm_node_addrs entries. This means we add new entries only when the cluster is configured and not while new connections are coming in, remove a connection when a node got fenced, and clean up all connections when dlm exits. It should work the same and will even expose more issues, because we no longer try to somehow keep those two data structures in sync with the current cluster configuration. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2022-11-21 | fs: dlm: don't put dlm_local_addrs on heap | Alexander Aring | 1 | -26/+12
This patch stops allocating the dlm_local_addr[] pointers on the heap. Instead we directly store the type "struct sockaddr_storage". This removes the function deinit_local() because it was only freeing memory. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2022-11-21 | fs: dlm: cleanup listen sock handling | Alexander Aring | 1 | -34/+17
This patch removes save_listen_callbacks() and add_listen_sock() as they are only used once in the lowcomms functionality. For lowcomms shutdown it's not necessary to flush the whole workqueues to synchronize with restoring the old sk_data_ready() callback. Only the listen con receive work needs to be cancelled. For each individual node shutdown we should be sure that the last ack has been transmitted, which is done by flushing the per-connection swork worker. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2022-11-21 | fs: dlm: remove socket shutdown handling | Alexander Aring | 3 | -107/+27
Since commit 489d8e559c65 ("fs: dlm: add reliable connection if reconnect") we have functionality on the dlm application protocol layer like what TCP offers for half-closed sockets. This feature is required because cluster manager events about leaving resource memberships may have already occurred locally while other cluster nodes still have a leave-membership request pending over the cluster manager protocol. During this time the local dlm node has already shut down its connection and no longer transmits any new dlm messages, but it still needs to be able to accept dlm messages because of the pending leave-membership request of the cluster manager protocol, over which the dlm kernel implementation has no control. We have this functionality in the application layer for two reasons. The main reason is that SCTP does not support such functionality on the socket layer, but we can do it inside the application layer. Another small issue is that this feature is broken in the TCP world because some NAT devices do not implement such functionality correctly. This is the same reason why the reliable connection session layer in DLM exists: we give up on middle devices in the network which e.g. send out TCP resets. In DLM we cannot have any message dropping, and we ensure over a session layer that it can't happen. Back to the half-closed graceful shutdown handling: it's no longer necessary to do it on the socket layer (which is only supported for TCP sockets) because we do it on the application layer. This patch removes this handling; if there are still issues, then we have a problem on the application layer for such handling. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2022-11-21 | fs: dlm: use listen sock as dlm running indicator | Alexander Aring | 3 | -15/+10
This patch switches from checking dlm_allow_conn to decide whether dlm lowcomms is running to checking whether we actually have a listen socket set. The listen socket will be set and unset in the lowcomms start and shutdown functionality. To synchronize with the data_ready() callback we will set the socket callback to NULL while the socket lock is held. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2022-11-21 | fs: dlm: use list_first_entry_or_null | Alexander Aring | 1 | -6/+3
Instead of checking list_empty() we can do the same with list_first_entry_or_null() and return NULL if the returned value is NULL. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
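To illustrate the pattern from the commit above outside the kernel, here is a minimal userspace sketch. The list type and macros are simplified re-implementations written for this example (the kernel versions in include/linux/list.h differ in detail), and `struct item` and `first_item()` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for the kernel's circular doubly linked list. */
struct list_head {
    struct list_head *next, *prev;
};

void list_init(struct list_head *head)
{
    head->next = head;
    head->prev = head;
}

void list_add_tail(struct list_head *node, struct list_head *head)
{
    assert(node != NULL && head != NULL);
    node->prev = head->prev;
    node->next = head;
    head->prev->next = node;
    head->prev = node;
}

#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* An empty list points back at its own head, in which case we yield NULL. */
#define list_first_entry_or_null(head, type, member) \
    ((head)->next != (head) ? container_of((head)->next, type, member) : NULL)

/* Hypothetical queue element used only for this example. */
struct item {
    int id;
    struct list_head list;
};

/* One expression replaces "if (list_empty(q)) return NULL; return first". */
struct item *first_item(struct list_head *queue)
{
    return list_first_entry_or_null(queue, struct item, list);
}
```

The caller only has to test the returned pointer, so the separate emptiness check disappears.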
2022-11-21 | fs: dlm: remove twice INIT_WORK | Alexander Aring | 1 | -1/+0
This patch removes a duplicate INIT_WORK() call. We already do this inside dlm_lowcomms_init(), which is called only once when dlm is loaded. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2022-11-21 | fs: dlm: add midcomms init/start functions | Alexander Aring | 6 | -12/+37
This patch introduces the leftover init, start, stop and exit functionality. The dlm application layer should always call the midcomms layer, which becomes aware of such an event and redirects it to the lowcomms layer. Some functionality which is currently handled inside the start functionality of midcomms and lowcomms should be handled in the init functionality, as it only needs to be initialized once when dlm is loaded. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2022-11-21 | fs: dlm: add dst nodeid for msg tracing | Alexander Aring | 1 | -4/+6
In DLM, when we send a dlm message it is easy to add the lock resource name, but an additional lookup is required to trace the receive side. The idea here is to move the lookup work to the user, who matches the right send message to a recv message. Note that DLM can't drop any message, which is guaranteed by a special session layer. The lookup requires a 3-tuple as a unique identifier: dst nodeid, src nodeid and sequence number. This patch adds the destination nodeid to the dlm message trace points. The source nodeid is given by the h_nodeid field inside the header. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
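The user-side lookup described in the commit above could look roughly like the following sketch, which matches a received message to its send event by the (dst nodeid, src nodeid, sequence number) tuple. Everything here is an assumption made for illustration: the struct layouts, the 32-bit field widths, and the names `trace_key`, `send_event` and `lookup_res_name()` do not come from the kernel trace format.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 3-tuple identifying one dlm message in a trace. */
struct trace_key {
    uint32_t dst_nodeid;
    uint32_t src_nodeid;
    uint32_t seq;
};

/* Hypothetical record of a traced send, which carries the resource name. */
struct send_event {
    struct trace_key key;
    const char *res_name;
};

bool trace_key_equal(const struct trace_key *a, const struct trace_key *b)
{
    return a->dst_nodeid == b->dst_nodeid &&
           a->src_nodeid == b->src_nodeid &&
           a->seq == b->seq;
}

/* Linear search for the send event matching a receive; NULL if none. */
const char *lookup_res_name(const struct send_event *sends, size_t n,
                            const struct trace_key *recv_key)
{
    assert(sends != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (trace_key_equal(&sends[i].key, recv_key))
            return sends[i].res_name;
    }
    return NULL;
}
```

A real trace post-processor would more likely index the send events in a hash table; the linear scan just keeps the sketch short.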
2022-11-21 | fs: dlm: rename DLM_IFL_NEED_SCHED to DLM_IFL_CB_PENDING | Alexander Aring | 3 | -8/+6
This patch renames DLM_IFL_NEED_SCHED to DLM_IFL_CB_PENDING because CB_PENDING is a proper name to describe this flag. This flag is set when callback enqueue returns DLM_ENQUEUE_CALLBACK_NEED_SCHED because the callback worker needs to be queued. The flag indicates that callbacks are currently pending and will be unset when the callback work for the specific lkb is done. Needing to schedule is part of this, but a more fitting name says that there are callbacks pending to be called. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2022-11-21 | fs: dlm: ast do WARN_ON_ONCE() on hotpath | Alexander Aring | 2 | -7/+7
This patch changes the ast hotpath to use WARN_ON_ONCE() instead of WARN_ON() in very unlikely cases, so that the console output is not spammed if we run into states where the warning would occur over and over again. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2022-11-21 | fs: dlm: drop lkb ref in bug case | Alexander Aring | 1 | -1/+2
This patch drops the lkb reference in a very unlikely case which should not happen in practice. However, if it happens, we clean up the reference just in case. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2022-11-21 | fs: dlm: avoid false-positive checker warning | Alexander Aring | 1 | -1/+2
This patch avoid the false-positive checker warning about writing 112 bytes into a 88 bytes field "e->request", see: [ 54.891560] dlm: csmb1: dlm_recover_directory 23 out 2 messages [ 54.990542] ------------[ cut here ]------------ [ 54.991012] memcpy: detected field-spanning write (size 112) of single field "&e->request" at fs/dlm/requestqueue.c:47 (size 88) [ 54.992150] WARNING: CPU: 0 PID: 297 at fs/dlm/requestqueue.c:47 dlm_add_requestqueue+0x177/0x180 [ 54.993002] CPU: 0 PID: 297 Comm: kworker/u4:3 Not tainted 6.1.0-rc5-00008-ge01d50cbd6ee #248 [ 54.993878] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.0-1.fc36 04/01/2014 [ 54.994718] Workqueue: dlm_recv process_recv_sockets [ 54.995230] RIP: 0010:dlm_add_requestqueue+0x177/0x180 [ 54.995731] Code: e7 01 0f 85 3b ff ff ff b9 58 00 00 00 48 c7 c2 c0 41 74 82 4c 89 ee 48 c7 c7 20 42 74 82 c6 05 8b 8d 30 02 01 e8 51 07 be 00 <0f> 0b e9 12 ff ff ff 66 90 0f 1f 44 00 00 41 57 48 8d 87 10 08 00 [ 54.997483] RSP: 0018:ffffc90000b1fbe8 EFLAGS: 00010282 [ 54.997990] RAX: 0000000000000000 RBX: ffff888024fc3d00 RCX: 0000000000000000 [ 54.998667] RDX: 0000000000000001 RSI: ffffffff81155014 RDI: fffff52000163f73 [ 54.999342] RBP: ffff88800dbac000 R08: 0000000000000001 R09: ffffc90000b1fa5f [ 54.999997] R10: fffff52000163f4b R11: 203a7970636d656d R12: ffff88800cfb0018 [ 55.000673] R13: 0000000000000070 R14: ffff888024fc3d18 R15: 0000000000000000 [ 55.001344] FS: 0000000000000000(0000) GS:ffff88806d600000(0000) knlGS:0000000000000000 [ 55.002078] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 55.002603] CR2: 00007f35d4f0b9a0 CR3: 0000000025495002 CR4: 0000000000770ef0 [ 55.003258] PKRU: 55555554 [ 55.003514] Call Trace: [ 55.003756] <TASK> [ 55.003953] dlm_receive_buffer+0x1c0/0x200 [ 55.004348] dlm_process_incoming_buffer+0x46d/0x780 [ 55.004786] ? kernel_recvmsg+0x8b/0xc0 [ 55.005150] receive_from_sock.isra.0+0x168/0x420 [ 55.005582] ? process_listen_recv_socket+0x10/0x10 [ 55.006018] ? 
finish_task_switch.isra.0+0xe0/0x400 [ 55.006469] ? __switch_to+0x2fe/0x6a0 [ 55.006808] ? read_word_at_a_time+0xe/0x20 [ 55.007197] ? strscpy+0x146/0x190 [ 55.007505] process_one_work+0x3d0/0x6b0 [ 55.007863] worker_thread+0x8d/0x620 [ 55.008209] ? __kthread_parkme+0xd8/0xf0 [ 55.008565] ? process_one_work+0x6b0/0x6b0 [ 55.008937] kthread+0x171/0x1a0 [ 55.009251] ? kthread_exit+0x60/0x60 [ 55.009582] ret_from_fork+0x1f/0x30 [ 55.009903] </TASK> [ 55.010120] ---[ end trace 0000000000000000 ]--- [ 55.025783] dlm: csmb1: dlm_recover 5 generation 3 done: 201 ms [ 55.026466] gfs2: fsid=smbcluster:csmb1.0: recover generation 3 done It seems the checker is unable to detect the additional length bytes which was allocated additionally for the flexible array in struct dlm_message. To solve it we split the memcpy() into copy for the 88 bytes struct and another memcpy() for the flexible array m_extra field. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
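The fix described above, splitting one memcpy() into a copy of the fixed-size struct and a second copy into the flexible array, can be sketched in userspace as follows. `struct msg` and `msg_dup()` are hypothetical stand-ins for illustration, not the dlm structures; the point is only that each memcpy() stays within the declared bounds of its destination field, which is what a field-spanning write check looks at.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical message: fixed header followed by a flexible array. */
struct msg {
    unsigned int type;
    unsigned int extra_len;
    char extra[];            /* flexible array member */
};

/* Duplicate a message including extra_len trailing bytes. */
struct msg *msg_dup(const struct msg *src, size_t extra_len)
{
    assert(src != NULL);

    struct msg *dst = malloc(sizeof(*dst) + extra_len);
    if (dst == NULL)
        return NULL;

    /* First copy: exactly the declared size of the struct. */
    memcpy(dst, src, sizeof(*dst));
    /* Second copy: the trailing bytes, addressed via the flexible array. */
    memcpy(dst->extra, src->extra, extra_len);
    return dst;
}
```

A single `memcpy(dst, src, sizeof(*dst) + extra_len)` copies the same bytes, but a checker that only knows the declared size of the destination field sees it as writing past that field.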
2022-11-08 | fs: dlm: use WARN_ON_ONCE() instead of WARN_ON() | Alexander Aring | 1 | -9/+9
To keep the console from being spammed by WARN_ON() about invalid states in the dlm midcomms hot path handling, we switch to WARN_ON_ONCE() so that we are told only once that there might be an issue with the midcomms state handling. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2022-11-08 | fs: dlm: fix log of lowcomms vs midcomms | Alexander Aring | 1 | -1/+1
This patch fixes a small issue in the message printed when dlm_midcomms_start() fails to start: it reported that the dlm subcomponent lowcomms had failed, but lowcomms is behind the midcomms layer. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2022-11-08 | fs: dlm: catch dlm_add_member() error | Alexander Aring | 1 | -1/+4
This patch catches a possible dlm_add_member() error and delivers it to the dlm recovery handling. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2022-11-08 | fs: dlm: relax sending to allow receiving | Alexander Aring | 1 | -5/+10
This patch additionally drops the sock_mutex when there is a burst of messages to send. Since we have acknowledge handling, we free sending buffers only when we receive an ack back; but if we are stuck looping in send_to_sock() because dlm sends a lot of messages and we never leave the loop, the sending buffers fill up very quickly. We can't receive during this iteration because the sock_mutex is held. This patch unlocks the sock_mutex so that it is possible to receive messages during a burst of sends. This allows memory to be freed because acks which have already been received can be processed. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2022-11-08 | fs: dlm: remove ls_remove_wait waitqueue | Alexander Aring | 3 | -61/+2
This patch removes the ls_remove_wait waitqueue handling. The current handling tries to wait before a lookup is sent out for an identical resource name which is going to be removed. The remove message should be sent out before the new lookup message. The reason is that after a lookup request and response, the specific remote rsb will actually be used. A subsequent remove message would delete the rsb on the remote side while it's still being used. To achieve a similar behaviour we simply send the remove message out while the rsb lookup lock is held and the rsb is removed from the toss list. Other find_rsb() calls would never have the chance to bring an rsb back to life while a remove message is being sent out (without holding the lock). This behaviour requires a non-sleepable context, which should be provided now and might be the reason why it was not implemented this way in the first place. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2022-11-08 | fs: dlm: allow different allocation context per _create_message | Alexander Aring | 4 | -16/+23
This patch gives the user control over the allocation context on a per-message basis. Currently all messages are forced to be created under the GFP_NOFS context. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2022-11-08 | fs: dlm: use a non-static queue for callbacks | Alexander Aring | 9 | -217/+222
This patch introduces a queue implementation for callbacks using Linux lists. The current callback queue handling is implemented with a static limit of 6 entries, see DLM_CALLBACKS_SIZE. The sequence number inside the callback structure was used to see whether an entry inside the static array is valid or not. We no longer need sequence numbers with a dynamic data structure which grows and shrinks during runtime. We assume that every callback will be delivered to the DLM user once queued. Therefore the callback flag DLM_CB_SKIP was dropped and the check for skipping a bast was moved before the worker handling, instead of skipping while the callback worker executes. This reduces unnecessary queueing of the callback worker. All last-callback saves are pointers now and don't need to be copied over. There is a reference counter for callback structures which takes care of freeing them at the right time when they are no longer referenced. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
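A dynamically sized, reference-counted callback queue of the kind described above can be sketched as follows. This is a single-threaded userspace illustration with hypothetical names (`struct callback`, `struct cb_queue`, `cb_enqueue()`, `cb_dequeue()`, `cb_get()`, `cb_put()`); the kernel code uses its own list and reference-counting primitives plus locking, none of which is reproduced here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical callback entry; refcount is not atomic in this sketch. */
struct callback {
    int refcount;
    int mode;
    struct callback *next;
};

/* Singly linked FIFO that grows and shrinks at runtime. */
struct cb_queue {
    struct callback *head, *tail;
};

void cb_get(struct callback *cb)
{
    cb->refcount++;
}

/* Drop one reference; returns true if this freed the entry. */
bool cb_put(struct callback *cb)
{
    assert(cb->refcount > 0);
    if (--cb->refcount == 0) {
        free(cb);
        return true;
    }
    return false;
}

/* Allocate an entry and append it; the queue holds the initial reference. */
bool cb_enqueue(struct cb_queue *q, int mode)
{
    struct callback *cb = malloc(sizeof(*cb));
    if (cb == NULL)
        return false;
    cb->refcount = 1;
    cb->mode = mode;
    cb->next = NULL;
    if (q->tail != NULL)
        q->tail->next = cb;
    else
        q->head = cb;
    q->tail = cb;
    return true;
}

/* Remove the oldest entry; the queue's reference passes to the caller. */
struct callback *cb_dequeue(struct cb_queue *q)
{
    struct callback *cb = q->head;
    if (cb == NULL)
        return NULL;
    q->head = cb->next;
    if (q->head == NULL)
        q->tail = NULL;
    cb->next = NULL;
    return cb;
}
```

Keeping a pointer to the most recent callback then only costs a `cb_get()` instead of a copy, and the entry is freed by whichever holder calls `cb_put()` last.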
2022-11-08 | fs: dlm: move last cast bast time to function call | Alexander Aring | 1 | -6/+4
This patch moves the debugging information of the last cast and bast time to where the cast and bast functions are called. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2022-11-08 | fs: dlm: use spin lock instead of mutex | Alexander Aring | 3 | -6/+6
There is no need to use a mutex in those hot path sections. We change it to a spin lock to serve callbacks faster by not allowing scheduling. The locked sections will not be held for a long time. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2022-11-08 | fs: dlm: convert ls_cb_mutex mutex to spinlock | Alexander Aring | 3 | -8/+8
This patch converts the ls_cb_mutex mutex to a spinlock, there is no sleepable context when this lock is held. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2022-11-08 | fs: dlm: use list_first_entry marco | Alexander Aring | 1 | -1/+1
Instead of using list_entry() this patch moves to using the list_first_entry() macro. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2022-11-08 | fs: dlm: let dlm_add_cb queue work after resume only | Alexander Aring | 1 | -2/+2
We should allow dlm_add_cb() to call queue_work() only after recovery has queued the pending work for delayed lkbs. This patch moves the LSFL_CB_DELAY switch to after the delayed lkb work has been processed. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2022-11-08 | fd: dlm: trace send/recv of dlm message and rcom | Alexander Aring | 4 | -17/+56
This patch adds tracepoints for the send and recv cases of dlm messages and dlm rcom messages. When sending a dlm message we add the dlm rsb resource name the message belongs to. This has the advantage that dlm messages can be followed on a per-lock basis. For a received message the resource name can be extracted by following the send message's sequence number. The dlm message DLM_MSG_PURGE doesn't belong to a lock request and will not set the resource name in a dlm_message trace. The same applies to all rcom messages. Additional handling is required for this debugging functionality, which we tried to keep as small as possible. The midcomms layer also becomes aware of lock resource names; for now this is required to make a connection between sequence numbers and lock resource names. It is for debugging purposes only. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2022-11-08 | fs: dlm: use packet in dlm_mhandle | Alexander Aring | 1 | -3/+3
To allow more than just dereferencing the inner header we directly point to the inner dlm packet which allows us to dereference the header, rcom or message structure. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2022-11-08 | fs: dlm: remove send repeat remove handling | Alexander Aring | 1 | -74/+0
This patch removes the send repeat remove handling. This handling exists to repeatedly send DLM_MSG_REMOVE messages in cases where the dlm stack thinks the message was not received the first time. In cases of message drops this functionality is necessary, but since the DLM midcomms layer guarantees there are no message drops between cluster nodes, this feature is no longer strictly necessary. Due to message delays/processing it could be that two send_repeat_remove() are sent out while the other should still be on its way. We remove the repeat remove handling because we are sure that the message cannot be dropped due to communication errors. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2022-11-08 | fs: dlm: retry accept() until -EAGAIN or error returns | Alexander Aring | 1 | -1/+5
This patch fixes a race if we get a socket data ready event twice while the listen connection worker is queued. Currently it will be served only once, but we need to do it (in this case twice) until we hit -EAGAIN, which tells us there is no pending accept. This patch wraps the accept in a do-while loop until we receive a return value other than 0, as was done before commit d11ccd451b65 ("fs: dlm: listen socket out of connection hash"). Cc: stable@vger.kernel.org Fixes: d11ccd451b65 ("fs: dlm: listen socket out of connection hash") Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
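The same "keep accepting until EAGAIN" idea applies to any non-blocking listen socket. The following is a userspace sketch using the POSIX socket API rather than the in-kernel one; `set_nonblocking()` and `accept_all_pending()` are hypothetical helper names, and the per-connection handler is supplied by the caller.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <unistd.h>

/* Put a descriptor into non-blocking mode; returns 0 on success, -1 on error. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/*
 * Accept every connection that is already pending on a non-blocking
 * listen socket. One wakeup may stand for several queued connections,
 * so we loop until accept() reports EAGAIN/EWOULDBLOCK.
 * Returns the number of accepted connections, or -1 on a real error.
 */
int accept_all_pending(int listen_fd, int (*handle)(int fd))
{
    int accepted = 0;

    assert(handle != NULL);
    for (;;) {
        int fd = accept(listen_fd, NULL, NULL);
        if (fd >= 0) {
            handle(fd);          /* the handler takes ownership of fd */
            accepted++;
            continue;
        }
        if (errno == EINTR)
            continue;            /* interrupted by a signal, retry */
        if (errno == EAGAIN || errno == EWOULDBLOCK)
            return accepted;     /* accept queue is drained */
        return -1;
    }
}
```

Passing `close` as the handler simply discards each accepted connection, which is enough to exercise the loop.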
2022-11-08 | fs: dlm: fix sock release if listen fails | Alexander Aring | 1 | -2/+1
This patch fixes a double sock_release() call when listen() fails for the dlm lowcomms listen socket. The caller of dlm_listen_for_all() should never care about releasing the socket if dlm_listen_for_all() fails; it's now done only once if listen() fails. Cc: stable@vger.kernel.org Fixes: 2dc6b1158c28 ("fs: dlm: introduce generic listen") Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2022-11-08 | dlm: replace one-element array with fixed size array | Paulo Miguel Almeida | 2 | -2/+2
One-element arrays are deprecated. So, replace one-element array with fixed size array member in struct dlm_ls, and refactor the rest of the code, accordingly. Link: https://github.com/KSPP/linux/issues/79 Link: https://github.com/KSPP/linux/issues/228 Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101836 Link: https://lore.kernel.org/lkml/Y0W5jkiXUkpNl4ap@mail.google.com/ Signed-off-by: Paulo Miguel Almeida <paulo.miguel.almeida.rodenas@gmail.com> Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: David Teigland <teigland@redhat.com>
2022-11-06 | Merge tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 | Linus Torvalds | 6 | -7/+21
Pull ext4 fixes from Ted Ts'o: "Fix a number of bugs, including some regressions, the most serious of which was one which would cause online resizes to fail with file systems with metadata checksums enabled. Also fix a warning caused by the newly added fortify string checker, plus some bugs that were found using fuzzed file systems" * tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: ext4: fix fortify warning in fs/ext4/fast_commit.c:1551 ext4: fix wrong return err in ext4_load_and_init_journal() ext4: fix warning in 'ext4_da_release_space' ext4: fix BUG_ON() when directory entry has invalid rec_len ext4: update the backup superblock's at the end of the online resize
2022-11-06 | Merge tag '6.1-rc4-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6 | Linus Torvalds | 6 | -62/+105
Pull cifs fixes from Steve French: "One symlink handling fix and two fixes for multichannel issues with iterating channels, including for oplock breaks when leases are disabled" * tag '6.1-rc4-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6: cifs: fix use-after-free on the link name cifs: avoid unnecessary iteration of tcp sessions cifs: always iterate smb sessions using primary channel
2022-11-06 | ext4: fix fortify warning in fs/ext4/fast_commit.c:1551 | Theodore Ts'o | 1 | -2/+3
With the new fortify string system, rework the memcpy to avoid this warning: memcpy: detected field-spanning write (size 60) of single field "&raw_inode->i_generation" at fs/ext4/fast_commit.c:1551 (size 4) Cc: stable@kernel.org Fixes: 54d9469bc515 ("fortify: Add run-time WARN for cross-field memcpy()") Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2022-11-06 | ext4: fix wrong return err in ext4_load_and_init_journal() | Jason Yan | 1 | -1/+1
The return value is wrong in ext4_load_and_init_journal(). The local variable 'err' needs to be initialized before goto out. The original code in __ext4_fill_super() is fine because it has two return values, 'ret' and 'err', and 'ret' is initialized as -EINVAL. After we factored out ext4_load_and_init_journal(), this code is broken. So fix it by directly returning -EINVAL in the error handler path. Cc: stable@kernel.org Fixes: 9c1dd22d7422 ("ext4: factor out ext4_load_and_init_journal()") Signed-off-by: Jason Yan <yanaijie@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20221025040206.3134773-1-yanaijie@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2022-11-06 | ext4: fix warning in 'ext4_da_release_space' | Ye Bin | 1 | -1/+2
Syzkaller report issue as follows: EXT4-fs (loop0): Free/Dirty block details EXT4-fs (loop0): free_blocks=0 EXT4-fs (loop0): dirty_blocks=0 EXT4-fs (loop0): Block reservation details EXT4-fs (loop0): i_reserved_data_blocks=0 EXT4-fs warning (device loop0): ext4_da_release_space:1527: ext4_da_release_space: ino 18, to_free 1 with only 0 reserved data blocks ------------[ cut here ]------------ WARNING: CPU: 0 PID: 92 at fs/ext4/inode.c:1528 ext4_da_release_space+0x25e/0x370 fs/ext4/inode.c:1524 Modules linked in: CPU: 0 PID: 92 Comm: kworker/u4:4 Not tainted 6.0.0-syzkaller-09423-g493ffd6605b2 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/22/2022 Workqueue: writeback wb_workfn (flush-7:0) RIP: 0010:ext4_da_release_space+0x25e/0x370 fs/ext4/inode.c:1528 RSP: 0018:ffffc900015f6c90 EFLAGS: 00010296 RAX: 42215896cd52ea00 RBX: 0000000000000000 RCX: 42215896cd52ea00 RDX: 0000000000000000 RSI: 0000000080000001 RDI: 0000000000000000 RBP: 1ffff1100e907d96 R08: ffffffff816aa79d R09: fffff520002bece5 R10: fffff520002bece5 R11: 1ffff920002bece4 R12: ffff888021fd2000 R13: ffff88807483ecb0 R14: 0000000000000001 R15: ffff88807483e740 FS: 0000000000000000(0000) GS:ffff8880b9a00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00005555569ba628 CR3: 000000000c88e000 CR4: 00000000003506f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> ext4_es_remove_extent+0x1ab/0x260 fs/ext4/extents_status.c:1461 mpage_release_unused_pages+0x24d/0xef0 fs/ext4/inode.c:1589 ext4_writepages+0x12eb/0x3be0 fs/ext4/inode.c:2852 do_writepages+0x3c3/0x680 mm/page-writeback.c:2469 __writeback_single_inode+0xd1/0x670 fs/fs-writeback.c:1587 writeback_sb_inodes+0xb3b/0x18f0 fs/fs-writeback.c:1870 wb_writeback+0x41f/0x7b0 fs/fs-writeback.c:2044 wb_do_writeback fs/fs-writeback.c:2187 [inline] wb_workfn+0x3cb/0xef0 fs/fs-writeback.c:2227 
process_one_work+0x877/0xdb0 kernel/workqueue.c:2289 worker_thread+0xb14/0x1330 kernel/workqueue.c:2436 kthread+0x266/0x300 kernel/kthread.c:376 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306 </TASK> Above issue may happens as follows: ext4_da_write_begin ext4_create_inline_data ext4_clear_inode_flag(inode, EXT4_INODE_EXTENTS); ext4_set_inode_flag(inode, EXT4_INODE_INLINE_DATA); __ext4_ioctl ext4_ext_migrate -> will lead to eh->eh_entries not zero, and set extent flag ext4_da_write_begin ext4_da_convert_inline_data_to_extent ext4_da_write_inline_data_begin ext4_da_map_blocks ext4_insert_delayed_block if (!ext4_es_scan_clu(inode, &ext4_es_is_delonly, lblk)) if (!ext4_es_scan_clu(inode, &ext4_es_is_mapped, lblk)) ext4_clu_mapped(inode, EXT4_B2C(sbi, lblk)); -> will return 1 allocated = true; ext4_es_insert_delayed_block(inode, lblk, allocated); ext4_writepages mpage_map_and_submit_extent(handle, &mpd, &give_up_on_write); -> return -ENOSPC mpage_release_unused_pages(&mpd, give_up_on_write); -> give_up_on_write == 1 ext4_es_remove_extent ext4_da_release_space(inode, reserved); if (unlikely(to_free > ei->i_reserved_data_blocks)) -> to_free == 1 but ei->i_reserved_data_blocks == 0 -> then trigger warning as above To solve above issue, forbid inode do migrate which has inline data. Cc: stable@kernel.org Reported-by: syzbot+c740bb18df70ad00952e@syzkaller.appspotmail.com Signed-off-by: Ye Bin <yebin10@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20221018022701.683489-1-yebin10@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2022-11-06 | ext4: fix BUG_ON() when directory entry has invalid rec_len | Luís Henriques | 1 | -1/+9
The rec_len field in the directory entry has to be a multiple of 4. A corrupted filesystem image can be used to hit a BUG() in ext4_rec_len_to_disk(), called from make_indexed_dir(). ------------[ cut here ]------------ kernel BUG at fs/ext4/ext4.h:2413! ... RIP: 0010:make_indexed_dir+0x53f/0x5f0 ... Call Trace: <TASK> ? add_dirent_to_buf+0x1b2/0x200 ext4_add_entry+0x36e/0x480 ext4_add_nondir+0x2b/0xc0 ext4_create+0x163/0x200 path_openat+0x635/0xe90 do_filp_open+0xb4/0x160 ? __create_object.isra.0+0x1de/0x3b0 ? _raw_spin_unlock+0x12/0x30 do_sys_openat2+0x91/0x150 __x64_sys_open+0x6c/0xa0 do_syscall_64+0x3c/0x80 entry_SYSCALL_64_after_hwframe+0x46/0xb0 The fix simply adds a call to ext4_check_dir_entry() to validate the directory entry, returning -EFSCORRUPTED if the entry is invalid. CC: stable@kernel.org Link: https://bugzilla.kernel.org/show_bug.cgi?id=216540 Signed-off-by: Luís Henriques <lhenriques@suse.de> Link: https://lore.kernel.org/r/20221012131330.32456-1-lhenriques@suse.de Signed-off-by: Theodore Ts'o <tytso@mit.edu>
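The kind of validation the fix above relies on can be sketched as a standalone check. This is a simplified, hypothetical helper and not ext4_check_dir_entry(), which performs more checks; the 8-byte header size is an assumption based on the usual on-disk entry layout (4-byte inode, 2-byte rec_len, 1-byte name_len, 1-byte file type).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed fixed part of a directory entry, in bytes. */
#define DIRENT_HEADER_LEN 8u

/*
 * Simplified sanity check for one directory entry located at `offset`
 * inside a block of `block_size` bytes.
 */
bool dirent_rec_len_ok(uint32_t rec_len, uint8_t name_len,
                       size_t offset, size_t block_size)
{
    assert(block_size % 4 == 0);

    if (rec_len % 4 != 0)                         /* must be a multiple of 4 */
        return false;
    if (rec_len < DIRENT_HEADER_LEN + name_len)   /* must hold the name */
        return false;
    if (offset > block_size || rec_len > block_size - offset)
        return false;                             /* must stay inside the block */
    return true;
}
```

Rejecting a bad entry up front lets the caller return an error such as -EFSCORRUPTED instead of reaching a BUG() later in the conversion path.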
2022-11-04 | cifs: fix use-after-free on the link name | ChenXiaoSong | 2 | -6/+25
xfstests generic/011 reported a use-after-free bug as follows:

BUG: KASAN: use-after-free in __d_alloc+0x269/0x859
Read of size 15 at addr ffff8880078933a0 by task dirstress/952

CPU: 1 PID: 952 Comm: dirstress Not tainted 6.1.0-rc3+ #77
Call Trace:
 __dump_stack+0x23/0x29
 dump_stack_lvl+0x51/0x73
 print_address_description+0x67/0x27f
 print_report+0x3e/0x5c
 kasan_report+0x7b/0xa8
 kasan_check_range+0x1b2/0x1c1
 memcpy+0x22/0x5d
 __d_alloc+0x269/0x859
 d_alloc+0x45/0x20c
 d_alloc_parallel+0xb2/0x8b2
 lookup_open+0x3b8/0x9f9
 open_last_lookups+0x63d/0xc26
 path_openat+0x11a/0x261
 do_filp_open+0xcc/0x168
 do_sys_openat2+0x13b/0x3f7
 do_sys_open+0x10f/0x146
 __se_sys_creat+0x27/0x2e
 __x64_sys_creat+0x55/0x6a
 do_syscall_64+0x40/0x96
 entry_SYSCALL_64_after_hwframe+0x63/0xcd

Allocated by task 952:
 kasan_save_stack+0x1f/0x42
 kasan_set_track+0x21/0x2a
 kasan_save_alloc_info+0x17/0x1d
 __kasan_kmalloc+0x7e/0x87
 __kmalloc_node_track_caller+0x59/0x155
 kstrndup+0x60/0xe6
 parse_mf_symlink+0x215/0x30b
 check_mf_symlink+0x260/0x36a
 cifs_get_inode_info+0x14e1/0x1690
 cifs_revalidate_dentry_attr+0x70d/0x964
 cifs_revalidate_dentry+0x36/0x62
 cifs_d_revalidate+0x162/0x446
 lookup_open+0x36f/0x9f9
 open_last_lookups+0x63d/0xc26
 path_openat+0x11a/0x261
 do_filp_open+0xcc/0x168
 do_sys_openat2+0x13b/0x3f7
 do_sys_open+0x10f/0x146
 __se_sys_creat+0x27/0x2e
 __x64_sys_creat+0x55/0x6a
 do_syscall_64+0x40/0x96
 entry_SYSCALL_64_after_hwframe+0x63/0xcd

Freed by task 950:
 kasan_save_stack+0x1f/0x42
 kasan_set_track+0x21/0x2a
 kasan_save_free_info+0x1c/0x34
 ____kasan_slab_free+0x1c1/0x1d5
 __kasan_slab_free+0xe/0x13
 __kmem_cache_free+0x29a/0x387
 kfree+0xd3/0x10e
 cifs_fattr_to_inode+0xb6a/0xc8c
 cifs_get_inode_info+0x3cb/0x1690
 cifs_revalidate_dentry_attr+0x70d/0x964
 cifs_revalidate_dentry+0x36/0x62
 cifs_d_revalidate+0x162/0x446
 lookup_open+0x36f/0x9f9
 open_last_lookups+0x63d/0xc26
 path_openat+0x11a/0x261
 do_filp_open+0xcc/0x168
 do_sys_openat2+0x13b/0x3f7
 do_sys_open+0x10f/0x146
 __se_sys_creat+0x27/0x2e
 __x64_sys_creat+0x55/0x6a
 do_syscall_64+0x40/0x96
 entry_SYSCALL_64_after_hwframe+0x63/0xcd

When a symlink is opened, the link name is taken from 'inode->i_link', but that pointer may be reset to a new value when the dentry is revalidated. If another process reads the link name during this race, a use-after-free occurs on the link name.

Fix this by implementing the 'get_link' interface so that it duplicates the link name.

Fixes: 76894f3e2f71 ("cifs: improve symlink handling for smb2+")
Signed-off-by: ChenXiaoSong <chenxiaosong2@huawei.com>
Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Steve French <stfrench@microsoft.com>
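The idea behind the fix can be sketched in user space. This is only a model, not the cifs code: `struct model_inode`, `model_set_link()` and `model_get_link()` are hypothetical names, and a pthread mutex stands in for whatever locking the kernel uses. The point it illustrates is that a reader which takes a private copy of the link name is unaffected when revalidation later frees and replaces the cached string.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an inode whose cached symlink target can be
 * replaced during revalidation. This is not the kernel's struct inode. */
struct model_inode {
    pthread_mutex_t lock;  /* protects 'link' */
    char *link;            /* cached symlink target, may be replaced */
};

/* Revalidation path: install a new target and free the old string. */
int model_set_link(struct model_inode *inode, const char *target)
{
    assert(inode && target);
    char *copy = strdup(target);
    if (!copy)
        return -1;

    pthread_mutex_lock(&inode->lock);
    char *old = inode->link;
    inode->link = copy;
    pthread_mutex_unlock(&inode->lock);

    free(old);  /* anyone still holding 'old' would now hit a UAF */
    return 0;
}

/* Lookup path: hand out a private copy made under the lock instead of the
 * shared pointer. The caller owns the result and must free() it. */
char *model_get_link(struct model_inode *inode)
{
    assert(inode);
    char *copy = NULL;

    pthread_mutex_lock(&inode->lock);
    if (inode->link)
        copy = strdup(inode->link);
    pthread_mutex_unlock(&inode->lock);

    return copy;  /* NULL if no target is cached or allocation failed */
}
```

A caller would initialise the inode with `PTHREAD_MUTEX_INITIALIZER`, call `model_get_link()`, use the string, and free it. The copy costs an allocation per lookup, which is the price of not sharing the cached pointer across the race window.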
2022-11-04cifs: avoid unnecessary iteration of tcp sessionsShyam Prasad N3-51/+55
In a few places, we do unnecessary iterations of tcp sessions, even when the server struct is provided. The change avoids it and uses the server struct provided.

Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Steve French <stfrench@microsoft.com>
2022-11-04cifs: always iterate smb sessions using primary channelShyam Prasad N4-5/+25
SMB sessions and tcons currently hang off the primary channel only; secondary channels keep those lists empty. Whenever there's a need to iterate sessions or tcons, we should use the list in the corresponding primary channel.

Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Steve French <stfrench@microsoft.com>
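As a rough illustration of the rule described above, the following user-space sketch always resolves a channel to its primary before walking the session list. The types and helpers (`struct model_server`, `model_primary()`, `model_count_sessions()`) are hypothetical and only model the relationship; they are not the cifs data structures.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical session list node. */
struct model_session {
    struct model_session *next;
    int id;
};

/* Hypothetical channel: a secondary channel points at its primary and
 * keeps an empty session list of its own. */
struct model_server {
    struct model_server *primary;    /* NULL on the primary channel itself */
    struct model_session *sessions;  /* populated only on the primary */
};

/* Resolve any channel to the channel that actually owns the lists. */
static struct model_server *model_primary(struct model_server *server)
{
    assert(server);
    return server->primary ? server->primary : server;
}

/* Iterate sessions through the primary channel, whichever channel the
 * caller happens to hold. */
int model_count_sessions(struct model_server *server)
{
    int n = 0;
    struct model_session *ses;

    for (ses = model_primary(server)->sessions; ses; ses = ses->next)
        n++;
    return n;
}
```

Walking `server->sessions` directly on a secondary channel would see an empty list in this model, which is the kind of mistake the commit message warns about.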
2022-11-04Merge tag 'xfs-6.1-fixes-4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds29-386/+670
Pull xfs fixes from Darrick Wong:
 "Dave and I had thought that this would be a very quiet cycle, but we thought wrong.

  At first there were the usual trickle of minor bugfixes, but then Zorro pulled -rc1 and noticed complaints about the stronger memcpy checks w.r.t. flex arrays.

  Analyzing how to fix that revealed a bunch of validation gaps in validating ondisk log items during recovery, and then a customer hit an infinite loop in the refcounting code on a corrupt filesystem.

  So. This largeish batch of fixes addresses all those problems, I hope.

  Summary:

   - Fix a UAF bug during log recovery
   - Fix memory leaks when mount fails
   - Detect corrupt bestfree information in a directory block
   - Fix incorrect return value type for the dax page fault handlers
   - Fix fortify complaints about memcpy of xfs log item objects
   - Strengthen inadequate validation of recovered log items
   - Fix incorrectly declared flex array in EFI log item structs
   - Log corrupt log items for debugging purposes
   - Fix infinite loop problems in the refcount code if the refcount btree node block keys are corrupt
   - Fix infinite loop problems in the refcount code if the refcount btree records suffer MSB bitflips
   - Add more sanity checking to continued defer ops to prevent overflows from one AG to the next or off EOFS"

* tag 'xfs-6.1-fixes-4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (28 commits)
  xfs: rename XFS_REFC_COW_START to _COWFLAG
  xfs: fix uninitialized list head in struct xfs_refcount_recovery
  xfs: fix agblocks check in the cow leftover recovery function
  xfs: check record domain when accessing refcount records
  xfs: remove XFS_FIND_RCEXT_SHARED and _COW
  xfs: refactor domain and refcount checking
  xfs: report refcount domain in tracepoints
  xfs: track cow/shared record domains explicitly in xfs_refcount_irec
  xfs: refactor refcount record usage in xchk_refcountbt_rec
  xfs: dump corrupt recovered log intent items to dmesg consistently
  xfs: move _irec structs to xfs_types.h
  xfs: actually abort log recovery on corrupt intent-done log items
  xfs: check deferred refcount op continuation parameters
  xfs: refactor all the EFI/EFD log item sizeof logic
  xfs: create a predicate to verify per-AG extents
  xfs: fix memcpy fortify errors in EFI log format copying
  xfs: make sure aglen never goes negative in xfs_refcount_adjust_extents
  xfs: fix memcpy fortify errors in RUI log format copying
  xfs: fix memcpy fortify errors in CUI log format copying
  xfs: fix memcpy fortify errors in BUI log format copying
  ...
2022-11-03Merge tag 'fuse-fixes-6.1-rc4' of ↵Linus Torvalds2-1/+13
git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse

Pull fuse fixes from Miklos Szeredi:
 "Fix two rarely triggered but long-standing issues"

* tag 'fuse-fixes-6.1-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
  fuse: add file_modified() to fallocate
  fuse: fix readdir cache race
2022-11-03Merge tag 'for-6.1-rc3-tag' of ↵Linus Torvalds5-49/+91
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux

Pull btrfs fixes from David Sterba:
 "A batch of error handling fixes for resource leaks, fixes for nowait mode in combination with direct and buffered IO:

   - direct IO + dsync + nowait could miss a sync of the file after write, add handling for this combination

   - buffered IO + nowait should not fail with ENOSPC, only blocking IO could determine that

   - error handling fixes:
      - fix inode reserve space leak due to nowait buffered write
      - check the correct variable after allocation (direct IO submit)
      - fix inode list leak during backref walking
      - fix ulist freeing in self tests"

* tag 'for-6.1-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: fix inode reserve space leak due to nowait buffered write
  btrfs: fix nowait buffered write returning -ENOSPC
  btrfs: remove pointless and double ulist frees in error paths of qgroup tests
  btrfs: fix ulist leaks in error paths of qgroup self tests
  btrfs: fix inode list leak during backref walking at find_parent_nodes()
  btrfs: fix inode list leak during backref walking at resolve_indirect_refs()
  btrfs: fix lost file sync on direct IO write with nowait and dsync iocb
  btrfs: fix a memory allocation failure test in btrfs_submit_direct
2022-11-02Merge tag 'nfs-for-6.1-2' of git://git.linux-nfs.org/projects/anna/linux-nfsLinus Torvalds18-75/+80
Pull NFS client bugfixes from Anna Schumaker:

 - Fix some coccicheck warnings
 - Avoid memcpy() run-time warning
 - Fix up various state reclaim / RECLAIM_COMPLETE errors
 - Fix a null pointer dereference in sysfs
 - Fix LOCK races
 - Fix gss_unwrap_resp_integ() crasher
 - Fix zero length clones
 - Fix memleak when allocate slot fails

* tag 'nfs-for-6.1-2' of git://git.linux-nfs.org/projects/anna/linux-nfs:
  nfs4: Fix kmemleak when allocate slot failed
  NFSv4.2: Fixup CLONE dest file size for zero-length count
  SUNRPC: Fix crasher in gss_unwrap_resp_integ()
  NFSv4: Retry LOCK on OLD_STATEID during delegation return
  SUNRPC: Fix null-ptr-deref when xps sysfs alloc failed
  NFSv4.1: We must always send RECLAIM_COMPLETE after a reboot
  NFSv4.1: Handle RECLAIM_COMPLETE trunking errors
  NFSv4: Fix a potential state reclaim deadlock
  NFS: Avoid memcpy() run-time warning for struct sockaddr overflows
  nfs: Remove redundant null checks before kfree
2022-11-02btrfs: fix inode reserve space leak due to nowait buffered writeFilipe Manana1-1/+3
During a nowait buffered write, if we fail to balance dirty pages we exit btrfs_buffered_write() without releasing the delalloc space reserved for an extent, resulting in leaking space from the inode's block reserve.

So fix that by releasing the delalloc space for the extent when balancing dirty pages fails.

Reported-by: kernel test robot <yujie.liu@intel.com>
Link: https://lore.kernel.org/all/202210111304.d369bc32-yujie.liu@intel.com
Fixes: 965f47aeb5de ("btrfs: make btrfs_buffered_write nowait compatible")
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
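The shape of this leak and its fix can be modelled with a simple counter. The sketch below is hypothetical user-space code (`struct model_reserve` and `model_buffered_write()` are invented for illustration and are not btrfs functions); it only shows that the early-exit path has to undo the reservation just as the normal path does.

```c
#include <assert.h>

/* Hypothetical per-inode reservation counter. */
struct model_reserve {
    long reserved;  /* bytes currently held for in-flight extents */
};

/* One iteration of a modelled buffered write.
 * 'balance_err' is the (negative) result of the modelled dirty page
 * balancing step, or 0 when it succeeds. */
int model_buffered_write(struct model_reserve *rsv, long bytes,
                         int balance_err)
{
    assert(rsv && bytes >= 0);

    rsv->reserved += bytes;      /* reserve space for the extent */

    if (balance_err) {
        /* The fix: release the reservation before bailing out.
         * Without this line the counter would stay elevated. */
        rsv->reserved -= bytes;
        return balance_err;
    }

    /* Normal path: the per-extent reservation is released once the
     * pages have been handed off. */
    rsv->reserved -= bytes;
    return 0;
}
```

After either outcome the counter returns to its starting value, which is the invariant the leaking path violated.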
2022-11-02btrfs: fix nowait buffered write returning -ENOSPCFilipe Manana1-0/+3
If we are doing a buffered write in NOWAIT context and we can't reserve metadata space due to -ENOSPC, then we should return -EAGAIN so that we retry the write in a context allowed to block and do metadata reservation with flushing, which might succeed this time due to the allowed flushing.

Returning -ENOSPC while in NOWAIT context simply makes some writes fail with -ENOSPC when they would likely succeed after switching from NOWAIT context to blocking context. That is unexpected behaviour and even fio complains about it with a warning like this:

  fio: io_u error on file /mnt/sdi/task_0.0.0: No space left on device: write offset=1535705088, buflen=65536
  fio: pid=592630, err=28/file:io_u.c:1846, func=io_u error, error=No space left on device

The fio's job config is this:

   [global]
   bs=64K
   ioengine=io_uring
   iodepth=1
   size=2236962133
   nr_files=1
   filesize=2236962133
   direct=0
   runtime=10
   fallocate=posix
   io_size=2236962133
   group_reporting
   time_based

   [task_0]
   rw=randwrite
   directory=/mnt/sdi
   numjobs=4

So fix this by returning -EAGAIN if we are in NOWAIT context and the metadata reservation failed with -ENOSPC.

Fixes: 304e45acdb8f ("btrfs: plumb NOWAIT through the write path")
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
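The error translation described above can be sketched as follows. This is a hypothetical user-space model, not the btrfs code: `struct model_space`, `model_reserve()` and `model_write_reserve()` are invented names, and "flushing" is reduced to moving reclaimable bytes into the free pool. It shows why -ENOSPC from a non-flushing attempt is not final and is better reported as -EAGAIN.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical space accounting: some bytes are free right now, others
 * only become free after a flush, which may block. */
struct model_space {
    long free_bytes;
    long reclaimable_bytes;
};

/* Try to reserve 'bytes'. Flushing is only attempted when allowed. */
static int model_reserve(struct model_space *sp, long bytes, bool may_flush)
{
    assert(sp && bytes >= 0);

    if (sp->free_bytes < bytes && may_flush) {
        /* Modelled flush: reclaim everything that can be reclaimed. */
        sp->free_bytes += sp->reclaimable_bytes;
        sp->reclaimable_bytes = 0;
    }
    if (sp->free_bytes < bytes)
        return -ENOSPC;

    sp->free_bytes -= bytes;
    return 0;
}

/* Write-path wrapper: in NOWAIT context a failed reservation means
 * "retry where blocking is allowed", not "the device is full". */
int model_write_reserve(struct model_space *sp, long bytes, bool nowait)
{
    int ret = model_reserve(sp, bytes, !nowait);

    if (ret == -ENOSPC && nowait)
        ret = -EAGAIN;
    return ret;
}
```

A caller that sees -EAGAIN would repeat the call with `nowait` set to false; only that blocking attempt can report a genuine -ENOSPC in this model.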