summaryrefslogtreecommitdiff
diff options
context:
space:
mode:
authorMichal Wilczynski <m.wilczynski@samsung.com>2024-10-03 14:53:27 +0200
committerMichal Wilczynski <m.wilczynski@samsung.com>2024-10-04 12:37:22 +0200
commitc342a18d9bb780f7014f3a993c6d6c4680bd671d (patch)
treea85f3b3a04d9d9460303a1431e9e1703a36685bf
parente30e9d2c5988980163c78b66dc11722d55c4bcac (diff)
downloadlinux-thead-c342a18d9bb780f7014f3a993c6d6c4680bd671d.tar.gz
linux-thead-c342a18d9bb780f7014f3a993c6d6c4680bd671d.tar.bz2
linux-thead-c342a18d9bb780f7014f3a993c6d6c4680bd671d.zip
mmc: sdhci: Prevent stale command and data interrupt handling
While working with the T-Head 1520 LicheePi4A SoC, certain conditions arose that allowed me to reproduce a race issue in the sdhci code. To reproduce the bug, you need to enable the sdio1 controller in the device tree file `arch/riscv/boot/dts/thead/th1520-lichee-module-4a.dtsi` as follows: &sdio1 { bus-width = <4>; max-frequency = <100000000>; no-sd; no-mmc; broken-cd; cap-sd-highspeed; post-power-on-delay-ms = <50>; status = "okay"; wakeup-source; keep-power-in-suspend; }; When resetting the SoC using the reset button, the following messages appear in the dmesg log: [ 8.164898] mmc2: Got command interrupt 0x00000001 even though no command operation was in progress. [ 8.174054] mmc2: sdhci: ============ SDHCI REGISTER DUMP =========== [ 8.180503] mmc2: sdhci: Sys addr: 0x00000000 | Version: 0x00000005 [ 8.186950] mmc2: sdhci: Blk size: 0x00000000 | Blk cnt: 0x00000000 [ 8.193395] mmc2: sdhci: Argument: 0x00000000 | Trn mode: 0x00000000 [ 8.199841] mmc2: sdhci: Present: 0x03da0000 | Host ctl: 0x00000000 [ 8.206287] mmc2: sdhci: Power: 0x0000000f | Blk gap: 0x00000000 [ 8.212733] mmc2: sdhci: Wake-up: 0x00000000 | Clock: 0x0000decf [ 8.219178] mmc2: sdhci: Timeout: 0x00000000 | Int stat: 0x00000000 [ 8.225622] mmc2: sdhci: Int enab: 0x00ff1003 | Sig enab: 0x00ff1003 [ 8.232068] mmc2: sdhci: ACmd stat: 0x00000000 | Slot int: 0x00000000 [ 8.238513] mmc2: sdhci: Caps: 0x3f69c881 | Caps_1: 0x08008177 [ 8.244959] mmc2: sdhci: Cmd: 0x00000502 | Max curr: 0x00191919 [ 8.254115] mmc2: sdhci: Resp[0]: 0x00001009 | Resp[1]: 0x00000000 [ 8.260561] mmc2: sdhci: Resp[2]: 0x00000000 | Resp[3]: 0x00000000 [ 8.267005] mmc2: sdhci: Host ctl2: 0x00001000 [ 8.271453] mmc2: sdhci: ADMA Err: 0x00000000 | ADMA Ptr: 0x0000000000000000 [ 8.278594] mmc2: sdhci: ============================================ I also enabled some traces to better understand the problem: kworker/3:1-62 [003] ..... 8.163538: mmc_request_start: mmc2: start struct mmc_request[000000000d30cc0c]: cmd_opcode=5 cmd_arg=0x0 cmd_flags=0x2e1 cmd_retries=0 stop_opcode=0 stop_arg=0x0 stop_flags=0x0 stop_retries=0 sbc_opcode=0 sbc_arg=0x0 sbc_flags=0x0 sbc_retires=0 blocks=0 block_size=0 blk_addr=0 data_flags=0x0 tag=0 can_retune=0 doing_retune=0 retune_now=0 need_retune=0 hold_retune=1 retune_period=0 <idle>-0 [000] d.h2. 8.164816: sdhci_cmd_irq: hw_name=ffe70a0000.mmc quirks=0x2008008 quirks2=0x8 intmask=0x10000 intmask_p=0x18000 irq/24-mmc2-96 [000] ..... 8.164840: sdhci_thread_irq: msg= irq/24-mmc2-96 [000] d.h2. 8.164896: sdhci_cmd_irq: hw_name=ffe70a0000.mmc quirks=0x2008008 quirks2=0x8 intmask=0x1 intmask_p=0x1 irq/24-mmc2-96 [000] ..... 8.285142: mmc_request_done: mmc2: end struct mmc_request[000000000d30cc0c]: cmd_opcode=5 cmd_err=-110 cmd_resp=0x0 0x0 0x0 0x0 cmd_retries=0 stop_opcode=0 stop_err=0 stop_resp=0x0 0x0 0x0 0x0 stop_retries=0 sbc_opcode=0 sbc_err=0 sbc_resp=0x0 0x0 0x0 0x0 sbc_retries=0 bytes_xfered=0 data_err=0 tag=0 can_retune=0 doing_retune=0 retune_now=0 need_retune=0 hold_retune=1 retune_period=0 Here's what happens: the __mmc_start_request function is called with opcode 5. Since the power to the Wi-Fi card, which resides on this SDIO bus, is initially off after the reset, an interrupt SDHCI_INT_TIMEOUT is triggered. Immediately after that, a second interrupt SDHCI_INT_RESPONSE is triggered. Depending on the exact timing, these conditions can trigger the following race problem: 1) The sdhci_cmd_irq top half handles the command as an error. It sets host->cmd to NULL and host->pending_reset to true. 2) The sdhci_thread_irq bottom half is scheduled next and executes faster than the second interrupt handler for SDHCI_INT_RESPONSE. It clears host->pending_reset before the SDHCI_INT_RESPONSE handler runs. 3) The pending interrupt SDHCI_INT_RESPONSE handler gets called, triggering a code path that prints: "mmc2: Got command interrupt 0x00000001 even though no command operation was in progress." To solve this issue, we need to clear pending interrupts when resetting host->pending_reset. This ensures that after sdhci_threaded_irq restores interrupts, there are no pending stale interrupts. Change-Id: Idd6b1623cd842e00a30466074a225e7d847d1f98 Signed-off-by: Michal Wilczynski <m.wilczynski@samsung.com>
-rw-r--r--drivers/mmc/host/sdhci.c4
1 files changed, 4 insertions, 0 deletions
diff --git a/drivers/mmc/host/sdhci.c b/drivers/mmc/host/sdhci.c
index c79f73459915..ddd7f2dd74f2 100644
--- a/drivers/mmc/host/sdhci.c
+++ b/drivers/mmc/host/sdhci.c
@@ -3114,6 +3114,10 @@ static bool sdhci_request_done(struct sdhci_host *host)
sdhci_reset_for(host, REQUEST_ERROR);
host->pending_reset = false;
+
+ /* Clear any pending interrupts after reset */
+ sdhci_writel(host, SDHCI_INT_CMD_MASK | SDHCI_INT_DATA_MASK,
+ SDHCI_INT_STATUS);
}
/*