Fix stack corruption in rt_sdhci_init_host() #11542

Open
zhangyangysu wants to merge 3 commits into RT-Thread:master from zhangyangysu:zhangyang
Conversation

zhangyangysu commented Jul 1, 2026

PR description

[
Tested on qemu-virt64-aarch64.
To reproduce:
Enable SDHCI, set RT_NAME_MAX to 64, then build and run qemu.py. The kernel crashes with the following output:

 \ | /
- RT -     Thread Operating System
 / | \     5.3.0 build Jul 1 2026 11:09:46
 2006 - 2024 Copyright by RT-Thread team
[I/rtdm.pci] Bus I/O region(0):
[I/rtdm.pci] cpu: [0x000000003eff0000, 0x000000003effffff]
[I/rtdm.pci] physical: [0x0000000000000000, 0x000000000000ffff]
[I/rtdm.pci] Bus Memory region(1):
[I/rtdm.pci] cpu: [0x0000000010000000, 0x000000003efeffff]
[I/rtdm.pci] physical: [0x0000000010000000, 0x000000003efeffff]
[I/rtdm.pci] Bus Memory region(2):
[I/rtdm.pci] cpu: [0x0000008000000000, 0x000000ffffffffff]
[I/rtdm.pci] physical: [0x0000008000000000, 0x000000ffffffffff]
[I/audio.hda] Found codec at address 0
[I/audio.hda] Intel HD Audio v1.0 codec=0 dac=2 pin=3
[I/rtdm.nvme] NVM Express v1.0 (PCI, QEMU NVMe Ctrl, 1.0)

exception info:
esr.EC :0x25
esr.IL :0x01
esr.ISS:0x00000006
epc :0x00000000401176e8
Data abort
fault addr = 0x0000000000000228
abort caused by read instruction
Translation fault, second level
Execption:
X00:0x0000000000000000 X01:0x00000000402298e0 X02:0x00000000402298a4 X03:0x000000000000000a
X04:0x0000000000000000 X05:0x00000000ffffffff X06:0x00000000ffffffff X07:0x0000000000000000
X08:0x000000004020ffc8 X09:0x0000000000000000 X10:0x0000000000000000 X11:0x0000000000000000
X12:0x0000000000000000 X13:0x0000000000000000 X14:0x0000000000000000 X15:0x0000000000000000
X16:0x0000000000000001 X17:0x0000000000000d9e X18:0x0000000000000000 X19:0x000000004013dda0
X20:0x000000004014cba8 X21:0x0000000000000000 X22:0x0000000000000016 X23:0x0000000000000017
X24:0x0000000000000018 X25:0x0000000000000019 X26:0x000000000000001a X27:0x000000000000001b
X28:0x000000000000001c X29:0x0000000040229880 X30:0x00000000401176e0
SP_EL0:0x0000000000000000
SPSR :0x0000000060000005
EPC :0x00000000401176e8
...
please use: addr2line -e rtthread.elf -a -f
0x401176e8 0x40117928 0xfffffffffffffffc

addr2line -e rtthread.elf -a -f 0x401176e8 0x40117928 0xfffffffffffffffc
0x00000000401176e8
rt_sdhci_init_host
/home/yangzhang/code/github/rt-thread/components/drivers/sdio/dev_sdhci.c:3543
0x0000000040117928
rt_sdhci_set_and_add_host
/home/yangzhang/code/github/rt-thread/components/drivers/sdio/dev_sdhci.c:3605
0xfffffffffffffffc
??
??:0
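
The esr.EC, esr.IL and esr.ISS values in the dump above are fields of the AArch64 exception syndrome register. As a minimal sketch of how they map to the "Data abort", "read instruction" and "Translation fault, second level" lines, the following decodes a 32-bit syndrome value. The helper names are hypothetical, and the combined value 0x96000006 is reconstructed here from the three printed fields; it does not appear in the log itself.

```c
#include <assert.h>
#include <stdint.h>

/* Decoded view of ESR_ELx for a data abort (names are illustrative). */
typedef struct
{
    unsigned ec;    /* bits [31:26]: exception class */
    unsigned il;    /* bit  [25]:    instruction length, 1 = 32-bit */
    unsigned iss;   /* bits [24:0]:  instruction specific syndrome */
    unsigned wnr;   /* ISS bit [6]:  1 = write access, 0 = read access */
    unsigned dfsc;  /* ISS bits [5:0]: data fault status code */
} esr_fields_t;

/* Rebuild the raw register value from the three fields printed in the dump. */
uint32_t esr_compose(unsigned ec, unsigned il, unsigned iss)
{
    assert(ec <= 0x3Fu && il <= 1u && iss <= 0x01FFFFFFu);
    return ((uint32_t)ec << 26) | ((uint32_t)il << 25) | (uint32_t)iss;
}

/* Split a raw syndrome value into its fields. */
esr_fields_t esr_decode(uint32_t esr)
{
    esr_fields_t f;

    f.ec   = (esr >> 26) & 0x3Fu;
    f.il   = (esr >> 25) & 0x1u;
    f.iss  = esr & 0x01FFFFFFu;
    f.wnr  = (f.iss >> 6) & 0x1u;
    f.dfsc = f.iss & 0x3Fu;
    return f;
}

/* Only the translation-fault codes are named in this sketch. */
const char *dfsc_name(unsigned dfsc)
{
    switch (dfsc)
    {
    case 0x04: return "Translation fault, level 0";
    case 0x05: return "Translation fault, level 1";
    case 0x06: return "Translation fault, level 2";
    case 0x07: return "Translation fault, level 3";
    default:   return "other (not decoded in this sketch)";
    }
}
```

With the values from the dump, esr_decode(esr_compose(0x25, 0x01, 0x06)) gives EC 0x25 (data abort taken without a change in exception level), wnr 0 (a read) and dfsc 0x06, which dfsc_name() reports as a level 2 translation fault.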

rt_sdhci_init_host() allocated a 32-byte local buffer for the device name:
char dev_name[32];
However, sdio_host_set_name() copies RT_NAME_MAX bytes into the output buffer:
rt_strncpy(out_devname, host->name, RT_NAME_MAX);
When RT_NAME_MAX is configured larger than 32 (for example 64), the copy overruns
the stack buffer and corrupts nearby local variables. This may corrupt the local mmc pointer
and lead to a data abort when accessing mmc->caps2.
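
The overrun described above can be modelled in portable C without touching a real stack frame. This is a sketch under two assumptions: that rt_strncpy() behaves like the standard strncpy(), including zero-filling the destination up to the given length when the source is shorter, and that the bytes after dev_name[] hold other locals. The function name and the canary scheme are hypothetical and only serve the illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define FRAME_SIZE 128   /* backing array standing in for a stack frame */
#define CANARY     0xAA  /* marker for bytes that must stay untouched */

/*
 * The first buf_size bytes of frame[] model dev_name[]; everything after
 * them models neighbouring local variables. The copy uses name_max as its
 * length, as the quoted rt_strncpy() call does with RT_NAME_MAX.
 * Returns how many bytes beyond the modelled buffer were modified.
 */
size_t bytes_clobbered(size_t buf_size, size_t name_max, const char *name)
{
    unsigned char frame[FRAME_SIZE];
    size_t i;
    size_t count = 0;

    assert(buf_size <= FRAME_SIZE && name_max <= FRAME_SIZE);

    memset(frame, CANARY, sizeof(frame));

    /* strncpy writes exactly name_max bytes: the name, then zero padding. */
    strncpy((char *)frame, name, name_max);

    for (i = buf_size; i < FRAME_SIZE; i++)
    {
        if (frame[i] != CANARY)
        {
            count++;
        }
    }
    return count;
}
```

Under these assumptions, bytes_clobbered(32, 64, "sd0") reports 32 overwritten bytes even though the name is only three characters long, while bytes_clobbered(64, 64, "sd0") reports none. If the real copy pads with zeros in the same way, a zeroed neighbouring pointer would be consistent with the small fault address in the dump, though the PR does not confirm the exact stack layout.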

Why submit this PR

The kernel crashes when SDHCI is enabled and RT_NAME_MAX is configured larger than 32, so eMMC/SD cards cannot be used.

What is your solution

Fix this by sizing the local buffer dev_name[] with RT_NAME_MAX.

Provide the BSP and config used for verification

  • BSP:
    bsp/qemu-virt64-aarch64

  • .config:
    Enable SDHCI in menuconfig and change RT_NAME_MAX:
    CONFIG_RT_NAME_MAX=64
    CONFIG_RT_USING_SDIO=y
    CONFIG_RT_SDIO_STACK_SIZE=8192
    CONFIG_RT_SDIO_THREAD_PRIORITY=15
    CONFIG_RT_MMCSD_STACK_SIZE=8192
    CONFIG_RT_MMCSD_THREAD_PRIORITY=22
    CONFIG_RT_MMCSD_MAX_PARTITION=16
    CONFIG_RT_USING_SDHCI=y
    CONFIG_RT_SDIO_SDHCI_PCI=y

  • action:
    N/A (build verified locally on bsp/qemu-virt64-aarch64)
    ]

Intent for your PR

Choose one (Mandatory):

  • This PR is for a code-review and is intended to get feedback
  • This PR is mature, and ready to be integrated into the repo

Code Quality:

As part of this pull request, I've considered the following:

  • Already checked the difference between the PR and the old code
  • Style guide is adhered to, including spacing, naming and other styles
  • All redundant code is removed and cleaned up (no #if 0 blocks, no commented-out code)
  • All modifications are justified and do not affect other components or BSPs
  • I've commented appropriately where code is tricky
  • Code in this PR is of high quality
  • This PR complies with the RT-Thread code specification (checked with a source formatting tool)
  • If this PR adds a new BSP, a CI check has been added to .github/ALL_BSP_COMPILE.json; see the linked BSP self-check guide for details

yang.zhang and others added 2 commits May 3, 2026 09:14
When msh tries to execute a non-ELF path, lwp_execve() may allocate a PID
before lwp_load() fails. The old error path only dropped the LWP reference,
leaving the PID tree entry pointing to a freed LWP.

In an init-less boot flow, this can poison pid 1 after a failed command from
msh. A later LWP launch may then treat the stale pid 1 entry as a valid parent
LWP, resulting in invalid pgrp/session state and a job-control assertion during
process exit.

Add lwp_pid_rollback() for exec/spawn failures before the process becomes
runnable. Unlike lwp_pid_put(), it always releases the PID lock and does not
enter the "no more pid allocation" state when the PID tree becomes empty.

Use the rollback helper in lwp_execve() failure paths after PID allocation.

Signed-off-by: zhangyang <gaoshanliukou@163.com>

github-actions Bot commented Jul 1, 2026

👋 Thank you for your contribution to RT-Thread!

To ensure your code complies with RT-Thread's coding style, please run the code formatting workflow in your repository by following the steps below (if the formatting CI run fails).

🛠 Steps

  1. Go to the Actions page and open the workflow.

  2. Click Run workflow:

  • Set files/directories to exclude (directories should end with "/")
  • Set the target branch to: zhangyang
  • Set the PR number to: 11542

  3. Wait for the workflow to complete.
    The formatted code will be automatically pushed to your branch.

Once completed, commits will be pushed to the zhangyang branch automatically, and the related Pull Request will be updated.

If you have any questions, feel free to reach out. Thanks again for your contribution! 💐

github-actions Bot commented Jul 1, 2026

📌 Code Review Assignment

🏷️ Tag: components

Reviewers: @Maihuanyi

Changed Files
  • components/drivers/sdio/dev_sdhci.c

📊 Current Review Status (Last Updated: 2026-07-01 14:28 CST)


📝 Review Instructions

  1. Maintainers can refresh the review status by clicking here: 🔄 Refresh Status

  2. Comment LGTM/lgtm after confirming approval

  3. PR must be confirmed by at least one maintainer before merging

ℹ️ Refresh CI status operation requires repository Write permission.

@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
1 out of 2 committers have signed the CLA.

✅ zhangyangysu
❌ yang.zhang


yang.zhang seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.

rt_sdhci_init_host() allocated a 32-byte local buffer for
the device name:

    char dev_name[32];

However, sdio_host_set_name() copies RT_NAME_MAX bytes into
the output buffer:

    rt_strncpy(out_devname, host->name, RT_NAME_MAX);

When RT_NAME_MAX is configured larger than 32 (for example
64), the copy overruns the stack buffer and corrupts nearby
local variables. This may corrupt the local mmc pointer and
lead to a data abort when accessing mmc->caps2.

Fix this by sizing the local buffer with RT_NAME_MAX.

Signed-off-by: zhangyangysu <zhangyangysu0928@gmail.com>

CYFS3 commented Jul 1, 2026

Contributor

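Changing it this way is probably problematic, because later on there is still rt_sprintf(&dev_name[len], "-timer");
although the name does get truncated when the workqueue or thread is created.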
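
The concern raised here is that an unbounded append of "-timer" can write past the end of dev_name[] when the host name already fills most of the buffer. One way to avoid that, shown as a sketch using the standard snprintf(), is to pass the remaining space to the formatter. The helper name append_bounded() is hypothetical and is not part of the PR; RT-Thread's own rt_snprintf() could presumably play the same role, but that substitution is an assumption.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Append suffix to the NUL-terminated string in buf without writing past
 * buf_size bytes. Returns 1 if the whole suffix fit, 0 if it was truncated
 * or if buf held no terminator within buf_size bytes.
 */
int append_bounded(char *buf, size_t buf_size, const char *suffix)
{
    const char *end;
    size_t len;
    size_t room;
    int n;

    assert(buf != NULL && suffix != NULL && buf_size > 0);

    /* Find the current terminator without reading beyond the buffer. */
    end = memchr(buf, '\0', buf_size);
    if (end == NULL)
    {
        return 0;
    }
    len  = (size_t)(end - buf);
    room = buf_size - len;  /* at least 1: space for the terminator */

    /* snprintf writes at most room bytes, terminator included. */
    n = snprintf(buf + len, room, "%s", suffix);
    return n >= 0 && (size_t)n < room;
}
```

With an 8-byte buffer holding "sd0", append_bounded(buf, 8, "-timer") stores "sd0-tim" and returns 0 to signal truncation; with a 16-byte buffer it stores "sd0-timer" and returns 1. An alternative that avoids truncation altogether would be to size the buffer for the longest suffix in addition to RT_NAME_MAX.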


3 participants