What happens when a mov instruction causes a page fault with interrupts disabled on x86?

Question

I recently encountered an issue in a custom Linux kernel (2.6.31.5, x86) driver where copy_to_user would periodically fail to copy any bytes to user space. It would return the count of bytes passed to it, indicating that it had not copied anything. After code inspection we found that the code was disabling interrupts while calling copy_to_user, which violates its contract. After correcting this, the issue stopped occurring. Because the issue happened so infrequently, I need to prove that disabling interrupts caused it.

If you look at the code snippet below from arch/x86/lib/usercopy_32.c, rep; movsl copies words to user space, with the count in CX. size is updated with CX on exit. CX will be 0 if the movsl executes correctly. Because CX is not zero, the movs? instructions must not have executed, in order to fit the definition of copy_to_user and the observed behavior.

/* Generic arbitrary sized copy.  */
#define __copy_user(to, from, size)                 \
do {                                    \
    int __d0, __d1, __d2;                       \
    __asm__ __volatile__(                       \
        "   cmp  $7,%0\n"                   \
        "   jbe  1f\n"                  \
        "   movl %1,%0\n"                   \
        "   negl %0\n"                  \
        "   andl $7,%0\n"                   \
        "   subl %0,%3\n"                   \
        "4: rep; movsb\n"                   \
        "   movl %3,%0\n"                   \
        "   shrl $2,%0\n"                   \
        "   andl $3,%3\n"                   \
        "   .align 2,0x90\n"                \
        "0: rep; movsl\n"                   \
        "   movl %3,%0\n"                   \
        "1: rep; movsb\n"                   \
        "2:\n"                          \
        ".section .fixup,\"ax\"\n"              \
        "5: addl %3,%0\n"                   \
        "   jmp 2b\n"                   \
        "3: lea 0(%3,%0,4),%0\n"                \
        "   jmp 2b\n"                   \
        ".previous\n"                       \
        ".section __ex_table,\"a\"\n"               \
        "   .align 4\n"                 \
        "   .long 4b,5b\n"                  \
        "   .long 0b,3b\n"                  \
        "   .long 1b,2b\n"                  \
        ".previous"                     \
        : "=&c"(size), "=&D" (__d0), "=&S" (__d1), "=r"(__d2)   \
        : "3"(size), "0"(size), "1"(to), "2"(from)      \
        : "memory");                        \
} while (0)

I have two ideas:

  1. When interrupts are disabled, the page fault does not occur, and rep; movs? is skipped without doing anything. The return value would then be CX, the amount not copied to user space, as the definition specifies and as observed.
  2. The page fault does occur, but Linux cannot process it because interrupts are disabled, so the page fault handler skips the instruction, although I don't know how the page fault handler would do this. Again, in this case CX would remain unmodified and the return value would be correct.

Can anyone point me to the sections in the Intel manuals that specify this behavior, or point me to any additional Linux source that could be helpful?

Recommended answer

I've found the answer. My #2 suggestion was correct, and the mechanism was right in front of my face. The page fault does happen, but the fixup_exception mechanism is used to provide an exception/continue mechanism. This section adds entries to the exception handler table:

    ".section __ex_table,\"a\"\n"               \
    "   .align 4\n"                 \
    "   .long 4b,5b\n"                  \
    "   .long 0b,3b\n"                  \
    "   .long 1b,2b\n"                  \
    ".previous"                     \

Each entry is a pair of addresses: if an exception is raised while the instruction pointer (IP) is at the first address, the fault handler sets the IP to the second address and execution continues there.

So if the exception happens at "4:", jump to "5:". If the exception happens at "0:", jump to "3:", and if the exception happens at "1:", jump to "2:".

The missing piece is in do_page_fault() in arch/x86/mm/fault.c:

/*
 * If we're in an interrupt, have no user context or are running
 * in an atomic region then we must not take the fault:
 */
if (unlikely(in_atomic() || !mm)) {
    bad_area_nosemaphore(regs, error_code, address);
    return;
}

in_atomic() returned true because we were inside a write_lock_bh() critical section! bad_area_nosemaphore() eventually does the fixup.

Because preemption was disabled, if a page fault did occur (which was unlikely, because of the working-set concept: the user pages were usually already resident), the copy would fail and jump out of the __copy_user macro through the fixup, with size set to the number of uncopied bytes.
