GCC inline assembly using modifier "P" and constraint "p" over "m" in the Linux kernel


Question

I'm reading the Linux kernel source code (3.12.5, x86_64) to understand how the process descriptor is handled.

I found that to get the current process descriptor I can use the current_thread_info() function, which is implemented as follows:

static inline struct thread_info *current_thread_info(void)
{
    struct thread_info *ti;
    ti = (void *)(this_cpu_read_stable(kernel_stack) +
         KERNEL_STACK_OFFSET - THREAD_SIZE);
    return ti;
}

Then I looked at this_cpu_read_stable():

#define this_cpu_read_stable(var)       percpu_from_op("mov", var, "p" (&(var)))

#define percpu_from_op(op, var, constraint) \
({ \
typeof(var) pfo_ret__; \
switch (sizeof(var)) { \
...
case 8: \
    asm(op "q "__percpu_arg(1)",%0" \
    : "=r" (pfo_ret__) \
    : constraint); \
    break; \
default: __bad_percpu_size(); \
} \
pfo_ret__; \
})

#define __percpu_arg(x)         __percpu_prefix "%P" #x

#ifdef CONFIG_SMP
#define __percpu_prefix "%%"__stringify(__percpu_seg)":"
#else
#define __percpu_prefix ""
#endif

#ifdef CONFIG_X86_64
#define __percpu_seg gs
#else
#define __percpu_seg fs
#endif

The expanded macro should be inline asm code like this:

asm("movq %%gs:%P1,%0" : "=r" (pfo_ret__) : "p"(&(kernel_stack))); 
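The string pasting behind that expansion can be checked in user space. The following is a minimal sketch, not kernel code: the upper-case macro names and the function percpu_mov_template() are hypothetical stand-ins for the kernel macros quoted above, hard-wired to the SMP x86_64 case (segment gs, 8-byte operand).

```c
#include <string.h>

/* Hypothetical user-space copies of the kernel's string-building macros. */
#define STRINGIFY_1(x) #x
#define STRINGIFY(x) STRINGIFY_1(x)

#define PERCPU_SEG gs                                   /* x86_64 case */
#define PERCPU_PREFIX "%%" STRINGIFY(PERCPU_SEG) ":"    /* SMP case */
#define PERCPU_ARG(x) PERCPU_PREFIX "%P" #x

/* Returns the asm template that the 8-byte "mov" case would produce.
 * Adjacent string literals are concatenated by the compiler. */
static const char *percpu_mov_template(void)
{
    return "mov" "q " PERCPU_ARG(1) ",%0";
}
```

Comparing the result with strcmp() against "movq %%gs:%P1,%0" confirms that the template matches the hand expansion.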

According to this post, the input constraint used to be "m" (kernel_stack), which makes sense to me. But apparently, to improve performance, Linus changed the constraint to "p" and passed the address of the variable:

It uses a "p" (&var) constraint instead of a "m" (var) one, to make gcc 
think there is no actual "load" from memory. This obviously _only_ works 
for percpu variables that are stable within a thread, but 'current' and 
'kernel_stack' should be that way.

Also, in another post, Tejun Heo made this comment:

Added the magical undocumented "P" modifier to UP __percpu_arg()
to force gcc to dereference the pointer value passed in via the
"p" input constraint.  Without this, percpu_read_stable() returns
the address of the percpu variable.  Also added comment explaining
the difference between percpu_read() and percpu_read_stable().

But my experiments combining the "P" modifier with the constraint "p" (&var) did not work. If no segment register is specified, "%P1" always yields the address of the variable; the pointer is not dereferenced. I have to use parentheses to dereference it, like "(%P1)". If a segment register is specified, gcc won't even compile the code without the parentheses. My test code is as follows:

#include <stdio.h>

#define current(var) ({\
        typeof(var) pfo_ret__;\
        asm(\
                "movq %%es:%P1, %0\n"\
                : "=r"(pfo_ret__)\
                : "p" (&(var))\
        );\
        pfo_ret__;\
        })

int main () {
        struct foo {
                int field1;
                int field2;
        } a = {
                .field1 = 100,
                .field2 = 200,
        };
        struct foo *var = &a;

        printf ("field1: %d\n", current(var)->field1);
        printf ("field2: %d\n", current(var)->field2);

        return 0;
}

Is there anything wrong with my code? Or do I need to pass some extra options to gcc? Also, when I used gcc -S to generate assembly code, I didn't see any optimization from using "p" over "m". Any answers or comments are much appreciated.

Recommended answer

The reason your example code doesn't work is that the "p" constraint is of only very limited use in inline assembly. Every inline assembly operand must be representable as an operand in assembly language. If an operand isn't representable, the compiler makes it so by first moving it to a register and substituting that register as the operand. The "p" constraint places an additional restriction: the operand must be a valid address. The problem is that a register isn't a valid address. A register can contain an address, but a register is not itself a valid address.

That means the operand of the "p" constraint must have a valid assembly representation as is and must also be a valid address. You're trying to use the address of a variable on the stack as the operand. While this is a valid address, it's not a valid operand. The stack variable itself has a valid representation (something like 8(%rbp)), but the address of the stack variable doesn't. (If it were representable it would be something like 8 + %rbp, but this isn't a legal operand.)
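For a pointer held in a local variable, one workable alternative is to pass the pointer in a register and dereference it explicitly with parentheses, which is what the "(%P1)" experiment in the question amounts to. The sketch below assumes GCC or Clang on x86-64 with the default AT&T syntax and falls back to a plain C load elsewhere. The name load_via_register is hypothetical, and the extra "m" (*ptr) input is added here only so the compiler knows the asm reads that memory.

```c
#include <stdint.h>

/* Loads *ptr through a register operand. The pointer value goes in a
 * register ("r"), and the parentheses in the template dereference it. */
static inline uint64_t load_via_register(const uint64_t *ptr)
{
    uint64_t value;
#if defined(__x86_64__) && defined(__GNUC__)
    __asm__("movq (%1), %0"
            : "=r"(value)
            : "r"(ptr), "m"(*ptr)); /* "m"(*ptr): declares the memory read */
#else
    value = *ptr;                   /* portable fallback, no inline asm */
#endif
    return value;
}
```

Calling load_via_register(&x) on a stack variable x returns its current value, because the "r" constraint lets the compiler materialize the stack address in a register first.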

One of the few things whose address you can take and use as an operand with the "p" constraint is a statically allocated variable. In this case it's a valid assembly operand, as it can be represented as an immediate value (e.g. &kernel_stack can be represented as $kernel_stack). It's also a valid address and so satisfies the constraint.

So that's why the Linux kernel macro works and your macro doesn't. You're trying to use it with stack variables, while the kernel only uses it with statically allocated variables.

Or at least with what looks like a statically allocated variable to the compiler. In fact, kernel_stack is allocated in a special section used for per-CPU data. This section doesn't actually exist as such; instead it's used as a template to create a separate region of memory for each CPU. The offset of kernel_stack within this special section is used as the offset within each per-CPU data region, where a separate kernel stack value is stored for each CPU. The FS or GS segment register is used as the base of this region, with each CPU using a different base address.

So that's why the Linux kernel uses inline assembly to access what otherwise looks like a static variable. The macro is used to turn the static variable into a per-CPU variable. If you're not trying to do something like this, then you probably don't have anything to gain by copying the kernel macro. You should probably consider a different way to do what you're trying to accomplish.

Now, if you're thinking that since Linus Torvalds came up with this optimization in the kernel, replacing an "m" constraint with a "p" must be a good idea in general, you should be very aware of how fragile this optimization is. What it's trying to do is fool GCC into thinking that the reference to kernel_stack doesn't actually access memory, so that GCC won't keep reloading the value every time memory may have changed. The danger here is that if kernel_stack does change, the compiler will be fooled and will continue to use the old value. Linus knows when and how the per-CPU variables are changed, and so can be confident that the macro is safe when used for its intended purpose in the kernel.
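For contrast, here is a minimal sketch of the conservative "m" form that the optimization replaces, again assuming GCC or Clang on x86-64 with AT&T syntax and falling back to a plain load elsewhere. The names stable_slot and read_slot are hypothetical. Because the variable appears as an "m" input, the compiler knows the asm depends on memory and must not reuse a stale result after the variable is written.

```c
#include <stdint.h>

/* Hypothetical statically allocated variable standing in for kernel_stack. */
static uint64_t stable_slot = 1;

/* Reads stable_slot with an "m" input, so the compiler treats the asm as
 * depending on the variable's current contents. */
static inline uint64_t read_slot(void)
{
    uint64_t value;
#if defined(__x86_64__) && defined(__GNUC__)
    __asm__("movq %1, %0" : "=r"(value) : "m"(stable_slot));
#else
    value = stable_slot;            /* portable fallback, no inline asm */
#endif
    return value;
}
```

If stable_slot is assigned between two calls, the second call observes the new value. With a "p" (&var) input the compiler would see no such dependency and could fold the two reads into one, which is the trade-off described above.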

If you want to eliminate redundant loads in your own code, I suggest using -fstrict-aliasing and/or the restrict keyword. That way you're not dependent on fragile and non-portable inline assembly macros.
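As a rough illustration of the restrict suggestion, the sketch below uses a hypothetical function add_scalar. Because both pointers are restrict-qualified, the compiler is allowed to assume that stores to dst never modify *scale, so it may load *scale once instead of on every iteration. Whether it actually does so depends on the compiler and optimization level.

```c
#include <stddef.h>

/* Adds *scale to each of the n elements of dst. The restrict qualifiers
 * promise that dst[0..n-1] and *scale do not overlap, which permits the
 * compiler to keep *scale in a register across the loop. */
static void add_scalar(int *restrict dst, const int *restrict scale, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] += *scale;
}
```

The caller is responsible for keeping that promise: passing a pointer into dst as scale would be undefined behavior.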
