What does __asm__ __volatile__ do in C?


Question


I looked into some C code from

http://www.mcs.anl.gov/~kazutomo/rdtsc.html

They use things like __inline__ and __asm__, as in the following:

code1:

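typedef unsigned long long tick;  /* assumed: tick is a 64-bit unsigned type (not shown in the snippet) */

static __inline__ tick gettick (void) {
    unsigned a, d;
    __asm__ __volatile__("rdtsc": "=a" (a), "=d" (d) );
    return (((tick)a) | (((tick)d) << 32));
}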

code2:

volatile int  __attribute__((noinline)) foo2 (int a0, int a1) {
    __asm__ __volatile__ ("");
}

I was wondering what code1 and code2 do.

Solution

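The __volatile__ modifier on an __asm__ block tells the compiler that the asm has side effects beyond its listed outputs, so the optimizer must execute it as written. Without it, the optimizer treats the asm as a pure function of its inputs: it may remove it outright if the outputs are unused, merge it with an identical asm, or lift it out of a loop and reuse the cached result. Note that __volatile__ does not pin the asm in place relative to surrounding ordinary code; the compiler can still move it.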

This is useful for the rdtsc instruction like so:

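__asm__ __volatile__("rdtsc": "=a" (a), "=d" (d) );

This asm has no input operands, so without volatile the compiler could assume that every execution yields the same value and reuse (cache) an earlier result. Volatile forces it to execute rdtsc again and read a fresh timestamp each time.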
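A minimal sketch of code1 as reusable helpers, assuming GCC or Clang on x86/x86-64. rdtsc leaves the low 32 bits of the counter in EAX and the high 32 bits in EDX, which is why code1 ORs a with d shifted left by 32. The names combine_tsc, read_tsc and sample_tsc are hypothetical. Because the asm is __volatile__, each iteration of the loop executes rdtsc again instead of reusing the first result.

```c
#include <stdint.h>

/* Hypothetical helper: join the two 32-bit halves that rdtsc
   returns (EDX = high, EAX = low) into one 64-bit value. */
static inline uint64_t combine_tsc(uint32_t lo, uint32_t hi)
{
    return ((uint64_t)hi << 32) | lo;
}

/* Hypothetical helper: read the time-stamp counter (x86/x86-64, GCC/Clang).
   __volatile__ tells the compiler the asm must run on every call, even
   though it has no inputs and two calls would look identical. */
static inline uint64_t read_tsc(void)
{
    uint32_t lo, hi;
    __asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi));
    return combine_tsc(lo, hi);
}

/* Take n consecutive samples. With a non-volatile asm the compiler
   would be free to hoist rdtsc out of the loop and store one value n times. */
static void sample_tsc(uint64_t *out, int n)
{
    for (int i = 0; i < n; i++)
        out[i] = read_tsc();
}
```

On a machine with an invariant TSC, the samples should be non-decreasing; the exact values depend on the CPU.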

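When used alone, like this:

__asm__ __volatile__ ("");

it emits no instruction at all. This is what code2 relies on: GCC's documentation for the noinline attribute recommends putting asm ("") in a function that has no side effects, so that the empty asm acts as a special side effect and calls to the function are not optimized away. You can extend this, though, to get a compile-time memory barrier that stops the compiler from reordering memory accesses across it (or keeping memory values cached in registers across it), without emitting any CPU fence instruction:

__asm__ __volatile__ ("" ::: "memory");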
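A minimal sketch of both empty-asm idioms, assuming GCC or Clang (any architecture, since no instruction is emitted). COMPILER_BARRIER, ready, wait_for_flag and empty_call are hypothetical names. The "memory" clobber forces the compiler to reload ready from memory on every loop iteration instead of keeping it in a register; it emits no CPU fence, so on its own it is not sufficient for real cross-thread synchronization. empty_call follows the code2 pattern: noinline plus an empty volatile asm, so the call is kept.

```c
/* Hypothetical macro: compile-time memory barrier (GCC/Clang).
   Emits no instruction, but the compiler may not move memory
   accesses across it or keep memory values cached in registers. */
#define COMPILER_BARRIER() __asm__ __volatile__("" ::: "memory")

static int ready = 0;   /* plain, non-volatile flag */

/* Without the barrier, an optimizing compiler could load ready once
   and spin forever on the register copy. */
static void wait_for_flag(void)
{
    while (!ready)
        COMPILER_BARRIER();
}

/* code2-style function: noinline keeps it from being inlined, and the
   empty volatile asm acts as a side effect, so calls to it are not
   removed (useful, e.g., when measuring bare call overhead). */
static __attribute__((noinline)) void empty_call(void)
{
    __asm__ __volatile__("");
}
```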

The rdtsc instruction is a good example of why volatile matters. rdtsc is usually used to measure how long some instructions take to execute. Imagine code like this, where you want to time the computation of r1 and of r2:

__asm__ ("rdtsc": "=a" (a0), "=d" (d0) );
r1 = x1 + y1;
__asm__ ("rdtsc": "=a" (a1), "=d" (d1) );
r2 = x2 + y2;
__asm__ ("rdtsc": "=a" (a2), "=d" (d2) );

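Here the compiler is actually allowed to cache the timestamp: the three non-volatile asm statements have identical (empty) inputs, so it may execute rdtsc once and reuse the result, and the output might show that each line took exactly 0 clocks to execute. Obviously this isn't what you want, so you introduce __volatile__ to prevent caching: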

__asm__ __volatile__("rdtsc": "=a" (a0), "=d" (d0));
r1 = x1 + y1;
__asm__ __volatile__("rdtsc": "=a" (a1), "=d" (d1));
r2 = x2 + y2;
__asm__ __volatile__("rdtsc": "=a" (a2), "=d" (d2));

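Now you'll get a new timestamp each time, but there is still a problem: both the compiler and the CPU are allowed to reorder these statements. The compiler may move ordinary code such as r1 = x1 + y1 across a volatile asm, and rdtsc is not a serializing instruction, so the CPU may read the counter before earlier instructions have finished. The asm blocks could end up executing after r1 and r2 have already been calculated. To work around this, add barriers that force serialization: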

__asm__ __volatile__("mfence;rdtsc": "=a" (a0), "=d" (d0) :: "memory");
r1 = x1 + y1;
__asm__ __volatile__("mfence;rdtsc": "=a" (a1), "=d" (d1) :: "memory");
r2 = x2 + y2;
__asm__ __volatile__("mfence;rdtsc": "=a" (a2), "=d" (d2) :: "memory");

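Note the mfence instruction here, which acts as a CPU-side barrier, and the "memory" clobber in the asm, which acts as a compile-time barrier. Strictly speaking, mfence orders memory operations; Intel documents LFENCE (or MFENCE followed by LFENCE) immediately before RDTSC as the way to make it wait for all earlier instructions. On modern CPUs, you can replace mfence;rdtsc with rdtscp for something more efficient. Keep in mind that rdtscp also writes ECX, so ECX must be declared as an output or clobber.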
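A minimal sketch of a serialized measurement, assuming GCC or Clang on an x86-64 CPU that supports rdtscp. It uses one common recipe rather than the answer's mfence: lfence before the first rdtsc (on Intel, LFENCE does not let later instructions start until earlier ones have completed) and rdtscp followed by lfence at the end. tsc_begin, tsc_end and time_add are hypothetical names, and the measured value includes the overhead of the measurement itself.

```c
#include <stdint.h>

/* Hypothetical helper: timestamp at the start of a measured region.
   lfence keeps rdtsc from executing before earlier instructions finish;
   the "memory" clobber keeps the compiler from moving memory accesses
   across the asm. */
static inline uint64_t tsc_begin(void)
{
    uint32_t lo, hi;
    __asm__ __volatile__("lfence\n\trdtsc"
                         : "=a"(lo), "=d"(hi)
                         :
                         : "memory");
    return ((uint64_t)hi << 32) | lo;
}

/* Hypothetical helper: timestamp at the end of a measured region.
   rdtscp waits for earlier instructions to execute before reading the
   counter; the trailing lfence keeps later instructions from starting
   early. rdtscp also writes ECX (IA32_TSC_AUX), hence the "ecx" clobber. */
static inline uint64_t tsc_end(void)
{
    uint32_t lo, hi;
    __asm__ __volatile__("rdtscp\n\tlfence"
                         : "=a"(lo), "=d"(hi)
                         :
                         : "ecx", "memory");
    return ((uint64_t)hi << 32) | lo;
}

/* Hypothetical timed region, standing in for r1 = x1 + y1.
   The volatile store keeps the addition from being optimized away. */
static uint64_t time_add(volatile int *r, int x, int y)
{
    uint64_t t0 = tsc_begin();
    *r = x + y;
    uint64_t t1 = tsc_end();
    return t1 - t0;
}
```

For very short regions like this one, the result is dominated by the fences themselves, so in practice one would subtract the cost of an empty begin/end pair.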
