Generated code not matching expectations with Extended ASM
Problem description
I have a CpuFeatures class. The requirements for the class are simple: (1) preserve EBX or RBX, and (2) record the values returned from CPUID in EAX/EBX/ECX/EDX. I'm not sure the code being generated is the code I intended.
The CpuFeatures class code uses GCC Extended ASM. Here's the relevant code:
struct CPUIDinfo
{
    word32 EAX;
    word32 EBX;
    word32 ECX;
    word32 EDX;
};

bool CpuId(word32 func, word32 subfunc, CPUIDinfo& info)
{
    uintptr_t scratch;

    __asm__ __volatile__ (
        ".att_syntax \n"
#if defined(__x86_64__)
        "\t xchgq %%rbx, %q1 \n"
#else
        "\t xchgl %%ebx, %k1 \n"
#endif
        "\t cpuid \n"
#if defined(__x86_64__)
        "\t xchgq %%rbx, %q1 \n"
#else
        "\t xchgl %%ebx, %k1 \n"
#endif
        : "=a"(info.EAX), "=&r"(scratch), "=c"(info.ECX), "=d"(info.EDX)
        : "a"(func), "c"(subfunc)
    );

    if(func == 0)
        return !!info.EAX;

    return true;
}
The code below was compiled with -g3 -Og on Cygwin i386. When I examine it under a debugger, I don't like what I am seeing.
Dump of assembler code for function CpuFeatures::DoDetectX86Features():
...
0x0048f355 <+1>: sub $0x48,%esp
=> 0x0048f358 <+4>: mov $0x0,%ecx
0x0048f35d <+9>: mov %ecx,%eax
0x0048f35f <+11>: xchg %ebx,%ebx
0x0048f361 <+13>: cpuid
0x0048f363 <+15>: xchg %ebx,%ebx
0x0048f365 <+17>: mov %eax,0x10(%esp)
0x0048f369 <+21>: mov %ecx,0x18(%esp)
0x0048f36d <+25>: mov %edx,0x1c(%esp)
0x0048f371 <+29>: mov %ebx,0x14(%esp)
0x0048f375 <+33>: test %eax,%eax
...
I don't like what I am seeing because it appears EBX/RBX is not being preserved (xchg %ebx,%ebx at +11). Additionally, it looks like the preserved EBX/RBX is being saved as the result of CPUID, and not the actual value of EBX returned by CPUID (xchg %ebx,%ebx at +15, before the mov %ebx,0x14(%esp) at +29).
If I change the operand to use a memory op with "=&m"(scratch), then the generated code is:
0x0048f35e <+10>: xchg %ebx,0x40(%esp)
0x0048f362 <+14>: cpuid
0x0048f364 <+16>: xchg %ebx,0x40(%esp)
A related question: what ensures that operand reads and writes happen at the intended times with Extended ASM?
What am I doing wrong (besides wasting countless hours on something that should have taken 5 or 15 minutes)?
Recommended answer
The code below is a complete example that I used to compile your example code above, including the modification to exchange (swap) directly with the info.EBX variable.
#include <inttypes.h>
#define word32 uint32_t

struct CPUIDinfo
{
    word32 EAX;
    word32 EBX;
    word32 ECX;
    word32 EDX;
};

bool CpuId(word32 func, word32 subfunc, CPUIDinfo& info)
{
    // Operand 1 is the info.EBX memory location; it is swapped with
    // EBX/RBX before and after cpuid, so no scratch variable is needed.
    // Note: on x86-64, xchgq moves 8 bytes, so the swap also touches info.ECX,
    // which the "=c" output overwrites after the asm statement.
    __asm__ __volatile__ (
        ".att_syntax \n"
#if defined(__x86_64__)
        "\t xchgq %%rbx, %q1 \n"
#else
        "\t xchgl %%ebx, %k1 \n"
#endif
        "\t cpuid \n"
#if defined(__x86_64__)
        "\t xchgq %%rbx, %q1 \n"
#else
        "\t xchgl %%ebx, %k1 \n"
#endif
        : "=a"(info.EAX), "=&m"(info.EBX), "=c"(info.ECX), "=d"(info.EDX)
        : "a"(func), "c"(subfunc)
    );

    if(func == 0)
        return !!info.EAX;

    return true;
}

int main()
{
    CPUIDinfo cpuInfo;
    CpuId(1, 0, cpuInfo);
}
The first observation you should make is that I chose to use the info.EBX memory location as the target of the swap. This eliminates the need for another temporary variable or register.
I assembled as 32-bit code with -g3 -Og -S -m32 and got these instructions of interest:
xchgl %ebx, 4(%edi)
cpuid
xchgl %ebx, 4(%edi)
movl %eax, (%edi)
movl %ecx, 8(%edi)
movl %edx, 12(%edi)
%edi happens to contain the address of the info structure, and 4(%edi) happens to be the address of info.EBX. We swap %ebx and 4(%edi) before cpuid and again after it. With the second swap, ebx is restored to what it was before cpuid, and 4(%edi) now holds what ebx was right after cpuid was executed. The remaining movl lines store the eax, ecx, and edx registers into the rest of the info structure via the %edi register.
The generated code above is what I would expect it to be.
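One thing to watch if the same memory swap is built for x86-64: xchgq transfers 8 bytes, while info.EBX is a 4-byte field. A hedged sketch of one way around that is to swap through a pointer-sized local and copy the result afterwards. The names CpuIdViaSlot and slot are hypothetical and not part of the original code, and the sketch assumes GCC or Clang on an x86 target.

```cpp
#include <cstdint>

struct CPUIDinfo
{
    uint32_t EAX;
    uint32_t EBX;
    uint32_t ECX;
    uint32_t EDX;
};

// Hypothetical variant of CpuId: swaps EBX/RBX through a pointer-sized
// local slot instead of info.EBX, so the 64-bit xchgq stays inside its operand.
bool CpuIdViaSlot(uint32_t func, uint32_t subfunc, CPUIDinfo& info)
{
    uintptr_t slot = 0;  // ends up holding EBX/RBX as left by cpuid

    __asm__ __volatile__ (
#if defined(__x86_64__)
        "xchgq %%rbx, %1 \n\t"   // park the caller's RBX in slot
        "cpuid \n\t"
        "xchgq %%rbx, %1 \n\t"   // restore RBX; slot now has cpuid's value
#else
        "xchgl %%ebx, %1 \n\t"   // park the caller's EBX in slot
        "cpuid \n\t"
        "xchgl %%ebx, %1 \n\t"   // restore EBX; slot now has cpuid's value
#endif
        : "=a"(info.EAX), "+m"(slot), "=c"(info.ECX), "=d"(info.EDX)
        : "a"(func), "c"(subfunc)
    );

    // Copy the low 32 bits of the slot into the result structure.
    info.EBX = static_cast<uint32_t>(slot);

    if (func == 0)
        return info.EAX != 0;

    return true;
}
```

Calling CpuIdViaSlot(0, 0, info) should leave the first four bytes of the CPU vendor string in info.EBX. The slot is declared "+m" because the first xchg reads it before the second one writes it.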
In your code, the scratch variable (with the constraint "=&m"(scratch)) is never used after the assembler template, so 0x40(%esp) has the value you want but it never gets moved anywhere useful. You'd have to copy the scratch variable into info.EBX (i.e. info.EBX = scratch;) and look at all of the resulting instructions that get generated. At some point among the generated assembly instructions, the data would be copied from the scratch memory location to info.EBX.
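The copy step described above can be sketched as follows, here with the "=&r" register constraint from the question's first listing; the same assignment applies to the "=&m" variant. The function name CpuIdWithCopy is hypothetical, and the sketch assumes GCC or Clang on x86. If the compiler happens to pick EBX itself for scratch, the two xchg instructions become no-ops and scratch still ends up with the value cpuid wrote, which appears to be what the question's disassembly shows.

```cpp
#include <cstdint>

struct CPUIDinfo
{
    uint32_t EAX;
    uint32_t EBX;
    uint32_t ECX;
    uint32_t EDX;
};

// Hypothetical variant of the question's CpuId with the explicit copy
// from scratch into info.EBX added after the asm statement.
bool CpuIdWithCopy(uint32_t func, uint32_t subfunc, CPUIDinfo& info)
{
    uintptr_t scratch;

    __asm__ __volatile__ (
#if defined(__x86_64__)
        "xchgq %%rbx, %q1 \n\t"
        "cpuid \n\t"
        "xchgq %%rbx, %q1 \n\t"
#else
        "xchgl %%ebx, %k1 \n\t"
        "cpuid \n\t"
        "xchgl %%ebx, %k1 \n\t"
#endif
        : "=a"(info.EAX), "=&r"(scratch), "=c"(info.ECX), "=d"(info.EDX)
        : "a"(func), "c"(subfunc)
    );

    // Without this assignment the EBX result never reaches the structure.
    info.EBX = static_cast<uint32_t>(scratch);

    if (func == 0)
        return info.EAX != 0;

    return true;
}
```

Compiling this with -S and comparing against the listing above is a quick way to see where the compiler places the copy.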
Update - Cygwin and MinGW
I wasn't entirely satisfied that the Cygwin code output was correct. In the middle of the night I had an Aha! moment. Windows already handles position independence itself: when the dynamic link loader loads an image (DLL etc.), it modifies the image via re-basing. There is no need for additional PIC processing like that done for Linux 32-bit shared libraries, so there is no issue with ebx/rbx. This is why Cygwin and MinGW present warnings like this when compiling with -fPIC:
warning: -fPIC ignored for target (all code is position independent)
This is because under Windows all 32-bit code can be re-based when it is loaded by the Windows dynamic loader. More about re-basing can be found in this Dr. Dobbs article. Information on the Windows Portable Executable (PE) format can be found in this Wiki article. Cygwin and MinGW don't need to worry about preserving ebx/rbx when targeting 32-bit code because on their platforms PIC is already handled by the OS, other re-basing tools, and the linker.
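For targets where EBX handling is a concern, one alternative (not what the answer above uses) is to let the compiler's own <cpuid.h> header issue the instruction. The following is a minimal sketch, assuming GCC or Clang on x86 with a header that provides __get_cpuid_count; CpuIdFromHeader is a hypothetical wrapper name.

```cpp
#include <cpuid.h>   // GCC/Clang x86 header providing __get_cpuid_count
#include <cstdint>

struct CPUIDinfo
{
    uint32_t EAX;
    uint32_t EBX;
    uint32_t ECX;
    uint32_t EDX;
};

// Hypothetical wrapper: register handling is left to the compiler's header
// rather than to hand-written Extended ASM.
bool CpuIdFromHeader(uint32_t func, uint32_t subfunc, CPUIDinfo& info)
{
    unsigned int a = 0, b = 0, c = 0, d = 0;

    // __get_cpuid_count returns 0 when the requested leaf is not supported.
    if (__get_cpuid_count(func, subfunc, &a, &b, &c, &d) == 0)
        return false;

    info.EAX = a;
    info.EBX = b;
    info.ECX = c;
    info.EDX = d;
    return true;
}
```

This trades control over the exact instruction sequence for not having to reason about constraints at all, which may or may not suit a class like CpuFeatures.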