How to access C struct/variables from inline asm?

Problem description

Consider the following code:

int bn_div(bn_t *bn1, bn_t *bn2, bn_t *bnr)
  {
    uint32 q, m;        /* Division Result */
    uint32 i;           /* Loop Counter */
    uint32 j;           /* Loop Counter */

    /* Check Input */
    if (bn1 == NULL) return(EFAULT);
    if (bn1->dat == NULL) return(EFAULT);
    if (bn2 == NULL) return(EFAULT);
    if (bn2->dat == NULL) return(EFAULT);
    if (bnr == NULL) return(EFAULT);
    if (bnr->dat == NULL) return(EFAULT);


    #if defined(__i386__) || defined(__amd64__)
    __asm__ (".intel_syntax noprefix");
    __asm__ ("pushl %eax");
    __asm__ ("pushl %edx");
    __asm__ ("pushf");
    __asm__ ("movl %eax, (bn1->dat[i])");
    __asm__ ("xorl %edx, %edx");
    __asm__ ("divl (bn2->dat[j])");
    __asm__ ("movl (q), %eax");
    __asm__ ("movl (m), %edx");
    __asm__ ("popf");
    __asm__ ("popl %edx");
    __asm__ ("popl %eax");
    #else
    q = bn1->dat[i] / bn2->dat[j];
    m = bn1->dat[i] % bn2->dat[j];
    #endif
    /* Return */
    return(0);
  }

The data type uint32 is basically an unsigned long int or a uint32_t (unsigned 32-bit integer). The type bnint is either an unsigned short int (uint16_t) or a uint32_t, depending on whether 64-bit data types are available: if they are, bnint is a uint32; otherwise it's a uint16. This was done in order to capture carry/overflow in other parts of the code. The structure bn_t is defined as follows:

typedef struct bn_data_t bn_t;
struct bn_data_t
  {
    uint32 sz1;         /* Bit Size */
    uint32 sz8;         /* Byte Size */
    uint32 szw;         /* Word Count */
    bnint *dat;         /* Data Array */
    uint32 flags;       /* Operational Flags */
  };
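A minimal sketch of how the conditional bnint typedef and its carry capture might be written, assuming <stdint.h> is available. Only bnint comes from the question; bnint_wide and bn_add_limb are hypothetical names. The selection relies on the C99 rule that <stdint.h> defines UINT64_MAX exactly when it provides uint64_t.

```c
#include <assert.h>
#include <stdint.h>

/* If <stdint.h> provides uint64_t, use 32-bit limbs with a 64-bit
   accumulator; otherwise fall back to 16-bit limbs with a 32-bit one. */
#ifdef UINT64_MAX
typedef uint32_t bnint;       /* limb type, as described in the question */
typedef uint64_t bnint_wide;  /* hypothetical: wide enough for limb + carry */
#else
typedef uint16_t bnint;
typedef uint32_t bnint_wide;
#endif

/* Hypothetical helper: add two limbs plus an incoming carry (0 or 1).
   The low half of the wide sum is the result; the high half is the carry. */
static bnint bn_add_limb(bnint a, bnint b, bnint carry_in, bnint *carry_out)
{
    bnint_wide sum;

    assert(carry_in <= 1);
    sum = (bnint_wide)a + b + carry_in;
    *carry_out = (bnint)(sum >> (8 * sizeof(bnint)));
    return (bnint)sum;
}
```

Because bnint is half the width of the widest available type, the carry-out is simply the high half of the wide sum, which is the point of choosing bnint this way.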

The function starts on line 300 of my source file, so when I try to compile it with make, I get the following errors:

system:/home/user/c/m3/bn 1036 $$$ ->make
clang -I. -I/home/user/c/m3/bn/.. -I/home/user/c/m3/bn/../include  -std=c99 -pedantic -Wall -Wextra -Wshadow -Wpointer-arith -Wcast-align -Wstrict-prototypes  -Wmissing-prototypes -Wnested-externs -Wwrite-strings -Wfloat-equal  -Winline -Wunknown-pragmas -Wundef -Wendif-labels  -c /home/user/c/m3/bn/bn.c
/home/user/c/m3/bn/bn.c:302:12: warning: unused variable 'q' [-Wunused-variable]
    uint32 q, m;        /* Division Result */
           ^
/home/user/c/m3/bn/bn.c:302:15: warning: unused variable 'm' [-Wunused-variable]
    uint32 q, m;        /* Division Result */
              ^
/home/user/c/m3/bn/bn.c:303:12: warning: unused variable 'i' [-Wunused-variable]
    uint32 i;           /* Loop Counter */
           ^
/home/user/c/m3/bn/bn.c:304:12: warning: unused variable 'j' [-Wunused-variable]
    uint32 j;           /* Loop Counter */
           ^
/home/user/c/m3/bn/bn.c:320:14: error: unknown token in expression
    __asm__ ("movl %eax, (bn1->dat[i])");
             ^
<inline asm>:1:18: note: instantiated into assembly here
        movl %eax, (bn1->dat[i])
                        ^
/home/user/c/m3/bn/bn.c:322:14: error: unknown token in expression
    __asm__ ("divl (bn2->dat[j])");
             ^
<inline asm>:1:12: note: instantiated into assembly here
        divl (bn2->dat[j])
                  ^
4 warnings and 2 errors generated.
*** [bn.o] Error code 1

Stop in /home/user/c/m3/bn.
system:/home/user/c/m3/bn 1037 $$$ ->

What I know:

I consider myself to be fairly well versed in x86 assembler (as evidenced by the code that I wrote above). However, the last time that I mixed a high-level language and assembler was about 15-20 years ago, using Borland Pascal to write graphics drivers for games (pre-Windows 95 era). My familiarity is with Intel syntax.

What I don't know:

How do I access members of bn_t (especially dat) from asm? Since dat is a pointer to bnint (a uint32 when 64-bit types are available), I am accessing the elements as an array (e.g. bn1->dat[i]).

How do I access local variables that are declared on the stack?

I am using push/pop to restore clobbered registers to their previous values so as not to upset the compiler. However, do I also need to declare the local variables volatile?

Or is there a better way that I am not aware of? I don't want to put this in a separate function because of the call overhead; this function is performance-critical.

Additional notes:

I've only just started writing this function, so it is nowhere near complete: the loops and other support/glue code are still missing. The main point is accessing local variables and structure members.

Edit 1:

The syntax that I am using seems to be the only one that clang supports. I tried the following code and clang gave me all sorts of errors:

__asm__ ("pushl %%eax",
    "pushl %%edx",
    "pushf",
    "movl (bn1->dat[i]), %%eax",
    "xorl %%edx, %%edx",
    "divl ($0x0c + bn2 + j)",
    "movl %%eax, (q)",
    "movl %%edx, (m)",
    "popf",
    "popl %%edx",
    "popl %%eax"
    );

It wants me to put a closing parenthesis on the first line, replacing the comma. I switched to using %% instead of % because I read somewhere that inline assembly requires %% to denote CPU registers, and clang was telling me that I was using an invalid escape sequence.

Recommended answer

If you only need 32-bit / 32-bit => 32-bit division, let the compiler use both outputs of a single div instruction; gcc, clang and icc all do this just fine, as you can see on the Godbolt compiler explorer:

uint32_t q = bn1->dat[i] / bn2->dat[j];
uint32_t m = bn1->dat[i] % bn2->dat[j];

Compilers are quite good at using common-subexpression elimination (CSE) to turn that into a single div. Just make sure you don't store the quotient somewhere that gcc can't prove doesn't overlap the inputs of the remainder.
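As a self-contained sketch of that advice, the hypothetical helper below computes both results from operands held in local variables and returns them together, so nothing is stored in between that could alias the inputs. The names divmod32 and divmod32_t are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { uint32_t q, r; } divmod32_t;   /* hypothetical result type */

/* Quotient and remainder of a / b. Both expressions use the same local
   operands, so the compiler can CSE them into one div instruction. */
static divmod32_t divmod32(uint32_t a, uint32_t b)
{
    divmod32_t res;

    assert(b != 0);
    res.q = a / b;
    res.r = a % b;
    return res;
}
```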

For example, with *m = dat[i] / dat[j], the store through m might overlap (alias) dat[i] or dat[j], so gcc would have to reload the operands and redo the div for the % operation. See the Godbolt link for bad and good examples.
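The Godbolt link is not reproduced here, so the following is a hedged reconstruction of the kind of bad/good pair it describes; both function names are hypothetical. The first stores the quotient through a pointer before computing the remainder; the second reads both operands into locals first.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Likely worse: the store through q may alias dat[i] or dat[j], so the
   compiler must reload both operands (and redo the division) for the %. */
static void divmod_reload(const uint32_t *dat, size_t i, size_t j,
                          uint32_t *q, uint32_t *m)
{
    assert(dat[j] != 0);
    *q = dat[i] / dat[j];
    *m = dat[i] % dat[j];
}

/* Better: load the operands into locals first, then store both results. */
static void divmod_locals(const uint32_t *dat, size_t i, size_t j,
                          uint32_t *q, uint32_t *m)
{
    uint32_t a = dat[i], b = dat[j];

    assert(b != 0);
    *q = a / b;
    *m = a % b;
}
```

Calling both with q pointing at dat[i] shows why the compiler can't merge the two divisions in divmod_reload: starting from dat = {100, 7}, divmod_locals gives a remainder of 2, while divmod_reload computes 14 % 7 = 0 because the quotient has already overwritten dat[i].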

Using inline asm for 32bit / 32bit = 32bit div doesn't gain you anything, and actually makes worse code with clang (see the godbolt link).

If you need 64-bit / 32-bit = 32-bit division, you probably do need asm, unless your compiler has a built-in for it (GNU C doesn't, AFAICT). On 32-bit x86, the obvious way in C (casting the operands to uint64_t) generates a call to a 64-bit / 64-bit = 64-bit libgcc function, which has branches and multiple div instructions. gcc isn't good at proving that the result will fit in 32 bits, which it would need to know to be sure that a single div instruction can't raise #DE (the divide-error exception).
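A minimal sketch of such a 64-bit / 32-bit = 32-bit division, assuming GNU C inline asm on x86 and a portable C fallback elsewhere; udiv64_32 is a hypothetical name. The caller must guarantee hi < divisor, which is exactly the condition that keeps the quotient within 32 bits and so avoids #DE. It uses __asm__ rather than asm so that it also compiles with -std=c99, as in the question's build.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: divide the 64-bit value hi:lo by divisor, returning the
   32-bit quotient and storing the 32-bit remainder in *rem.
   Precondition: hi < divisor (this also rules out divisor == 0). Otherwise the
   quotient doesn't fit in 32 bits and x86 div raises #DE. */
static uint32_t udiv64_32(uint32_t hi, uint32_t lo, uint32_t divisor,
                          uint32_t *rem)
{
    uint32_t q, r;

    assert(hi < divisor);
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    /* divl divides EDX:EAX by its operand: quotient -> EAX, remainder -> EDX. */
    __asm__ ("divl %[divisor]"
             : "=a" (q), "=d" (r)
             : "a" (lo), "d" (hi), [divisor] "rm" (divisor));
#else
    {
        /* Portable fallback; may compile to a 64/64-bit library call. */
        uint64_t n = ((uint64_t)hi << 32) | lo;
        q = (uint32_t)(n / divisor);
        r = (uint32_t)(n % divisor);
    }
#endif
    *rem = r;
    return q;
}
```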

For many other instructions, you can often avoid inline asm entirely by using builtin functions, e.g. for popcount. With -mpopcnt, __builtin_popcount compiles to the popcnt instruction (and accounts for the false dependency on the output operand that some Intel CPUs have). Without it, it compiles to a libgcc function call.
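For example, assuming GCC or clang, these two hypothetical wrappers use __builtin_popcount and __builtin_ctz instead of inline asm, so the compiler knows exactly what each one computes. __builtin_ctz(0) is undefined, hence the check.

```c
#include <assert.h>

/* Number of set bits. With -mpopcnt this can be a single popcnt instruction. */
static int count_set_bits(unsigned x)
{
    return __builtin_popcount(x);
}

/* Index of the lowest set bit; the result is undefined for x == 0. */
static int lowest_set_bit(unsigned x)
{
    assert(x != 0);
    return __builtin_ctz(x);
}
```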

Always prefer builtins, or pure C that compiles to good asm, so the compiler knows what the code does. When inlining makes some of the arguments known at compile-time, pure C can be optimized away or simplified, but code using inline asm will just load constants into registers and do a div at run-time. Inline asm also defeats CSE between similar computations on the same data, and of course can't auto-vectorize.
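To see the constant-divisor effect described above, compare the asm these two hypothetical functions produce (for instance on Godbolt with -O2). In tenth_c the division by 10 is visible to the optimizer, which can typically replace it with a multiply and shift; tenth_asm has to feed the constant 10 to a real div at run time. Both return the same values.

```c
#include <assert.h>
#include <stdint.h>

/* Plain C: once inlined with a constant b, the optimizer sees the division. */
static inline uint32_t div_c(uint32_t a, uint32_t b)
{
    assert(b != 0);
    return a / b;
}

/* Inline asm: opaque to the optimizer, so the div always executes. */
static inline uint32_t div_asm(uint32_t a, uint32_t b)
{
    uint32_t q, r;

    assert(b != 0);
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    __asm__ ("divl %[b]"
             : "=a" (q), "=d" (r)
             : "a" (a), "d" (0), [b] "rm" (b));
#else
    q = a / b;
    r = a % b;
#endif
    (void)r;   /* only the quotient is needed here */
    return q;
}

/* Hypothetical callers with a constant divisor; compare their asm output. */
uint32_t tenth_c(uint32_t a)   { return div_c(a, 10); }
uint32_t tenth_asm(uint32_t a) { return div_asm(a, 10); }
```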

https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html explains how to tell the compiler which variables you want in registers, and what the outputs are.

You can use Intel/MASM-like syntax and mnemonics, and register names without the % prefix if you like, preferably by compiling with -masm=intel. The AT&T syntax bug (the fsub and fsubr mnemonics are reversed) might still be present in Intel-syntax mode; I forget.

Most software projects that use GNU C inline asm use AT&T syntax only.

See also the bottom of this answer for more GNU C inline asm info, and the x86 tag wiki.

An asm statement takes one string argument (the assembler template) followed by up to three colon-separated sections: output operands, input operands and clobbers. The easiest way to make it multi-line is to make each asm line a separate string ending with \n, and let the compiler implicitly concatenate them.
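Here is a sketch of the question's xorl/divl sequence rewritten as one multi-line extended asm statement (div_u32_asm is a hypothetical name). Because the template now has operands, literal register names need %% so that the compiler emits a single %. And because the template writes EDX before the divisor is read, the EDX output is marked early-clobber (=&d) so that the compiler won't put the divisor in EDX. The answer's own version lets the compiler zero EDX instead, which avoids that subtlety.

```c
#include <assert.h>
#include <stdint.h>

/* n / d and n % d, with EDX zeroed inside the asm template. */
static uint32_t div_u32_asm(uint32_t n, uint32_t d, uint32_t *rem)
{
    uint32_t q, m;

    assert(d != 0);
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    __asm__ ("xorl %%edx, %%edx\n\t"   /* %% emits a literal % before a register */
             "divl %[divisor]"          /* EDX:EAX / divisor: quotient EAX, remainder EDX */
             : "=a" (q), "=&d" (m)      /* &: EDX is written before divisor is read */
             : "a" (n), [divisor] "rm" (d));
#else
    q = n / d;
    m = n % d;
#endif
    *rem = m;
    return q;
}
```

Each template line is a separate string literal ending in \n\t (the tab only indents the generated asm listing), and the compiler joins adjacent literals into one template.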

Also, you tell the compiler which registers you want stuff in. Then, if the variables are already in registers, the compiler doesn't have to spill them to memory just so your asm can load and store them; doing that yourself would really be shooting yourself in the foot. The tutorial Brett Hale linked in the comments hopefully covers all this.

You can see the compiler asm output for this on godbolt.

uint32_t q, m;  // this is unsigned int on every compiler that supports x86 inline asm with this syntax, but not when writing portable code.

__asm__ ("divl %[bn2dat_j]\n"  // __asm__ rather than asm, so this also compiles with -std=c99
      : "=a" (q), "=d" (m) // results are in eax, edx registers
      : "d" (0),           // zero edx for us, please
        "a" (bn1->dat[i]), // "a" means EAX / RAX
        [bn2dat_j] "mr" (bn2->dat[j]) // register or memory, compiler chooses which is more efficient
      : // no register clobbers, and we don't read/write "memory" other than operands
    );

"divl %4" would have worked too, but named inputs/outputs don't change name when you add more input/output constraints.

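Putting it together: a sketch of the answer's asm statement inside a complete helper, using the question's bn_t layout. This assumes bnint and uint32 are both uint32_t (the case where 64-bit types are available); bn_divmod_word is a hypothetical name, and non-x86 builds fall back to plain C.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t uint32;   /* assumption: matches the question's uint32 */
typedef uint32_t bnint;    /* assumption: 64-bit types are available */

typedef struct bn_data_t bn_t;
struct bn_data_t
  {
    uint32 sz1;         /* Bit Size */
    uint32 sz8;         /* Byte Size */
    uint32 szw;         /* Word Count */
    bnint *dat;         /* Data Array */
    uint32 flags;       /* Operational Flags */
  };

/* Hypothetical helper: *qp = bn1->dat[i] / bn2->dat[j], *mp = the remainder. */
static void bn_divmod_word(const bn_t *bn1, const bn_t *bn2,
                           uint32 i, uint32 j, uint32 *qp, uint32 *mp)
{
    uint32_t q, m;

    assert(bn2->dat[j] != 0);
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    __asm__ ("divl %[bn2dat_j]\n"
             : "=a" (q), "=d" (m)             /* results in EAX, EDX */
             : "d" (0),                       /* compiler zeroes EDX */
               "a" (bn1->dat[i]),             /* dividend in EAX */
               [bn2dat_j] "mr" (bn2->dat[j])  /* divisor: memory or register */
            );
#else
    q = bn1->dat[i] / bn2->dat[j];
    m = bn1->dat[i] % bn2->dat[j];
#endif
    *qp = q;
    *mp = m;
}
```

The struct members are passed as ordinary C expressions in the operand list, so the compiler works out the addressing for bn1->dat[i] and bn2->dat[j]; the template never names a C variable directly.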