Using C arrays in inline GCC assembly

Question

I'd like to use two arrays passed into a C function, as below, in inline assembly with the GCC compiler (Xcode on Mac). It has been many years since I've written assembly, so I'm sure this is an easy fix.

The first line here is fine. The second line fails. I'm trying to do the following, A[0] += x[0]*x[0], and I want to do this for many elements in the array with different indices. I'm only showing one here. How do I use a read/write array in the assembly block?

And if there is a better approach, I'm all ears.

inline void ArrayOperation(float A[36], const float x[8])
{
    float tmp;

    __asm__ ( "fld %1; fld %2; fmul; fstp %0;" : "=r" (tmp) : "r" (x[0]), "r" (x[0]) );
    __asm__ ( "fld %1; fld %2; fadd; fstp %0;" : "=r" (A[0]) : "r" (A[0]), "r" (tmp) );

    // ...
}

Answer

The code fails not because of the arrays, but because of the way the fld and fst instructions work. This is the code you want:

float tmp;

__asm__ ( "flds %1; fld %%st(0); fmulp; " : "=t" (tmp) : "m" (x[0]) );
__asm__ ( "flds %1; fadds %2;" : "=t" (A[0]) : "m" (A[0]), "m" (tmp) );

The fld and fst instructions need a memory operand (or an x87 stack register such as %st(0)); they cannot take a general-purpose register, which is what the "r" constraint supplies. You also need to specify whether you want to load a float (flds), a double (fldl) or a long double (fldt). For the output operand I just use the constraint =t, which tells the compiler that the result is on the top of the x87 register stack, i.e. ST(0).

In the forms used here, arithmetic operations take either no operands (fmulp) or a single memory operand, in which case you again have to specify the size (fmuls, fadds, etc.).

You can read more about inline assembler and the GNU Assembler in general, and see the Intel® 64 and IA-32 Architectures Software Developer’s Manual.

Of course, it is best to get rid of the temporary variable:

   __asm__ ( "flds %1; fld %%st(0); fmulp; fadds %2;" : "=t" (A[0]) : "m" (x[0]), "m" (A[0]));
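
Since the question asks about doing this for many elements with different indices, here is a minimal sketch of how the single-statement version might be wrapped in a helper. The name square_accumulate, the preprocessor guard and the plain C fallback are my own assumptions for illustration, not part of the code above; the asm itself is the same instruction sequence.

```c
/* Sketch only: computes *a += (*x) * (*x).
   square_accumulate is a hypothetical helper name. */
static inline void square_accumulate(float *a, const float *x)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    __asm__ ("flds %1\n\t"       /* push *x onto the x87 stack          */
             "fld %%st(0)\n\t"   /* duplicate the top of the stack      */
             "fmulp\n\t"         /* multiply the two copies and pop     */
             "fadds %2"          /* add *a to the product in st(0)      */
             : "=t" (*a)         /* result is left in st(0)             */
             : "m" (*x), "m" (*a));
#else
    /* Assumed fallback for targets without x87 or GCC-style asm. */
    *a += (*x) * (*x);
#endif
}
```

Inside ArrayOperation this would be called as square_accumulate(&A[0], &x[0]); other elements only change the two addresses passed in.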

Though if a performance improvement is what you're after, you don't need to use assembler. GCC is completely capable of producing this code. But you might consider using vector SSE instructions and other simple optimization techniques, such as breaking the dependency chains in the calculations; see Agner Fog's optimization manuals.
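
To illustrate the plain C route, here is a rough sketch with no inline assembly. The question only shows A[0] += x[0]*x[0], so the index pattern is a guess: I assume, purely for illustration, that the 36 elements of A hold the packed upper triangle of the products x[j]*x[k] for the 8 elements of x (8*9/2 = 36). The function name and the constants N and PACKED are hypothetical.

```c
#include <stddef.h>

enum { N = 8, PACKED = N * (N + 1) / 2 };  /* 8 inputs, 36 packed outputs */

/* Assumed layout: A[0..7] = row 0 (x[0]*x[0..7]),
   A[8..14] = row 1 (x[1]*x[1..7]), and so on. */
void accumulate_outer_packed(float A[PACKED], const float x[N])
{
    size_t idx = 0;
    for (size_t j = 0; j < N; ++j) {
        for (size_t k = j; k < N; ++k) {
            /* Each A[idx] is updated independently of the others, so
               there is no long dependency chain between iterations. */
            A[idx++] += x[j] * x[k];
        }
    }
}
```

Compiled with optimization enabled, code like this leaves instruction selection (x87 or SSE) to the compiler; whether it is actually vectorized depends on the compiler version and flags.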
