Using C arrays in inline GCC assembly


Question

I'd like to use two arrays, passed into a C function as shown below, in inline assembly with the GCC compiler (Xcode on Mac). It has been many years since I last wrote assembly, so I'm sure this is an easy fix.

The first line here is fine. The second line fails. I'm trying to do the following, A[0] += x[0]*x[0], and I want to do this for many elements in the array with different indices. I'm only showing one here. How do I use a read/write array in the assembly block?

And if there is a better approach, I'm all ears.

inline void ArrayOperation(float A[36], const float x[8])
{
    float tmp;

    __asm__ ( "fld %1; fld %2; fmul; fstp %0;" : "=r" (tmp) : "r" (x[0]), "r" (x[0]) );
    __asm__ ( "fld %1; fld %2; fadd; fstp %0;" : "=r" (A[0]) : "r" (A[0]), "r" (tmp) );

    // ...
}

Solution

The code fails not because of the arrays, but because of the way the fld and fst instructions work. This is the code you want:

float tmp;

__asm__ ( "flds %1; fld %%st(0); fmulp; " : "=t" (tmp) : "m" (x[0]) );
__asm__ ( "flds %1; fadds %2;" : "=t" (A[0]) : "m" (A[0]), "m" (tmp) );

The fld and fst instructions cannot take a general-purpose register, which is what the "r" constraint supplies; they need a memory operand (or another x87 stack register, as in fld %st(0)). You also need to specify whether you want to load a float (flds), a double (fldl) or a long double (fldt). For the output operand, I just use the constraint =t, which tells the compiler that the result is on the top of the x87 register stack, i.e. ST(0).

Arithmetic operations, in the forms used here, have either no operands (fmulp) or a single memory operand, in which case you have to specify the size again (fmuls, fadds, etc.).
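To make the size suffixes concrete, here is a minimal sketch (not part of the original answer) that squares a float and a double with the same pattern: load from memory, multiply by the same memory operand, and leave the result in ST(0) for the =t output. The function names square_float and square_double are made up for illustration, and the plain C fallback is an assumption added so the file still builds on non-x86 targets.

```c
/* Sketch: the AT&T size suffix must match the C type of the memory operand. */
#if defined(__i386__) || defined(__x86_64__)

static float square_float(float v)
{
    float r;
    /* flds/fmuls: the s suffix means a 32-bit (float) memory operand. */
    __asm__ ("flds %1; fmuls %1;" : "=t" (r) : "m" (v));
    return r;
}

static double square_double(double v)
{
    double r;
    /* fldl/fmull: the l suffix means a 64-bit (double) memory operand. */
    __asm__ ("fldl %1; fmull %1;" : "=t" (r) : "m" (v));
    return r;
}

#else

/* Fallback for targets without an x87 unit (assumption, not from the answer). */
static float square_float(float v)   { return v * v; }
static double square_double(double v) { return v * v; }

#endif
```

Each asm statement pushes exactly one value onto the x87 stack and declares it as the =t output, so the compiler knows to pop it into the C variable afterwards.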

You can read more about inline assembler, GNU Assembler in general, and see the Intel® 64 and IA-32 Architectures Software Developer’s Manual.

Of course, it is best to get rid of the temporary variable:

__asm__ ( "flds %1; fld %%st(0); fmulp; fadds %2;" : "=t" (A[0]) : "m" (x[0]), "m" (A[0]) );
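The question mentions doing this for many elements with different indices. One possible way to do that, sketched below, is to drive the single-statement form from a small index table. The struct square_term, the function accumulate_squares and the table layout are hypothetical names introduced here, not something from the question, and the plain C branch is an assumed fallback for non-x86 builds.

```c
#include <stddef.h>

/* Hypothetical index pair: A[dst] += x[src] * x[src]. */
struct square_term {
    size_t dst;
    size_t src;
};

static void accumulate_squares(float A[36], const float x[8],
                               const struct square_term *terms, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        float *a = &A[terms[i].dst];
        const float *v = &x[terms[i].src];
#if defined(__i386__) || defined(__x86_64__)
        /* Load x, duplicate it, multiply, add A; the sum is left in ST(0). */
        __asm__ ("flds %1; fld %%st(0); fmulp; fadds %2;"
                 : "=t" (*a)
                 : "m" (*v), "m" (*a));
#else
        *a += *v * *v;   /* fallback when no x87 unit is available */
#endif
    }
}
```

Because the operands are passed with "m" constraints, any lvalue of type float works, so indexing through pointers or a table needs no change to the assembly template. The x87 path keeps the intermediate product in extended precision, so for arbitrary inputs its last bit may differ from the plain C path.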

However, if a performance improvement is what you're after, you don't need assembler: GCC is fully capable of producing this code. You might instead consider vector SSE instructions and other simple optimization techniques, such as breaking the dependency chains in the calculations; see Agner Fog's optimization manuals.
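As a rough illustration of that advice, the sketch below shows a plain C loop next to a variant using SSE intrinsics from xmmintrin.h. It assumes, purely for the example, that element i of A is updated from element i of x; the question does not say which indices are paired. The function names add_squares_scalar and add_squares_sse are made up here.

```c
#include <stddef.h>
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

/* Plain C: the compiler is free to schedule or vectorize this itself. */
static void add_squares_scalar(float *A, const float *x, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        A[i] += x[i] * x[i];
}

/* Four elements per step with SSE where available, scalar for the rest. */
static void add_squares_sse(float *A, const float *x, size_t n)
{
    size_t i = 0;
#if defined(__SSE__)
    for (; i + 4 <= n; i += 4) {
        __m128 vx = _mm_loadu_ps(x + i);           /* unaligned load of x[i..i+3] */
        __m128 va = _mm_loadu_ps(A + i);           /* unaligned load of A[i..i+3] */
        va = _mm_add_ps(va, _mm_mul_ps(vx, vx));   /* A + x*x, lane by lane */
        _mm_storeu_ps(A + i, va);
    }
#endif
    for (; i < n; ++i)                             /* leftover elements */
        A[i] += x[i] * x[i];
}
```

Each element is updated independently, so there is no dependency chain between iterations. Whether the intrinsics version actually beats what the compiler generates from the scalar loop is something to measure rather than assume.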
