Why am I getting these assembler errors?

Question

I have a big function that needs to convert floats to integers at one point. Without this conversion the function takes 11-12 ns/loop on my machine; with the conversion it takes about 400 ns/loop.

After some reading I found a way to speed the conversion up using a bit of inline assembly. The first iteration of my function was as follows:

inline int FISTToInt (float f)
{
    int i;
    asm("fld %1;"
        "fistp %0;"
        :"=r" ( i )
        :"r" ( f )
        :
    );
    return i;
}

When I compiled that, I got the following errors:

src/calcRunner.cpp: Assembler messages:
src/calcRunner.cpp:43: Error: operand type mismatch for `fld'
src/calcRunner.cpp:43: Error: operand type mismatch for `fistp'

A bit of thought suggested the answer: I had forgotten the instruction suffixes, so I changed the function as follows:

inline int FISTToInt (float f)
{
    int i;
    asm("flds %1;"
        "fistps %0;"
        :"=r" ( i )
        :"r" ( f )
        :
    );
    return i;
}

However, this did not fix the problem; instead I get this:

src/calcRunner.cpp: Assembler messages:
src/calcRunner.cpp:43: Error: invalid instruction suffix for `fld'
src/calcRunner.cpp:43: Error: invalid instruction suffix for `fistp'

What is going on?

Solution

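This works. The x87 instructions fld and fistp cannot take general-purpose registers, so the operands need the memory constraint "m" rather than "r". In AT&T syntax, the s suffix on flds selects a 32-bit float and the l suffix on fistpl selects a 32-bit integer (fistps would store a 16-bit integer):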

int trunk(float x)
{
    int i;
    __asm__ __volatile__(
    "    flds   %1\n"
    "    fistpl %0\n"
    : "=m"(i) : "m"(x));
    return i; 
}

However, it is only (possibly) faster than the compiler-generated code if you are actually using x87 mode, and then only because it skips loading and storing the FP control word that determines the rounding. As a consequence, it converts using the current rounding mode (round-to-nearest by default) instead of truncating like a C cast, despite the name trunk.
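To make the rounding difference concrete, here is a minimal sketch that puts the x87 sequence from above next to a plain cast. The names fist_round and cast_trunc are hypothetical, and the sketch assumes an x86 target, GCC or Clang, and the default x87 control word (round-to-nearest-even):

```c
/* x87 conversion as in the answer: honours the current rounding mode. */
int fist_round(float x)
{
    int i;
    __asm__ __volatile__(
        "flds   %1\n\t"   /* push the 32-bit float onto the x87 stack */
        "fistpl %0\n\t"   /* store it as a 32-bit integer and pop */
        : "=m"(i)
        : "m"(x));
    return i;
}

/* Plain C cast: always truncates toward zero. */
int cast_trunc(float x)
{
    return (int)x;
}
```

With the default control word, fist_round(2.7f) should give 3 while cast_trunc(2.7f) gives 2, so the two are only interchangeable when the inputs are already whole numbers or when rounding to nearest is acceptable.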

Simple benchmark:

#include <stdio.h>
#include <stdlib.h>

/* x87 conversion; uses the current rounding mode */
int trunk(float x)
{
    int i;
    __asm__ __volatile__(
    "    flds   %1\n"
    "    fistpl %0\n"
    : "=m"(i) : "m"(x));
    return i;
}


/* compiler-generated conversion (truncates) */
int trunk2(float x)
{
    return (int)x;
}

/* read the time-stamp counter; rdtsc writes only eax and edx */
static inline long long rdtsc(void)
{
    unsigned int a, d;
    __asm__ __volatile__ ("rdtsc" : "=a" (a), "=d" (d));
    return (long long)(a | ((unsigned long long)d << 32));
}


int main()
{
    float f[1000];
    for(int i = 0; i < 1000; i++)
    {
        /* integer division: every value is a whole number,
           so rounding and truncation give the same result */
        f[i] = rand() / (i+1);
    }
    long long t = rdtsc();
    int sum = 0;
    for(int i = 0; i < 1000; i++)
    {
        sum = trunk(f[i]);
    }
    t = rdtsc() - t;
    printf("Sum=%d time=%lld\n", sum, t);   /* %lld matches long long */

    t = rdtsc();
    sum = 0;
    for(int i = 0; i < 1000; i++)
    {
        sum = trunk2(f[i]);
    }
    t = rdtsc() - t;
    printf("Sum=%d time=%lld\n", sum, t);

    return 0;
}

Compiled with gcc -O2 -m64 -std=c99, it produces the following result:

Sum=1143565 time=30196
Sum=1143565 time=15946

In a 32-bit compile (gcc -O2 -m32 -std=c99):

Sum=1143565 time=29847
Sum=1143565 time=107618

In other words, the compiler's cast is a lot slower in 32-bit x87 mode. However, if we enable SSE2 (gcc -m32 -msse2 -mfpmath=sse -O2), it gets much better:

Sum=1143565 time=30277
Sum=1143565 time=11789

Note that in each pair the first line is "your solution" (the inline assembly in trunk) and the second line is the compiler's solution (the cast in trunk2).

As always, measure on your own system to check that the results do indeed match up.
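For measuring on a machine where rdtsc is unavailable or awkward to use, a rough portable harness can be sketched with the standard clock() function. The names time_conversion and cast_to_int are hypothetical, and clock() reports CPU time at coarse resolution, so this assumes a large enough repeats count to get a meaningful number:

```c
#include <stddef.h>
#include <time.h>

/* Conversion under test: the plain C cast. */
int cast_to_int(float x)
{
    return (int)x;
}

/* Runs convert() over data[0..n) `repeats` times.
   Returns elapsed CPU seconds and stores the accumulated sum in *sum_out,
   so the compiler cannot discard the conversions as unused work. */
double time_conversion(int (*convert)(float), const float *data, size_t n,
                       int repeats, long long *sum_out)
{
    long long sum = 0;
    clock_t start = clock();
    for (int r = 0; r < repeats; r++) {
        for (size_t i = 0; i < n; i++) {
            sum += convert(data[i]);
        }
    }
    clock_t end = clock();
    *sum_out = sum;
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Calling through a function pointer keeps the harness generic, but it can also stop the compiler from inlining or vectorizing the conversion, so absolute numbers from this sketch are not directly comparable with the figures above.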

Edit: After realizing that the loop should actually add up the numbers, rather than just overwrite sum on every iteration, I get the following results for clang:

clang -m32 -msse2 -mfpmath=sse -O2 floatbm.c -std=c99

Sum=625049287 time=30290
Sum=625049287 time=3663

The reason "let the compiler do the job" wins by so much is that Clang 3.5 generates an unrolled loop with proper SSE SIMD for the second loop. It cannot do that for the first loop, because the inline assembly is opaque to the optimizer, so each iteration converts a single float value.
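To give an idea of what such a vectorized loop does, here is a hand-written sketch using SSE2 intrinsics that converts four floats per step. It is not the code Clang actually emitted; the function name sum_truncated_sse2 is hypothetical, and the sketch assumes an x86 target with SSE2 enabled (the default on x86-64, -msse2 on 32-bit):

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stddef.h>

/* Sums the truncated values of f[0..n), four lanes at a time. */
int sum_truncated_sse2(const float *f, size_t n)
{
    __m128i acc = _mm_setzero_si128();   /* four 32-bit partial sums */
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {
        __m128 v = _mm_loadu_ps(f + i);                 /* load 4 floats, unaligned ok */
        acc = _mm_add_epi32(acc, _mm_cvttps_epi32(v));  /* truncate and accumulate */
    }

    /* Combine the four partial sums. */
    int lanes[4];
    _mm_storeu_si128((__m128i *)lanes, acc);
    int sum = lanes[0] + lanes[1] + lanes[2] + lanes[3];

    /* Scalar tail for the remaining 0..3 elements. */
    for (; i < n; i++) {
        sum += (int)f[i];
    }
    return sum;
}
```

_mm_cvttps_epi32 truncates toward zero, matching the semantics of the C cast, which is why the compiler is free to use this kind of instruction for trunk2 but not for the x87 assembly in trunk.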

Just to show that gcc still gives the same relative result, I reran with gcc:

Sum=625049287 time=31612
Sum=625049287 time=15007

The only difference from before is that I use sum += trunk(f[i]); instead of sum = ....
