_ftol2_sse, are there faster options?

Question

I have code that makes a lot of calls like

int myNumber = (int)(floatNumber);

which takes up, in total, around 10% of my CPU time (according to profiler). While I could leave it at that, I wonder if there are faster options, so I tried searching around, and stumbled upon


http://devmaster.net/forums/topic/7804-fast-int-float-conversion-routines/
http://stereopsis.com/FPU.html

I tried implementing the Real2Int() function given there, but it gives me wrong results, and runs slower. Now I wonder, are there faster implementations to floor double / float values to integers, or is the SSE2 version as fast as it gets? The pages I found date back a bit, so it might just be outdated, and newer STL is faster at this.

The current implementation is:

013B1030  call        _ftol2_sse (13B19A0h)

013B19A0  cmp         dword ptr [___sse2_available (13B3378h)],0  
013B19A7  je          _ftol2 (13B19D6h)  
013B19A9  push        ebp  
013B19AA  mov         ebp,esp  
013B19AC  sub         esp,8  
013B19AF  and         esp,0FFFFFFF8h  
013B19B2  fstp        qword ptr [esp]  
013B19B5  cvttsd2si   eax,mmword ptr [esp]  
013B19BA  leave  
013B19BB  ret  

Related questions I found:


What is the fastest way to convert float to int on x86

Since both are old, or are ARM based, I wonder if there are current ways to do this. Note that it says the best conversion is one that doesn't happen, but I need to have it, so that will not be possible.

Recommended answer

It's going to be hard to beat that if you are targeting generic x86 hardware. The runtime doesn't know for sure that the target machine has an SSE unit. If it did, it could do what the x64 compiler does and inline a cvttss2si opcode. But since the runtime has to check whether an SSE unit is available, you are left with the current implementation. That's what the implementation of ftol2_sse does. And what's more it passes the value in an x87 register and then transfers it to an SSE register if an SSE unit is available.

You could tell the x86 compiler to target machines that have SSE units. Then the compiler would indeed emit a simple cvttss2si opcode inline. That's going to be as fast as you can get. But if you run the code on an older machine then it will fail. Perhaps you could supply two versions, one for machines with SSE, and one for those without.
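As a rough sketch of the "two versions" idea, the snippet below keeps a plain-cast path next to an SSE path built on the `_mm_cvttss_si32` intrinsic, which corresponds to `cvttss2si`. The names `truncate_generic`, `truncate_sse`, `select_truncate` and the `cpu_has_sse` parameter are made up for illustration; how you detect SSE at startup (for example via CPUID) is left out and is an assumption of this sketch.

```cpp
// Only pull in the SSE intrinsics when the target can provide them.
#if defined(__SSE__) || defined(_M_IX86) || defined(_M_X64)
#include <xmmintrin.h>
#define HAVE_SSE_BUILD 1
#else
#define HAVE_SSE_BUILD 0
#endif

// Plain cast. On 32-bit MSVC without /arch:SSE or /arch:SSE2 this may
// compile to a call into a runtime helper such as _ftol2_sse.
static int truncate_generic(float value) {
    return (int)value;
}

#if HAVE_SSE_BUILD
// SSE path: _mm_cvttss_si32 converts with truncation toward zero,
// the same rounding rule as the C cast.
static int truncate_sse(float value) {
    return _mm_cvttss_si32(_mm_set_ss(value));
}
#endif

typedef int (*TruncateFn)(float);

// Hypothetical selection hook: call once at startup with the result of
// your own CPU feature check and keep the returned pointer.
TruncateFn select_truncate(bool cpu_has_sse) {
#if HAVE_SSE_BUILD
    if (cpu_has_sse) {
        return truncate_sse;
    }
#endif
    (void)cpu_has_sse;  // unused when no SSE path was compiled in
    return truncate_generic;
}
```

Calling through a function pointer has its own cost, so in a hot loop this may well eat the gain; dispatching once around the whole loop, or shipping two separate builds, is probably the more realistic variant.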

That's not going to gain you all that much. It's just going to avoid all the overhead of ftol2_sse that happens before you actually reach the cvttss2si opcode that does the work.

To change the compiler settings from the IDE, use Project > Properties > Configuration Properties > C/C++ > Code Generation > Enable Enhanced Instruction Set. On the command line it is /arch:SSE or /arch:SSE2.
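If you want to confirm which setting a given build actually used, one option is to look at the predefined macro `_M_IX86_FP`, which MSVC defines for 32-bit x86 targets. The helper name `float_codegen_mode` below is invented for this sketch, and the non-MSVC branches are only there so the snippet compiles elsewhere.

```cpp
// Reports which floating-point code generation mode this translation
// unit was compiled for. Purely a compile-time check.
const char* float_codegen_mode() {
#if defined(_M_IX86_FP)
    // MSVC, 32-bit x86: 0 = /arch:IA32, 1 = /arch:SSE, 2 = /arch:SSE2 or higher.
  #if _M_IX86_FP >= 2
    return "x86, SSE2 or higher";
  #elif _M_IX86_FP == 1
    return "x86, SSE";
  #else
    return "x86, x87 only";
  #endif
#elif defined(_M_X64) || defined(__x86_64__)
    // SSE2 is part of the x64 baseline, so the conversion is inlined there.
    return "x64, SSE2 baseline";
#else
    return "other target";
#endif
}
```

Printing the result once at startup is enough to tell whether the `/arch` switch reached the file that contains the hot conversion.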
