Taking advantage of SSE and other CPU extensions


Question


There are a couple of places in my code base where the same operation is repeated a very large number of times over a large data set. In some cases these take a considerable time to process.

I believe that using SSE to implement these loops should improve their performance significantly, especially where many operations are carried out on the same set of data, so that once the data has been read into the cache there shouldn't be any cache misses to stall it. However, I'm not sure how to go about this.

  • Is there a compiler- and OS-independent way of writing the code to take advantage of SSE instructions? I like the VC++ intrinsics, which include SSE operations, but I haven't found any cross-compiler solutions.

  • I still need to support some CPUs that have either no or limited SSE support (e.g. Intel Celeron). Is there some way to avoid having to make different versions of the program, such as some kind of "run-time linker" that links in either the basic or the SSE-optimised code, based on the CPU it is running on, when the process starts?

  • What about other CPU extensions? Looking at the instruction sets of various Intel and AMD CPUs shows there are a few of them.

Solution

For your second point, there are several solutions, as long as you can separate the differences out into different functions:

  • plain old C function pointers
  • dynamic linking (which generally relies on C function pointers)
  • if you're using C++, having different classes that represent the support for different architectures and using virtual functions can help immensely with this.

Note that because you'd be relying on indirect function calls, the functions that abstract the different operations generally need to represent somewhat higher-level functionality, or you may lose whatever gains you get from the optimized instructions to the call overhead. In other words, don't abstract the individual SSE operations; abstract the work you're doing.

Here's an example using function pointers:

typedef int (*scale_func_ptr)( int scalar, int* pData, int count);


int non_sse_scale( int scalar, int* pData, int count)
{
    // do whatever work needs done, without SSE so it'll work on older CPUs

    return 0;
}

int sse_scale( int scalar, int* pData, int count)
{
    // equivalent code, but uses SSE

    return 0;
}


// at initialization

scale_func_ptr scale_func = non_sse_scale;

if (useSSE) {
    scale_func = sse_scale;
}


// now, when you want to do the work:

scale_func( 12, theData_ptr, 512);  // this will call the routine tailored to SSE
                                    // if the CPU supports it, otherwise calls the non-SSE
                                    // version of the function
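The example above leaves open how useSSE gets set and what the SSE routine contains. The following is a minimal sketch of one way to fill in both, under several assumptions that are not from the answer: it scales float data rather than int, the names scale_f32_plain, scale_f32_sse and select_scale_f32 are hypothetical, and the CPU check uses __builtin_cpu_supports, which is specific to GCC and Clang (other compilers need their own CPUID query). The intrinsics come from <xmmintrin.h>, a header that GCC, Clang and MSVC all provide, which goes some way towards the first point in the question. On x86-64 SSE is always present, so the run-time check only makes a difference on 32-bit x86.

```c
#include <stddef.h>

/* Assumption: GCC or Clang on x86. Elsewhere only the plain routine is built. */
#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
#include <xmmintrin.h>              /* SSE intrinsics */
#define USE_X86_DISPATCH 1
#else
#define USE_X86_DISPATCH 0
#endif

typedef void (*scale_f32_func)(float scalar, float *data, size_t count);

/* Portable fallback: works on any CPU. */
static void scale_f32_plain(float scalar, float *data, size_t count)
{
    for (size_t i = 0; i < count; ++i)
        data[i] *= scalar;
}

#if USE_X86_DISPATCH
/* SSE version: four floats per iteration, then a scalar loop for the tail.
   The target attribute lets this one function use SSE even when the rest
   of the file is compiled without -msse. */
__attribute__((target("sse")))
static void scale_f32_sse(float scalar, float *data, size_t count)
{
    const __m128 factor = _mm_set1_ps(scalar);
    size_t i = 0;
    for (; i + 4 <= count; i += 4) {
        __m128 v = _mm_loadu_ps(data + i);          /* unaligned load */
        _mm_storeu_ps(data + i, _mm_mul_ps(v, factor));
    }
    for (; i < count; ++i)
        data[i] *= scalar;
}
#endif

/* Call once at initialization and keep the returned pointer. */
static scale_f32_func select_scale_f32(void)
{
#if USE_X86_DISPATCH
    __builtin_cpu_init();
    if (__builtin_cpu_supports("sse"))
        return scale_f32_sse;
#endif
    return scale_f32_plain;
}
```

At startup, something like `scale_f32_func scale = select_scale_f32();` picks the routine once, and every later `scale(2.0f, data, n)` goes through the pointer. The whole loop sits behind the indirect call, so the call overhead is paid once per buffer rather than once per element.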
