How to check if a vDSP function runs scalar or SIMD on NEON

Question

I'm currently using some functions from the vDSP framework, especially vDSP_conv, and I'm wondering whether there is any way to check if the function falls back to scalar mode or is processed with SIMD on the NEON unit.
The documentation of the function lists some criteria for the PowerPC architecture that have to be fulfilled, otherwise scalar mode is invoked. I don't know whether these criteria apply to the iPhone as well, nor how to check whether my call invokes scalar mode or runs properly on NEON.

Is there a way to check this?
Thanks!

Recommended answer

NEON code is used in the vDSP_conv implementation. It is used in some cases and not in others.

We (the Vector and Numerics Group, which produces vDSP) are not publishing criteria about which functions use NEON in part because there are a number of complicating factors: specifics about each call (strides, lengths, and alignments of multiple parameters), processor model that the code is executed on, and software version.

If you have a question about a specific case, I may be able to investigate it.

Are you asking out of curiosity, or is the performance not what you expected? Generally, the underlying concern is how fast an implementation performs and whether it could be better. SIMD may be a proxy for some of that, but it is not the actual goal.
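Since speed is the real question, it can help to measure it directly rather than infer it from whether SIMD was used. The sketch below is a minimal, portable C timing helper, assuming a POSIX system with `clock_gettime` and `CLOCK_MONOTONIC`. The names `ns_per_call`, `bench_fn`, and the example `sum_workload` are hypothetical; on iOS or macOS you would put your actual `vDSP_conv` call inside the callback instead of the stand-in summation.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L /* expose clock_gettime/CLOCK_MONOTONIC */
#endif
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical callback type: runs the code under test once. */
typedef void (*bench_fn)(void *ctx);

/* Returns the average wall-clock time per call in nanoseconds,
   measured over `iterations` calls after one warm-up call. */
double ns_per_call(bench_fn fn, void *ctx, size_t iterations)
{
    assert(fn != NULL && iterations > 0);
    struct timespec start, end;
    fn(ctx); /* warm-up: touch caches and lazily mapped pages */
    clock_gettime(CLOCK_MONOTONIC, &start);
    for (size_t i = 0; i < iterations; ++i)
        fn(ctx);
    clock_gettime(CLOCK_MONOTONIC, &end);
    double elapsed = (double)(end.tv_sec - start.tv_sec) * 1e9
                   + (double)(end.tv_nsec - start.tv_nsec);
    return elapsed / (double)iterations;
}

/* Stand-in workload (hypothetical): sums a float buffer. Replace the
   body with your real vDSP_conv call when measuring on Apple hardware. */
struct sum_workload {
    const float *data;
    size_t len;
    float result; /* stored so the work is not optimized away */
};

void sum_workload_run(void *ctx)
{
    struct sum_workload *w = ctx;
    float s = 0.0f;
    for (size_t i = 0; i < w->len; ++i)
        s += w->data[i];
    w->result = s;
}
```

Comparing the time per call across different strides, alignments, and filter lengths tells you whether a particular configuration is slow, which is what matters in practice.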

Updated to address a comment below:

Surveying the source code for recent iOS, it looks like all you need to get SIMD code when doing correlation is to execute on a processor with NEON and set all the strides to 1. However, the code is specialized to use alignment hints if addresses are aligned, so you may get better performance on certain processor models if you arrange for the signal, filter, and output addresses to be multiples of 16 bytes. If you can, use multiples of eight for the number of filter elements, but multiples of four are good too.
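The conditions described above can be written down as a quick self-check before calling `vDSP_conv`. This is a hypothetical sketch, not an Apple API: `check_conv_hints` only encodes the answer's rules of thumb (all strides 1; signal, filter, and output 16-byte aligned; filter length a multiple of 8, or at least of 4). It cannot report what vDSP actually does internally, which, per the answer, also depends on the processor model and OS version. The parameters loosely mirror `vDSP_conv`'s signal, filter, and output pointers with their strides plus the filter length; the exact vDSP types are not reproduced. The helper `alloc_floats16` uses C11 `aligned_alloc`, which expects the size to be a multiple of the alignment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Rules of thumb from the answer, NOT a query of what vDSP does. */
typedef struct {
    bool unit_strides;     /* all strides == 1: per the answer, selects SIMD on NEON */
    bool aligned16;        /* signal, filter, output on 16-byte boundaries */
    bool filter_len_mult8; /* filter length a multiple of 8 (preferred) */
    bool filter_len_mult4; /* filter length a multiple of 4 (also good) */
} conv_hints;

static bool is_aligned16(const void *ptr)
{
    return ((uintptr_t)ptr % 16u) == 0;
}

/* Hypothetical helper; argument order loosely follows vDSP_conv. */
conv_hints check_conv_hints(const float *signal, ptrdiff_t signal_stride,
                            const float *filter, ptrdiff_t filter_stride,
                            const float *output, ptrdiff_t output_stride,
                            size_t filter_len)
{
    conv_hints h;
    h.unit_strides = signal_stride == 1 && filter_stride == 1 && output_stride == 1;
    h.aligned16 = is_aligned16(signal) && is_aligned16(filter) && is_aligned16(output);
    h.filter_len_mult8 = filter_len % 8 == 0;
    h.filter_len_mult4 = filter_len % 4 == 0;
    return h;
}

/* Allocates n floats on a 16-byte boundary; release with free().
   The byte count is rounded up to a multiple of 16 for aligned_alloc. */
float *alloc_floats16(size_t n)
{
    assert(n <= (SIZE_MAX - 15u) / sizeof(float)); /* guard against overflow */
    size_t bytes = (n * sizeof(float) + 15u) & ~(size_t)15u;
    return aligned_alloc(16, bytes == 0 ? 16 : bytes);
}
```

For example, a signal of 64 floats, an 8-tap filter, and 57 outputs, all from `alloc_floats16` and all with stride 1, satisfy every hint; offsetting the signal pointer by one element breaks the alignment hint even though the strides are still fine.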

Unfortunately, the code is not O(n log n); it uses direct arithmetic rather than an FFT implementation, so it is O(n²). Generally, it is designed for shorter lengths, where direct arithmetic is suitable. If an FFT algorithm for correlation would help you, please file a feature request at https://bugreport.apple.com.
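To make the O(n²) remark concrete, the sketch below computes, by direct arithmetic in plain C, the quantity that Apple's documentation gives for `vDSP_conv` with positive unit strides: C[n] = sum over p = 0..P-1 of A[n+p]·F[p], for n = 0..N-1, where the signal holds at least N+P-1 elements. It is a reference for checking results and counting work, not Apple's implementation; `correlate_direct` is a hypothetical name. The inner loop runs P times for each of the N outputs, so the cost is N·P multiply-adds, which grows quadratically when P scales with N, whereas an FFT-based method costs on the order of (N+P)·log(N+P).

```c
#include <assert.h>
#include <stddef.h>

/* Direct (non-FFT) correlation with unit strides:
   c[n] = sum_{p=0}^{p_len-1} a[n + p] * f[p],  for n = 0 .. n_out-1.
   `a` must hold at least n_out + p_len - 1 elements.
   Returns the number of multiply-adds performed (n_out * p_len). */
size_t correlate_direct(const float *a, const float *f, float *c,
                        size_t n_out, size_t p_len)
{
    assert(a != NULL && f != NULL && c != NULL);
    size_t madds = 0;
    for (size_t n = 0; n < n_out; ++n) {
        float acc = 0.0f;
        for (size_t p = 0; p < p_len; ++p) {
            acc += a[n + p] * f[p]; /* one multiply-add per tap */
            ++madds;
        }
        c[n] = acc;
    }
    return madds;
}
```

For example, correlating the signal {1, 2, 3, 4, 5} with the filter {1, 1, 1} for N = 3 outputs gives {6, 9, 12} using 9 multiply-adds. Per Apple's documentation, `vDSP_conv` performs true convolution when the filter stride is negative and the filter pointer addresses the last element; in this sketch the equivalent is passing a reversed copy of the filter.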

Regardless of the algorithm used, shorter lengths are not better if you want the same information independent of length. That is because, if you process shorter lengths, you would have to process more of them, in various combinations, to get the same information. I expect the design would be to figure out what length you need so that the correlation produces the information you require, then use that length without subdividing it.