iOS 4 Accelerate Cblas with 4x4 matrices

Views: 190 | Published: 2018/9/21 19:37:22 | Tags: iphone, ios, blas

Problem description

I've been looking into the Accelerate framework that was made available in iOS 4. Specifically, I tried using the Cblas routines in my C linear algebra library. So far I can't get these functions to give me any performance gain over very basic routines, in particular for 4x4 matrix multiplication. Wherever I couldn't make use of affine or homogeneous properties of the matrices, I've been using this routine (abridged):

float *mat4SetMat4Mult(const float *m0, const float *m1, float *target) {
    target[0] = m0[0] * m1[0] + m0[4] * m1[1] + m0[8] * m1[2] + m0[12] * m1[3];
    /* target[1] through target[14] follow the same pattern (elided in the original post) */
    target[15] = m0[3] * m1[12] + m0[7] * m1[13] + m0[11] * m1[14] + m0[15] * m1[15];
    return target;
}
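
The elided lines follow a single pattern, so the same product can be written as loops. The sketch below is only an illustration: it assumes column-major storage (which the first and last lines of the abridged routine suggest) and that target does not overlap m0 or m1. The name mat4MultLoop is hypothetical and is not part of the original library.

```c
#include <stddef.h>

/* Hypothetical loop form of the 4x4 product, column-major:
   element (row, col) is stored at index col * 4 + row.
   Assumes target does not overlap m0 or m1. */
float *mat4MultLoop(const float *m0, const float *m1, float *target) {
    for (size_t col = 0; col < 4; ++col) {
        for (size_t row = 0; row < 4; ++row) {
            float sum = 0.0f;
            /* Dot product of row `row` of m0 with column `col` of m1. */
            for (size_t k = 0; k < 4; ++k) {
                sum += m0[k * 4 + row] * m1[col * 4 + k];
            }
            target[col * 4 + row] = sum;
        }
    }
    return target;
}
```

For row 0, column 0 this expands to m0[0] * m1[0] + m0[4] * m1[1] + m0[8] * m1[2] + m0[12] * m1[3], matching the first line of the unrolled routine. The unrolled form may still be faster, since it avoids loop overhead unless the compiler unrolls the loops itself.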

The equivalent function call for Cblas is:

cblas_sgemm(CblasColMajor, CblasNoTrans, CblasNoTrans,
   4, 4, 4, 1.f, m0, 4, m1, 4, 0.f, target, 4);

When I compare the two by running them over a large number of pre-computed matrices filled with random numbers (each function gets exactly the same input every time), the Cblas routine is about 4x slower, as timed with the C clock() function.
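
A comparison of this kind could be set up roughly as follows. This is a minimal sketch, not the harness actually used for the numbers above: the names Mat4MultFn, timeMat4Mult and standInMult are hypothetical, and the input layout (matrices stored back to back, each call multiplying two neighbours) is an assumption.

```c
#include <stddef.h>
#include <time.h>

/* Hypothetical signature shared by the routines under test. */
typedef float *(*Mat4MultFn)(const float *m0, const float *m1, float *target);

/* Stand-in for a routine under test: plain column-major 4x4 product. */
float *standInMult(const float *m0, const float *m1, float *target) {
    for (size_t col = 0; col < 4; ++col)
        for (size_t row = 0; row < 4; ++row) {
            float sum = 0.0f;
            for (size_t k = 0; k < 4; ++k)
                sum += m0[k * 4 + row] * m1[col * 4 + k];
            target[col * 4 + row] = sum;
        }
    return target;
}

/* Runs fn over `count` neighbouring pairs of 4x4 matrices stored back to
   back in `inputs` (which must hold (count + 1) * 16 floats) and returns
   the CPU time in seconds as measured by clock(). */
double timeMat4Mult(Mat4MultFn fn, const float *inputs, size_t count, float *sink) {
    float result[16];
    float acc = 0.0f;
    clock_t start = clock();
    for (size_t i = 0; i < count; ++i) {
        fn(inputs + i * 16, inputs + (i + 1) * 16, result);
        acc += result[0];  /* use the result so the call is less likely to be optimized away */
    }
    clock_t end = clock();
    *sink = acc;           /* hand the accumulated value back to the caller */
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

To time the Cblas path the same way, cblas_sgemm would be called from a thin wrapper with the Mat4MultFn signature, so both routines see identical inputs. Note that clock() has coarse resolution, so count needs to be large for the measurement to be meaningful.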

This doesn't seem right to me, and I'm left with the feeling that I'm doing something wrong somewhere. Do I have to enable the device's NEON unit and SIMD functionality somehow? Or should I not expect better performance with such small matrices?

Many thanks,

Bastiaan
