Speed up float 5x5 matrix * vector multiplication with SSE
Problem description
I need to run a matrix-vector multiplication 240,000 times per second. The matrix is 5x5 and is always the same, whereas the vector changes at each iteration. The data type is float. I was thinking of using some SSE (or similar) instructions.
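Any SSE version needs a baseline to compare against. The sketch below is a plain scalar implementation of the product. It assumes a row-major `float a[5][5]`, and the function name `matvec5_scalar` is hypothetical (it does not come from the question).

```cpp
#include <cassert>

// Hypothetical scalar reference: y = A * x for a 5x5 row-major float matrix A,
// computed as one 5-element dot product per row.
void matvec5_scalar(const float a[5][5], const float x[5], float y[5]) {
    for (int i = 0; i < 5; ++i) {
        float sum = 0.0f;
        for (int j = 0; j < 5; ++j) {
            sum += a[i][j] * x[j];
        }
        y[i] = sum;
    }
}
```

Time this with optimisation enabled (for example `-O2` or `-O3` with GCC or Clang) to get the baseline for judging the > 20% target. An optimising compiler may already vectorise parts of it.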
I am concerned that the number of arithmetic operations is too small compared to the number of memory operations involved. Do you think I can get some tangible (e.g. > 20%) improvement?
Do I need the Intel compiler to do it?
Could you point me to some references?
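A back-of-the-envelope count puts the arithmetic-versus-memory concern into numbers. It assumes the constant matrix (25 floats, 100 bytes) is loaded once and stays in L1 cache or registers, so only the input vector x and the output vector y move on each product:

```latex
\begin{aligned}
\text{flops per product} &= \underbrace{5 \cdot 5}_{\text{multiplies}} + \underbrace{5 \cdot 4}_{\text{additions}} = 45 \\
45 \times 240\,000\ \text{s}^{-1} &= 1.08 \times 10^{7}\ \text{flop/s} \\
\text{bytes per product} &= \underbrace{5 \cdot 4}_{x\ \text{(read)}} + \underbrace{5 \cdot 4}_{y\ \text{(written)}} = 40\ \text{B} \\
40\ \text{B} \times 240\,000\ \text{s}^{-1} &= 9.6 \times 10^{6}\ \text{B/s}
\end{aligned}
```

Both rates are far below what a single modern core can sustain. At this size, per-call overhead (loads, stores, shuffles, function-call cost) is therefore likely to matter more than raw arithmetic throughput or memory bandwidth.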
Recommended answer
The Eigen C++ template library for vectors, matrices, ... has both:
- optimised code for small fixed-size matrices (as well as dynamically sized ones)
- optimised code that uses SSE instructions
So you should give it a try.
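For comparison with what a library generates, here is a hand-written sketch using SSE intrinsics. In Eigen, the same product would use fixed-size types such as `Eigen::Matrix<float, 5, 5>`.

The sketch assumes an x86/x86-64 target with SSE. The `<xmmintrin.h>` header ships with GCC, Clang and MSVC, so the Intel compiler is not required. Because the matrix never changes, it is packed once:

- rows 0 to 3 of each column go into one `__m128`;
- row 4 is kept separately for a scalar dot product.

The names `PackedMat5`, `pack_matrix` and `matvec5_sse` are hypothetical.

```cpp
#include <cassert>
#include <xmmintrin.h>  // SSE intrinsics: __m128, _mm_mul_ps, _mm_add_ps, ...

// Hypothetical packed layout, built once because the matrix is constant.
struct PackedMat5 {
    __m128 col[5];  // col[j] holds A[0][j], A[1][j], A[2][j], A[3][j] in lanes 0..3
    float row4[5];  // fifth row A[4][0..4], handled with scalar code
};

PackedMat5 pack_matrix(const float a[5][5]) {
    PackedMat5 p;
    for (int j = 0; j < 5; ++j) {
        // _mm_set_ps takes lanes from highest (3) to lowest (0).
        p.col[j] = _mm_set_ps(a[3][j], a[2][j], a[1][j], a[0][j]);
        p.row4[j] = a[4][j];
    }
    return p;
}

// y = A * x: rows 0..3 as a sum of columns scaled by broadcast x[j],
// row 4 as an ordinary scalar dot product.
void matvec5_sse(const PackedMat5& m, const float x[5], float y[5]) {
    __m128 acc = _mm_mul_ps(m.col[0], _mm_set1_ps(x[0]));
    for (int j = 1; j < 5; ++j) {
        acc = _mm_add_ps(acc, _mm_mul_ps(m.col[j], _mm_set1_ps(x[j])));
    }
    _mm_storeu_ps(y, acc);  // unaligned store of y[0..3]

    float last = 0.0f;
    for (int j = 0; j < 5; ++j) {
        last += m.row4[j] * x[j];
    }
    y[4] = last;
}
```

The fifth row is the awkward part of the 5x5 size, because it does not fill an SSE register. Whether this layout (or Eigen) beats the scalar version by the hoped-for 20% is something only a benchmark on the target machine can settle.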