Speed up matrix multiplication by SSE (C++)
Question
I need to run a matrix-vector multiplication 240000 times per second. The matrix is 5x5 and is always the same, whereas the vector changes at each iteration. The data type is float. I was thinking of using some SSE (or similar) instructions.
1) I am concerned that the number of arithmetic operations is too small compared to the number of memory operations involved. Do you think I can get some tangible (e.g. > 20%) improvement?
2) Do I need the Intel compiler to do it?
3) Can you point out some references?
Thanks, everyone!
Recommended answer
The Eigen C++ template library for vectors, matrices, ... has both
- optimised code for small fixed-size matrices (as well as dynamically sized ones)
- optimised code that uses SSE optimisations.
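For a sense of what SSE code for the 5x5 case in the question could look like when written by hand, here is a minimal sketch. It is not taken from Eigen and is not a claim about how Eigen implements the product. The names `PaddedMat5`, `pack` and `mul5` are hypothetical, and the padded column-major layout is an assumption made so that each column fills exactly two 4-float SSE registers. It needs an x86 target with SSE, but no particular compiler: GCC, Clang and MSVC all provide `<xmmintrin.h>`.

```cpp
#include <xmmintrin.h>  // SSE intrinsics (x86 only)

// Hypothetical layout: the constant 5x5 matrix is stored column by column,
// each column padded with zeros to 8 floats so it fills two SSE registers.
struct alignas(16) PaddedMat5 {
    float col[5][8];
};

// Build the padded layout from an ordinary row-major 5x5 array.
// This is done once, since the matrix never changes.
inline PaddedMat5 pack(const float (&a)[5][5]) {
    PaddedMat5 m{};  // value-initialisation zeroes the padding
    for (int r = 0; r < 5; ++r)
        for (int c = 0; c < 5; ++c)
            m.col[c][r] = a[r][c];
    return m;
}

// y = M * x, computed as the sum over columns of x[c] * column c.
// x points to 5 floats; y must have room for 8 floats,
// of which only y[0..4] are meaningful.
inline void mul5(const PaddedMat5& m, const float* x, float* y) {
    __m128 lo = _mm_setzero_ps();  // accumulates rows 0..3
    __m128 hi = _mm_setzero_ps();  // accumulates row 4 plus three padding lanes
    for (int c = 0; c < 5; ++c) {
        const __m128 xc = _mm_set1_ps(x[c]);  // broadcast x[c] to all 4 lanes
        lo = _mm_add_ps(lo, _mm_mul_ps(_mm_load_ps(&m.col[c][0]), xc));
        hi = _mm_add_ps(hi, _mm_mul_ps(_mm_load_ps(&m.col[c][4]), xc));
    }
    _mm_storeu_ps(y, lo);      // unaligned stores: y need not be 16-byte aligned
    _mm_storeu_ps(y + 4, hi);
}
```

Call `pack` once on the constant matrix, then call `mul5` for every new vector. Whether this beats the compiler's output for a plain scalar loop by the 20% mentioned in the question depends on the compiler and CPU, so it is worth benchmarking both.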