C++ Auto-Vectorize Matrix Multiplication Loop
Problem description
When I compile my source code, which does basic matrix-matrix multiplication, with auto-vectorization and auto-parallelization enabled, I receive these warnings in the console:
C5002: loop not vectorized due to reason '1200'
C5012: loop not parallelized due to reason '1000'
I've read through this resource provided by MSDN, which states:
Reason code 1200: Loop contains loop-carried data dependences that prevent vectorization. Different iterations of the loop interfere with each other such that vectorizing the loop would produce wrong answers, and the auto-vectorizer cannot prove to itself that there are no such data dependences.
Reason code 1000: The compiler detected a data dependency in the loop body.
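To illustrate what reason code 1200 means by a loop-carried dependence, here is a minimal sketch contrasting a loop whose iterations are independent with one where each iteration needs the previous iteration's result. The function names scale and running_sum are hypothetical and not part of the question; whether a particular compiler actually vectorizes the first loop is not guaranteed.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Independent iterations: out[i] depends only on in[i], so the
// iterations can run in any order or several at a time.
void scale(const std::vector<int>& in, std::vector<int>& out, int factor) {
    assert(in.size() == out.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        out[i] = in[i] * factor;
    }
}

// Loop-carried dependence: iteration i reads v[i - 1], which was
// written by iteration i - 1, so the iterations must run in order.
void running_sum(std::vector<int>& v) {
    for (std::size_t i = 1; i < v.size(); ++i) {
        v[i] = v[i - 1] + v[i];
    }
}
```

Note that the compiler also reports a dependence when it merely cannot rule one out, for example when two pointers might refer to the same memory.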
I'm not sure what in my loop is causing problems. Here is the relevant portion of my source code.
// int** A, int** B, int** result, const int dimension
for (int i = 0; i < dimension; ++i) {
    for (int j = 0; j < dimension; ++j) {
        for (int k = 0; k < dimension; ++k) {
            result[i][j] = result[i][j] + A[i][k] * B[k][j];
        }
    }
}
Any insight would be greatly appreciated.
Recommended answer
The loop-carried dependence is on result[i][j].
One solution is to accumulate the sum in a temporary variable and write it to result[i][j] once, outside the innermost loop, like this:
for (int i = 0; i < dimension; ++i) {
    for (int j = 0; j < dimension; ++j) {
        auto tmp = 0;
        for (int k = 0; k < dimension; ++k) {
            tmp += A[i][k] * B[k][j];
        }
        result[i][j] = tmp;
    }
}
This removes the dependence (since there is no more read-after-write of result[i][j] inside the innermost loop) and should help the vectorizer do a better job.
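For reference, here is a self-contained sketch of the same idea. It is an illustration under assumptions, not the asker's actual code: it swaps the int** layout for row-major std::vector<int> storage, and the function name multiply is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: multiplies two n x n matrices stored row-major
// in flat vectors (an assumption; the question uses int** instead).
std::vector<int> multiply(const std::vector<int>& a,
                          const std::vector<int>& b,
                          std::size_t n) {
    assert(a.size() == n * n && b.size() == n * n);
    std::vector<int> result(n * n);
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            int tmp = 0;  // accumulate in a local, not in result
            for (std::size_t k = 0; k < n; ++k) {
                tmp += a[i * n + k] * b[k * n + j];
            }
            result[i * n + j] = tmp;  // one store per output element
        }
    }
    return result;
}
```

Unlike the loop in the question, this version overwrites each output element instead of adding to whatever it already held; if the existing contents of result matter, tmp can be initialized from result[i][j] instead of 0. The column-wise (strided) reads of b may still limit what a vectorizer can do with the innermost loop.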