Simple and fast matrix-vector multiplication in C / C++
Problem description
I call matrix_vector_mult() frequently. It multiplies a matrix by a vector, and its implementation is below.
Question: Is there a simple way to make it significantly faster, by at least a factor of two?
Remarks: 1) The size of the matrix is about 300x50, and it doesn't change during the run. 2) It must work on both Windows and Linux.
double vectors_dot_prod(const double *x, const double *y, int n)
{
    double res = 0.0;
    int i;
    for (i = 0; i < n; i++)
    {
        res += x[i] * y[i];
    }
    return res;
}

void matrix_vector_mult(const double **mat, const double *vec, double *result, int rows, int cols)
{ // in matrix form: result = mat * vec;
    int i;
    for (i = 0; i < rows; i++)
    {
        result[i] = vectors_dot_prod(mat[i], vec, cols);
    }
}
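For reference, a call to the function might look like the following minimal sketch. The 2x3 matrix, the values, and the helper name example_2x3 are assumptions made for illustration and are not part of the original question. Because the signature takes const double **, the sketch builds an array of const double * row pointers rather than passing a double ** directly, which would need a cast.

```c
#include <assert.h>
#include <stddef.h>

/* Dot product of two length-n vectors (same logic as in the question). */
static double vectors_dot_prod(const double *x, const double *y, int n)
{
    double res = 0.0;
    int i;
    for (i = 0; i < n; i++)
    {
        res += x[i] * y[i];
    }
    return res;
}

/* result = mat * vec, where mat is an array of row pointers. */
static void matrix_vector_mult(const double **mat, const double *vec,
                               double *result, int rows, int cols)
{
    int i;
    for (i = 0; i < rows; i++)
    {
        result[i] = vectors_dot_prod(mat[i], vec, cols);
    }
}

/* Hypothetical helper: multiplies a fixed 2x3 matrix by a 3-vector
   and stores the two resulting entries in out[0] and out[1]. */
void example_2x3(double *out)
{
    static const double row0[3] = {1.0, 2.0, 3.0};
    static const double row1[3] = {4.0, 5.0, 6.0};
    const double vec[3] = {1.0, 0.0, -1.0};
    const double *mat[2];

    assert(out != NULL);
    mat[0] = row0;   /* each entry points at one row of the matrix */
    mat[1] = row1;
    matrix_vector_mult(mat, vec, out, 2, 3);
}
```

With these illustrative values both entries of the result come out as -2 (1 - 3 and 4 - 6).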
Recommended answer
In theory a good compiler should do this by itself. However, I tried it on my system (g++ 4.6.3) and got about twice the speed on a 300x50 matrix by hand-unrolling 4 multiplications (about 18 us per matrix instead of 34 us per matrix):
double vectors_dot_prod2(const double *x, const double *y, int n)
{
    double res = 0.0;
    int i = 0;
    for (; i <= n-4; i+=4)
    {
        res += (x[i] * y[i] +
                x[i+1] * y[i+1] +
                x[i+2] * y[i+2] +
                x[i+3] * y[i+3]);
    }
    for (; i < n; i++)
    {
        res += x[i] * y[i];
    }
    return res;
}
However, I expect the results of this level of micro-optimization to vary wildly between systems.
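Since the gain depends on the compiler and machine, it is probably worth timing both versions on the target system before adopting the unrolled one. The following is only a rough sketch of how that could be done with the standard clock() function, which is available on both Windows and Linux. The names dot_fn and seconds_per_call are hypothetical, and clock() has limited resolution and somewhat different meaning across platforms, so the numbers should be treated as approximate.

```c
#include <time.h>

/* Straightforward dot product (same logic as in the question). */
double vectors_dot_prod(const double *x, const double *y, int n)
{
    double res = 0.0;
    int i;
    for (i = 0; i < n; i++)
    {
        res += x[i] * y[i];
    }
    return res;
}

/* Dot product unrolled by 4, with a remainder loop (as in the answer). */
double vectors_dot_prod2(const double *x, const double *y, int n)
{
    double res = 0.0;
    int i = 0;
    for (; i <= n - 4; i += 4)
    {
        res += (x[i] * y[i] +
                x[i + 1] * y[i + 1] +
                x[i + 2] * y[i + 2] +
                x[i + 3] * y[i + 3]);
    }
    for (; i < n; i++)
    {
        res += x[i] * y[i];
    }
    return res;
}

/* Hypothetical function-pointer type shared by both implementations. */
typedef double (*dot_fn)(const double *, const double *, int);

/* Hypothetical helper: calls f reps times and returns the average
   time per call in seconds, as measured by clock(). */
double seconds_per_call(dot_fn f, const double *x, const double *y,
                        int n, long reps)
{
    /* volatile so the accumulated results are not optimized away */
    volatile double sink = 0.0;
    clock_t start, end;
    long r;

    start = clock();
    for (r = 0; r < reps; r++)
    {
        sink += f(x, y, n);
    }
    end = clock();
    (void)sink;
    return (double)(end - start) / CLOCKS_PER_SEC / (double)reps;
}
```

One would call seconds_per_call(vectors_dot_prod, x, y, 50, reps) and the same with vectors_dot_prod2, using a reps value large enough that the total run takes a noticeable fraction of a second. Note that the two functions add the products in a different order, so with general floating-point data their results may differ in the last bits.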