Simple and fast matrix-vector multiplication in C / C++

Problem description

I call matrix_vector_mult(), which multiplies a matrix by a vector, very frequently. Its implementation is below.

Question: Is there a simple way to make it significantly, at least twice, faster?

Remarks: 1) The size of the matrix is about 300x50. It doesn't change during the run. 2) It must work on both Windows and Linux.

double vectors_dot_prod(const double *x, const double *y, int n)
{
    double res = 0.0;
    int i;
    for (i = 0; i < n; i++)
    {
        res += x[i] * y[i];
    }
    return res;
}

void matrix_vector_mult(const double **mat, const double *vec, double *result, int rows, int cols)
{ // in matrix form: result = mat * vec;
    int i;
    for (i = 0; i < rows; i++)
    {
        result[i] = vectors_dot_prod(mat[i], vec, cols);
    }
}
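
For illustration, here is one way the function above might be called. The sketch repeats the two functions so that it compiles on its own. The helper make_row_pointers and the contiguous row-major buffer are assumptions of this example, not something stated in the question.

```c
#include <assert.h>
#include <stddef.h>

double vectors_dot_prod(const double *x, const double *y, int n)
{
    double res = 0.0;
    int i;
    for (i = 0; i < n; i++)
    {
        res += x[i] * y[i];
    }
    return res;
}

void matrix_vector_mult(const double **mat, const double *vec, double *result, int rows, int cols)
{ // in matrix form: result = mat * vec;
    int i;
    for (i = 0; i < rows; i++)
    {
        result[i] = vectors_dot_prod(mat[i], vec, cols);
    }
}

/* Hypothetical helper (not from the question): makes row_ptrs[i] point at
   row i of a contiguous, row-major rows x cols buffer. */
void make_row_pointers(const double *data, const double **row_ptrs, int rows, int cols)
{
    int i;
    assert(rows > 0 && cols > 0);
    for (i = 0; i < rows; i++)
    {
        row_ptrs[i] = data + (size_t)i * cols;
    }
}
```

Since the matrix does not change during the run, the row-pointer array only needs to be built once. Declaring it as an array of const double * lets it be passed as const double ** without a cast.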

Solution

In theory, a good compiler should do this by itself. However, I tried it on my system (g++ 4.6.3) and got about twice the speed on a 300x50 matrix by hand-unrolling 4 multiplications (about 18 µs per matrix instead of 34 µs):

double vectors_dot_prod2(const double *x, const double *y, int n)
{
    double res = 0.0;
    int i = 0;
    for (; i <= n-4; i+=4)
    {
        res += (x[i] * y[i] +
                x[i+1] * y[i+1] +
                x[i+2] * y[i+2] +
                x[i+3] * y[i+3]);
    }
    for (; i < n; i++)
    {
        res += x[i] * y[i];
    }
    return res;
}

However, I expect the results of this level of micro-optimization to vary widely between systems.
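
Because the gain depends on the compiler and machine, it is worth measuring on the target system. Below is a minimal timing sketch that uses only the standard clock() function, which is available on both Windows and Linux. The names dot_fn and time_mat_vec_us, the fill pattern, and the contiguous matrix layout are assumptions made for this example. The casts on malloc are there only so the code also compiles as C++.

```c
#include <stdlib.h>
#include <time.h>

typedef double (*dot_fn)(const double *, const double *, int);

double vectors_dot_prod(const double *x, const double *y, int n)
{
    double res = 0.0;
    int i;
    for (i = 0; i < n; i++)
        res += x[i] * y[i];
    return res;
}

double vectors_dot_prod2(const double *x, const double *y, int n)
{
    double res = 0.0;
    int i = 0;
    for (; i <= n-4; i+=4)
        res += (x[i] * y[i] + x[i+1] * y[i+1] +
                x[i+2] * y[i+2] + x[i+3] * y[i+3]);
    for (; i < n; i++)
        res += x[i] * y[i];
    return res;
}

/* Hypothetical helper: average time in microseconds for one rows x cols
   matrix-vector product computed with `dot`. Returns -1.0 if allocation
   fails. *checksum receives a value derived from the results so that the
   compiler cannot discard the computation. */
double time_mat_vec_us(dot_fn dot, int rows, int cols, int reps, double *checksum)
{
    int total = rows * cols;
    double *mat = (double *)malloc((size_t)total * sizeof *mat);
    double *vec = (double *)malloc((size_t)cols * sizeof *vec);
    double *res = (double *)malloc((size_t)rows * sizeof *res);
    double sum = 0.0;
    clock_t start, stop;
    int i, r;

    if (!mat || !vec || !res)
    {
        free(mat); free(vec); free(res);
        return -1.0;
    }
    /* Small values that are exact in binary floating point, so both
       summation orders give identical results. */
    for (i = 0; i < total; i++) mat[i] = (double)(i % 7) * 0.5;
    for (i = 0; i < cols; i++)  vec[i] = (double)(i % 5) - 2.0;

    start = clock();
    for (r = 0; r < reps; r++)
    {
        for (i = 0; i < rows; i++)
            res[i] = dot(mat + (size_t)i * cols, vec, cols);
        sum += res[r % rows]; /* use one result element per repetition */
    }
    stop = clock();

    *checksum = sum;
    free(mat); free(vec); free(res);
    return 1e6 * (double)(stop - start) / CLOCKS_PER_SEC / reps;
}
```

Calling time_mat_vec_us once with vectors_dot_prod and once with vectors_dot_prod2 (for example with rows = 300, cols = 50 and a few thousand repetitions) and printing the two return values gives the per-matrix times to compare. The two checksums should be equal. Note that calling through a function pointer may prevent inlining, so the absolute numbers can differ from those of a direct call.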
