c ++速度比较迭代器与索引 [英] c++ speed comparison iterator vs index

查看:111
本文介绍了c ++速度比较迭代器与索引的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我目前正在用c ++编写一个linalg库,用于教育目的和个人用途。作为其中的一部分,我实现了一个自定义矩阵类,其中包含自定义行和列迭代器。在为std :: algorithm和std :: numeric函数提供非常好的功能的同时,我对index和iterator / std :: inner_product方法之间的矩阵乘法进行了速度比较。结果显着不同:

I am currently writing a linalg library in c++, for educational purposes and personal use. As part of it I implemented a custom matrix class with custom row and column iterators. While providing very nice feature to work with std::algorithm and std::numeric functions, I performed a speed comparison for a matrix multiplication between an index and iterator/std::inner_product approach. The results differ significantly:

// used later on for the custom iterator
template<class U>
struct EveryNth {
    bool operator()(const U& ) { return m_count++ % N == 0; }
    EveryNth(std::size_t i) : m_count(0), N(i) {}
    EveryNth(const EveryNth& element) : m_count(0), N(element.N) {}

private:
    int m_count;
    std::size_t N;
};

template<class T, 
         std::size_t rowsize, 
         std::size_t colsize>  
class Matrix
{

private:

    // Data is stored in a MVector, a modified std::vector
    MVector<T> matrix;

    std::size_t row_dim;                  
    std::size_t column_dim;

public:

    // other constructors, this one is for matrix in the computation
    explicit Matrix(MVector<T>&& s): matrix(s), 
                                     row_dim(rowsize), 
                                     column_dim(colsize){
    }    

    // other code...

    typedef boost::filter_iterator<EveryNth<T>, 
                                   typename std::vector<T>::iterator> FilterIter;

    // returns an iterator that skips elements in a range
    // if "to" is to be specified, then from has to be set to a value
    // @ param "j" - j'th column to be requested
    // @ param "from" - starts at the from'th element
    // @ param "to" - goes from the from'th element to the "to'th" element
    FilterIter  begin_col( std::size_t j,
                           std::size_t from = 0, 
                           std::size_t to = rowsize ){
        return boost::make_filter_iterator<EveryNth<T> >(
            EveryNth<T>( cols() ), 
            matrix.Begin() + index( from, j ), 
            matrix.Begin() + index( to, j )
            );
    }

    // specifies then end of the iterator
    // so that the iterator can not "jump" past the last element into undefines behaviour
    FilterIter end_col( std::size_t j, 
                        std::size_t to = rowsize ){
        return boost::make_filter_iterator<EveryNth<T> >(
            EveryNth<T>( cols() ), 
            matrix.Begin() + index( to, j ), 
            matrix.Begin() + index( to, j )
            );
    }

    FilterIter  begin_row( std::size_t i,
                           std::size_t from = 0,
                           std::size_t to = colsize ){
         return boost::make_filter_iterator<EveryNth<T> >(
            EveryNth<T>( 1 ), 
            matrix.Begin() + index( i, from ), 
            matrix.Begin() + index( i, to )
            );
    }

    FilterIter  end_row( std::size_t i,
                         std::size_t to = colsize ){
        return boost::make_filter_iterator<EveryNth<T> >(
            EveryNth<T>( 1 ), 
            matrix.Begin() + index( i, to ), 
            matrix.Begin() + index( i, to )
            );
    }

    // other code...

    // allows to access an element of the matrix by index expressed
    // in terms of rows and columns
    // @ param "r" - r'th row of the matrix
    // @ param "c" - c'th column of the matrix
    std::size_t index(std::size_t r, std::size_t c) const {
        return r*cols()+c; 
    }

    // brackets operator
    // return an elements stored in the matrix
    // @ param "r" - r'th row in the matrix
    // @ param "c" - c'th column in the matrix
    T& operator()(std::size_t r, std::size_t c) { 
        assert(r < rows() && c < matrix.size() / rows());
        return matrix[index(r,c)]; 
    }

    const T& operator()(std::size_t r, std::size_t c) const {
        assert(r < rows() && c < matrix.size() / rows()); 
        return matrix[index(r,c)]; 
    }

    // other code...

    // end of class
};

现在在主函数中运行以下内容:

Now in the main function in run the following:

int main(int argc, char *argv[]){


    Matrix<int, 100, 100> a = Matrix<int, 100, 100>(range<int>(10000));


    std::clock_t begin = clock();
    double b = 0;
    for(std::size_t i = 0; i < a.rows(); i++){
        for (std::size_t j = 0; j < a.cols(); j++) {
                std::inner_product(a.begin_row(i), a.end_row(i), 
                                   a.begin_column(j),0);        
        }
    }

    // double b = 0;
    // for(std::size_t i = 0; i < a.rows(); i++){
    //     for (std::size_t j = 0; j < a.cols(); j++) {
    //         for (std::size_t k = 0; k < a.rows(); k++) {
    //             b += a(i,k)*a(k,j);
    //         }
    //     }
    // }


    std::clock_t end = clock();
    double elapsed_secs = double(end - begin) / CLOCKS_PER_SEC;
    std::cout << elapsed_secs << std::endl;

    std::cout << "--- End of test ---" << std::endl;

    std::cout << std::endl;
    return 0;
}

对于std :: inner_product / iterator方法,需要:

For the std::inner_product/iterator approach it takes:

bash-3.2$ ./main

3.78358
--- End of test ---

和索引(// out)方法:

and for the index (// out) approach:

bash-3.2$ ./main

0.106173
--- End of test ---

这几乎是迭代器方法的40倍。你在代码中看到任何可以减慢迭代器计算的东西吗?我应该提一下,我尝试了两种方法并产生了正确的结果。

which is almost 40 times faster then the iterator approach. Do you see anything in the code that could slow down iterator computation that much? I should mention that I tried out both methods and they produce correct results.

感谢您的想法。

推荐答案

您必须了解的是矩阵运算非常清楚,并且编译器非常擅长对矩阵运算中涉及的事物进行优化。

What you have to understand is that matrix operations are VERY well-understood, and compilers are VERY good at doing optimizations of the things that are involved in matrix operations.

考虑C = AB,其中C是MxN,A是MxQ,B是QxN。

Consider C = AB, where C is MxN, A is MxQ, B is QxN.

double a[M][Q], b[Q][N], c[M][N];
for(unsigned i = 0; i < M; i++){
  for (unsigned j = 0; j < N; j++) {
    double temp = 0.0;
    for (unsigned k = 0; k < Q; k++) {
      temp += a[i][k]*b[k][j];
    }
    c[i][j] = temp;
  }
}

(你不会相信我写的多么诱惑FORTRAN IV中的上述内容。)

(You would not believe how tempted I was to write the above in FORTRAN IV.)

编译器对此进行了研究,并注意到实际发生的事情是他正以1和1的步幅走过a和c b,步长为Q.他消除了下标计算中的乘法,并进行直接索引。

The compiler looks at this, and notices that what is really happening is that he is walking through a and c with a stride of 1 and b with a stride of Q. He eliminates the multiplications in the subscript calculations and does straight indexing.

此时,内循环的格式为:

At that point, the inner loop is of the form:

temp += a[r1] * b[r2];
r1 += 1;
r2 += Q;

你周围有循环来(重新)初始化每次传递的r1和r2。

And you have loops around that to (re)initialize r1 and r2 for each pass.

这是您可以进行直接矩阵乘法的绝对最小计算。你不能做到这一点,因为你必须做那些乘法和加法以及索引调整。

That is the absolute minimum computation you can do to do a straightforward matrix multiplication. You cannot do any less than that, because you have to do those multiplications and additions and index adjustments.

你所能做的就是增加开销。

All you can do is add overhead.

这就是迭代器和std :: inner_product()方法的作用:它增加了公吨的开销。

That's what the iterator and std::inner_product() approach does: it adds metric tonnes of overhead.

这篇关于c ++速度比较迭代器与索引的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆