Matrix optimization


Question

Given A[n][m], B[n][m] and C[n][m]

I would like to calculate the sum of each entry:

A[i][j]=B[i][j]+C[i][j] for each i,j

The easiest way is to use a for loop, but I think the performance is not good.

Is there a faster way to do that?
Thanks.
Pat

Recommended answer

Pat wrote:
Given A[n][m], B[n][m] and C[n][m]

I would like to calculate the sum of each entry:

A[i][j]=B[i][j]+C[i][j] for each i,j

The easiest way is to use for-loop, but I think the performance is not good.

Is it possible to find out some faster way to do that?
Thanks.
Pat



This depends heavily on the implementation. Some compilers know how to
"vectorize" for loops, some machines have serious cache considerations,
and some machines have vector instructions.

Your question is not really about C++ per se. I suggest you ask in
discussion groups directly related to the platform you're asking about
to get the right answer.

However, given a dumb compiler and a dumb architecture, the fastest
version is probably something like this.

template <typename T, int Rows, int Cols>
void Add(
    T (&A)[Rows][Cols],
    const T (&B)[Rows][Cols],
    const T (&C)[Rows][Cols]
) {
    // Walk each matrix as one flat run of Rows * Cols elements.
    T * const Ap = &A[0][0];
    const T * const Bp = &B[0][0];
    const T * const Cp = &C[0][0];

    const int count = Rows * Cols;

    for ( int i = 0; i < count; ++i )
    {
        Ap[ i ] = Bp[ i ] + Cp[ i ];
    }
}

// If the compiler is able to do loop unrolling, this can be quite
// zippy.

*** Note - I did not syntax check it.
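
As a rough check of the idea above, here is a minimal self-contained sketch of the same flat-pointer addition together with a small caller. The `std::size_t` bounds and the `add_demo` helper are assumptions made for this illustration, not part of the original post. The flat walk relies on the rows of a built-in 2D array being stored contiguously.

```cpp
#include <cstddef>

// Same approach as the Add() above. std::size_t bounds are an assumption of
// this sketch; the original post uses int.
template <typename T, std::size_t Rows, std::size_t Cols>
void Add(T (&A)[Rows][Cols],
         const T (&B)[Rows][Cols],
         const T (&C)[Rows][Cols])
{
    // Treat each matrix as one flat run of Rows * Cols elements.
    T* const Ap = &A[0][0];
    const T* const Bp = &B[0][0];
    const T* const Cp = &C[0][0];

    const std::size_t count = Rows * Cols;
    for (std::size_t i = 0; i < count; ++i) {
        Ap[i] = Bp[i] + Cp[i];
    }
}

// Hypothetical demo helper: adds two 2x3 matrices and returns A[1][2].
inline int add_demo()
{
    const int B[2][3] = {{1, 2, 3}, {4, 5, 6}};
    const int C[2][3] = {{10, 20, 30}, {40, 50, 60}};
    int A[2][3] = {};
    Add(A, B, C);  // T, Rows and Cols are deduced from the array types
    return A[1][2];
}
```

Because the dimensions are template parameters, the element count is a compile-time constant, which gives the compiler a fair chance to unroll or vectorize the loop if it is able to.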


Pat wrote:
Given A[n][m], B[n][m] and C[n][m]

I would like to calculate the sum of each entry:

A[i][j]=B[i][j]+C[i][j] for each i,j

The easiest way is to use for-loop, but I think the performance is not good.

Is it possible to find out some faster way to do that?




(a) Why do you "think the performance is not good"?

(b) You could avoid indexing if you define pointers and increment
those in the loop:

for (int i = 0; i < M; ++i)
    for (int j = 0; j < N; ++j)
        A[i][j] = B[i][j] + C[i][j];

becomes

for (int i = 0; i < M; ++i) {
    TYPE *pa = A[i];
    const TYPE *pb = B[i], *pc = C[i];
    for (int j = 0; j < N; ++j)
        *pa++ = *pb++ + *pc++;
}

You can still eliminate the indexing by [i], but I'll leave that
to you to figure out.

(c) Consider that "premature optimization is the root of all evil".
The [i][j] form, while it may not be extremely efficient, at least
conveys the intent more clearly than *pa++ = ...

Victor
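
One possible way to drop the remaining [i] indexing mentioned in (b) is to advance a pointer-to-row in the outer loop. The sketch below is only an illustration: the sizes `M` and `N`, the `int` element type, and the function names are assumptions, and `variants_agree` is a hypothetical helper that checks the result against the plain indexed form.

```cpp
#include <cstddef>

// Hypothetical fixed sizes for this sketch.
constexpr std::size_t M = 2;  // rows
constexpr std::size_t N = 3;  // columns

// Baseline: plain [i][j] indexing.
inline void add_indexed(int (&A)[M][N], const int (&B)[M][N], const int (&C)[M][N])
{
    for (std::size_t i = 0; i < M; ++i)
        for (std::size_t j = 0; j < N; ++j)
            A[i][j] = B[i][j] + C[i][j];
}

// Variant with no [i] indexing: row pointers advance once per outer iteration.
inline void add_row_pointers(int (&A)[M][N], const int (&B)[M][N], const int (&C)[M][N])
{
    int (*ra)[N] = A;            // points at the current row of A
    const int (*rb)[N] = B;
    const int (*rc)[N] = C;
    for (int (*const end)[N] = A + M; ra != end; ++ra, ++rb, ++rc) {
        int* pa = *ra;           // first element of the current row
        const int* pb = *rb;
        const int* pc = *rc;
        for (std::size_t j = 0; j < N; ++j)
            *pa++ = *pb++ + *pc++;
    }
}

// Hypothetical check: both variants must produce the same matrix.
inline bool variants_agree()
{
    const int B[M][N] = {{1, 2, 3}, {4, 5, 6}};
    const int C[M][N] = {{7, 8, 9}, {10, 11, 12}};
    int A1[M][N] = {};
    int A2[M][N] = {};
    add_indexed(A1, B, C);
    add_row_pointers(A2, B, C);
    for (std::size_t i = 0; i < M; ++i)
        for (std::size_t j = 0; j < N; ++j)
            if (A1[i][j] != A2[i][j])
                return false;
    return A1[0][0] == 8 && A1[1][2] == 18;
}
```

In line with point (c), the indexed version is noticeably easier to read, so the pointer form is probably only worth keeping if a measurement shows it helps on the target compiler.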


Gianni Mariani wrote:
This depends heavily on the implementation. Some compilers know how to
"vectorize" for loops, some machines have serious cache considerations,
and some machines have vector instructions.

Your question is not really about C++ per se. I suggest you ask in
discussion groups directly related to the platform you're asking about
to get the right answer.

However, given a dumb compiler and a dumb architecture, the fastest
version is probably something like this.

template <typename T, int Rows, int Cols>
void Add(
    T (&A)[Rows][Cols],
    const T (&B)[Rows][Cols],
    const T (&C)[Rows][Cols]
) {
    // Walk each matrix as one flat run of Rows * Cols elements.
    T * const Ap = &A[0][0];
    const T * const Bp = &B[0][0];
    const T * const Cp = &C[0][0];

    const int count = Rows * Cols;

    for ( int i = 0; i < count; ++i )
    {
        Ap[ i ] = Bp[ i ] + Cp[ i ];
    }
}

// If the compiler is able to do loop unrolling, this can be quite
// zippy.




Nice. What would you say if I did

T * Ap = &A[0][0];
const T * Bp ...
...
for (int i = 0; i < count; ++i)
    *Ap++ = *Bp++ + *Cp++;

? It might be just a tad faster...

V
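
Whether the pointer-increment form is actually "a tad faster" is something to measure rather than assume. The following is a rough timing sketch, not a rigorous benchmark: the function names, the `double` element type, the flat `std::vector` storage, and the `time_us` helper are all assumptions made for this illustration.

```cpp
#include <chrono>
#include <cstddef>
#include <vector>

// Indexed form over flat, contiguous storage.
inline void add_flat_indexed(double* a, const double* b, const double* c, std::size_t count)
{
    for (std::size_t i = 0; i < count; ++i)
        a[i] = b[i] + c[i];
}

// Pointer-increment form, as suggested in the reply above.
inline void add_flat_incremented(double* a, const double* b, const double* c, std::size_t count)
{
    for (std::size_t i = 0; i < count; ++i)
        *a++ = *b++ + *c++;
}

// Hypothetical timing helper: runs fn `repeats` times and returns the elapsed
// time in microseconds. Real measurements need optimization enabled and care
// that the compiler does not discard the work.
template <typename Fn>
long long time_us(Fn fn, int repeats)
{
    const auto start = std::chrono::steady_clock::now();
    for (int r = 0; r < repeats; ++r)
        fn();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::microseconds>(stop - start).count();
}

// Hypothetical driver: times both forms and checks that they agree.
inline bool flat_variants_agree(std::size_t rows, std::size_t cols)
{
    const std::size_t count = rows * cols;
    std::vector<double> b(count, 1.5), c(count, 2.0), a1(count), a2(count);
    const long long t1 = time_us(
        [&] { add_flat_indexed(a1.data(), b.data(), c.data(), count); }, 10);
    const long long t2 = time_us(
        [&] { add_flat_incremented(a2.data(), b.data(), c.data(), count); }, 10);
    (void)t1;  // print or log the timings when measuring for real
    (void)t2;
    return a1 == a2 && a1[0] == 3.5;
}
```

An optimizing compiler may well generate the same machine code for both loops, so any difference should be confirmed on the actual compiler, flags, and hardware in question.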

