Performance issue: matrix multiplication in C and C++


Question




Hi,

I'm currently working on a matrix multiplication code (matrix times
matrix), and have come across some interesting/confusing results
concerning the running time of the (apparently) same algorithm
when implemented in C or C++. I noticed that the implementation
in C is faster by a factor of 2.5 compared to an identical(?)
C++ implementation.

The basic algorithm in C is:

/* in function main() */
idxC = 0;
for (i = 0; i < dimension; i++)
    for (k = 0; k < dimension; k++) {
        idxA = i*dimension;
        endA = idxA + dimension;
        idxB = k;
        while (idxA < endA) {
            Celem[idxC] += Aelem[idxA] * Belem[idxB];
            idxA++;
            idxB += dimension;
        }
        idxC++;
    }

where matrices A, B, and C are implemented using a 1-dimensional
array with line-by-line numbering (i.e. Fortran-style).
For example, idxA points to the current element A(i,j) of matrix A.
After idxA++, the index will point to element A(i,j+1).
Similarly, idxA += dimension will move the index to element A(i+1,j).
Thus, the code computes the matrix product C = A*B while trying to
avoid explicit computation of matrix indices.
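The flattened indexing described above can be sketched as a small helper (the function name here is made up for illustration; the original code simply inlines this arithmetic):

```cpp
#include <cassert>

// Map (row, col) to a flat offset for a matrix stored row by row
// in a single 1-D array, as described above. With this mapping,
// idx + 1 is the element one column over, A(i, j+1), and
// idx + dimension is the element one row down, A(i+1, j).
inline int flatIndex(int row, int col, int dimension) {
    return row * dimension + col;
}
```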

The respective algorithm in C++ is:

template<class MatElem>
void FortranMatrix<MatElem>::mult(const FortranMatrix<MatElem>& B,
                                  FortranMatrix<MatElem>& C) const {
    /* ... */
    int idxC = 0;
    for (int i = 0; i < dimension; i++)
        for (int k = 0; k < dimension; k++) {
            int idxA = i*dimension;
            int endA = idxA + dimension;
            int idxB = k;
            while (idxA < endA) {
                C.elem[idxC] += elem[idxA] * B.elem[idxB];
                idxA++;
                idxB += dimension;
            }
            idxC++;
        }
}

Here, A, B, and C, are objects of some matrix class that have a
member elem (of type double*).

For the same size of matrices (729x729), the C implementation takes
around 10 seconds on my machine (Pentium III, gcc, version 3.2,
optimization level -O3 and -funroll-loops).
The C++-implementation takes approximately 25 seconds on the same
machine using g++ (same version, same optimization parameters).

Is anybody out there able to give a comment on the reasons for
this huge loss of performance?

Using a static function, and the double-arrays as parameters, i.e.

template<class MatElem>
void FortranMatrix<MatElem>::stamult(const MatElem* A,
                                     const MatElem* B,
                                     MatElem* C, int dimension) {
    /* ... */
    int idxC = 0;
    for (int i = 0; i < dimension; i++)
        for (int k = 0; k < dimension; k++) {
            int idxA = i*dimension;
            int endA = idxA + dimension;
            int idxB = k;
            while (idxA < endA) {
                C[idxC] += A[idxA] * B[idxB];
                idxA++;
                idxB += dimension;
            }
            idxC++;
        }
}

Now, this is almost identical to the C algorithm, but the code still
takes around 20 seconds. I was actually prepared to see certain
performance losses when using C++ instead of C. However, I find it
confusing to lose a factor of 2 in code that's not even close to
using object-oriented features that might slow it down.
I didn't try it without templates, but that can't make too
much of a difference, can it?

I'd appreciate any comment on this, and I would like to thank you
very much for taking the time to read this rather longish article :-)

Michael

Answers

Michael Bader wrote:
....


Quoting the inner loop of the C++ version above:

    C.elem[idxC] += elem[idxA] * B.elem[idxB];

C.elem[idxC]

You can probably remove this from the loop.

The compiler probably can't optimize the fetch of the "elem" array, so
you'll need to do it yourself.

Try something like:

template<class MatElem>
void FortranMatrix<MatElem>::mult(const FortranMatrix<MatElem>& B,
                                  FortranMatrix<MatElem>& C) const {
    /* ... */
    int idxC = 0;
    for (int i = 0; i < dimension; i++)
        for (int k = 0; k < dimension; k++) {
            int idxA = i*dimension;
            int endA = idxA + dimension;
            int idxB = k;
            MatElem v = 0;
            while (idxA < endA) {
                v += elem[idxA] * B.elem[idxB];
                idxA++;
                idxB += dimension;
            }
            C.elem[idxC] = v;
            idxC++;
        }
}

-- Still, you can also fetch the addresses this->elem, B.elem, and C.elem
and place them in temporaries (although the compiler can probably do that
for this->elem and B.elem, since *this and B are const, but not for C).
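A self-contained sketch of that suggestion (a free function with hypothetical names standing in for the member function, so the base pointers are plain locals rather than repeated member loads):

```cpp
#include <cassert>

// Same triple loop as in the post, but with the element arrays
// fetched into local pointers once, so the compiler need not
// re-load them through `this`, `B`, and `C` on every iteration,
// and with the sum accumulated in a local before the single store.
void multHoisted(const double* a, const double* b, double* c, int dimension) {
    const double* pa = a;   // stands in for this->elem
    const double* pb = b;   // stands in for B.elem
    double* pc = c;         // stands in for C.elem
    int idxC = 0;
    for (int i = 0; i < dimension; i++)
        for (int k = 0; k < dimension; k++) {
            int idxA = i * dimension;
            int endA = idxA + dimension;
            int idxB = k;
            double v = 0.0; // accumulate in a register, store once
            while (idxA < endA) {
                v += pa[idxA] * pb[idxB];
                idxA++;
                idxB += dimension;
            }
            pc[idxC] = v;
            idxC++;
        }
}
```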

Which version of gcc are you using?





尝试发布可编辑的代码。最后一个例子确实看起来像

应该至少和C代码一样快。在没有检查实际生成的汇编代码的情况下,很难知道为什么它正在做它正在做的事情。


G



Try posting compilable code. The last example really does look like it
should be at least as fast as the C code. Without examining the actual
generated assembly code, it's hard to know why it's doing what it's doing.

G


"Michael Bader" <ba***@in.tum.de> wrote






最可能的原因是g ++没有生成像gcc那样好的中间代码(实际上,g ++ 3.x生成更大更慢的代码)比g ++ 2.95)。

这很常见。我见过的最糟糕的差异是在90年代早期Watcom的C和

C ++编译器,其中一些代码运行速度慢了10倍或更多

用他们的C ++编译器编译而不是用他们的C编译器编译。这不是C和C ++语言之间的差异,而是两个编译器之间的差异。
实现。 C ++编译器没有C编译器那么长的历史,

和C ++编译器编写者不得不花费添加的时间和支持无数功能的
C没有,C编译器编写者已经能够投入更好的代码生成。 C ++最终将迎头赶上
(今天的C ++编译器与10年前的C编译器相比)。在

的同时,您当然可以在C中编写性能关键代码并将其链接到您的C ++应用程序中。


Claudio Puviani



The most likely reason is that g++ doesn't generate intermediate code that's as
good as gcc's (in fact, g++ 3.x generates bigger and slower code than g++ 2.95).
This is fairly common. The worst difference I've seen was with Watcom's C and
C++ compilers in the early 90s, where some code ran 10 times or more slower when
compiled with their C++ compiler than with their C compiler. This isn't a
difference between the C and C++ languages, but between the two compiler
implementations. C++ compilers just don't have as long a history as C compilers,
and in the time that C++ compiler writers have had to spend adding and
supporting the myriad features that C doesn't have, the C compiler writers have
been able to invest in better code generation. C++ will eventually catch up
(today's C++ compilers compare favorably with C compilers of 10 years ago). In the
meantime, you can certainly write performance-critical code in C and link it
into your C++ application.

Claudio Puviani
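The last point, writing the hot loop in C and linking it into a C++ program, hinges on C language linkage. A minimal sketch (the function and its name are made up for illustration):

```cpp
#include <cassert>

// In a real project this body would sit in a .c file built by the C
// compiler; declaring it extern "C" on the C++ side gives it an
// unmangled, C-linkage name so the two object files link together.
extern "C" double dotStride(const double* a, const double* b,
                            int n, int strideB) {
    double v = 0.0;
    for (int j = 0; j < n; j++)
        v += a[j] * b[j * strideB]; // row of A times strided column of B
    return v;
}
```

The inner while loop of the posted code is exactly such a strided dot product, so it would be the natural unit to move across the language boundary.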


Michael Bader <ba***@in.tum.de> wrote in
news:c2**********@wsc10.lrz-muenchen.de:






显示,不要告诉。 FortranMatrix的实际类声明在哪里

< MatElem>?什么_is_ MatElem?显示它的声明。



Show, don't tell. Where is the actual class declaration of FortranMatrix
<MatElem>? What _is_ MatElem? Show its declaration too.
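For what it's worth, a declaration along the following lines would be enough to make the posted member function compile; this is a guess at the shape of the class, not the poster's actual code:

```cpp
#include <cassert>

// Hypothetical minimal shape of the class the post refers to;
// the poster's real FortranMatrix may well differ.
template<class MatElem>
class FortranMatrix {
public:
    explicit FortranMatrix(int dim)
        : dimension(dim), elem(new MatElem[dim * dim]()) {}
    ~FortranMatrix() { delete[] elem; }

    // The mult() from the thread, with the accumulator hoisted
    // into a local as the first answer suggested.
    void mult(const FortranMatrix& B, FortranMatrix& C) const {
        int idxC = 0;
        for (int i = 0; i < dimension; i++)
            for (int k = 0; k < dimension; k++) {
                int idxA = i * dimension;
                int endA = idxA + dimension;
                int idxB = k;
                MatElem v = 0;
                while (idxA < endA) {
                    v += elem[idxA] * B.elem[idxB];
                    idxA++;
                    idxB += dimension;
                }
                C.elem[idxC++] = v;
            }
    }

    int dimension;
    MatElem* elem;  // the double* member the post mentions

private:
    FortranMatrix(const FortranMatrix&);            // non-copyable
    FortranMatrix& operator=(const FortranMatrix&);
};
```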

