通过空矩阵乘法初始化数组的更快方法?(MATLAB) [英] Faster way to initialize arrays via empty matrix multiplication? (Matlab)

查看:45
本文介绍了通过空矩阵乘法初始化数组的更快方法?(MATLAB)的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我偶然发现了 Matlab 处理的奇怪方式(在我看来)空矩阵.例如,如果两个空矩阵相乘,结果是:

I've stumbled upon the weird way (in my view) that Matlab is dealing with empty matrices. For example, if two empty matrices are multiplied the result is:

zeros(3,0)*zeros(0,3)
ans =

 0     0     0
 0     0     0
 0     0     0

现在,这已经让我感到惊讶,但是,快速搜索让我找到了上面的链接,我得到了关于为什么会发生这种有点扭曲的逻辑的解释.

Now, this already took me by surprise, however, a quick search got me to the link above, and I got an explanation of the somewhat twisted logic of why this is happening.

然而,我没有为接下来的观察做好准备.我问自己,这种类型的乘法与仅使用 zeros(n) 函数相比效率如何,比如说为了初始化?我用 timeit 来回答这个:

However, nothing prepared me for the following observation. I asked myself, how efficient is this type of multiplication vs just using zeros(n) function, say for the purpose of initialization? I've used timeit to answer this:

f=@() zeros(1000)
timeit(f)
ans =
    0.0033

对比:

g=@() zeros(1000,0)*zeros(0,1000)
timeit(g)
ans =
    9.2048e-06

两者都具有 double 类的 1000x1000 零矩阵的相同结果,但空矩阵乘法 1 的速度要快约 350 倍!(使用 tictoc 和一个循环会发生类似的结果)

Both have the same outcome of 1000x1000 matrix of zeros of class double, but the empty matrix multiplication one is ~350 times faster! (a similar result happens using tic and toc and a loop)

这怎么可能?timeittic,toc 是在虚张声势还是我找到了一种更快的方法来初始化矩阵?(这是用 matlab 2012a,在 win7-64 机器上完成的,intel-i5 650 3.2Ghz...)

How can this be? are timeit or tic,toc bluffing or have I found a faster way to initialize matrices? (this was done with matlab 2012a, on a win7-64 machine, intel-i5 650 3.2Ghz...)

阅读您的反馈后,我更仔细地研究了这种特性,并在 2 台不同的计算机(相同的 matlab 版本,尽管 2012a)上测试了检查运行时间与矩阵 n 大小的代码.这就是我得到的:

After reading your feedback, I have looked more carefully into this peculiarity, and tested on 2 different computers (same matlab ver though 2012a) a code that examine the run time vs the size of matrix n. This is what I get:

生成这个的代码和以前一样使用了 timeit,但是带有 tictoc 的循环看起来是一样的.因此,对于小尺寸,zeros(n) 是可比的.但是,在 n=400 附近,空矩阵乘法的性能有了飞跃.我用来生成该图的代码是:

The code to generate this used timeit as before, but a loop with tic and toc will look the same. So, for small sizes, zeros(n) is comparable. However, around n=400 there is a jump in performance for the empty matrix multiplication. The code I've used to generate that plot was:

n=unique(round(logspace(0,4,200)));
for k=1:length(n)
    f=@() zeros(n(k));
    t1(k)=timeit(f);

    g=@() zeros(n(k),0)*zeros(0,n(k));
    t2(k)=timeit(g);
end

loglog(n,t1,'b',n,t2,'r');
legend('zeros(n)','zeros(n,0)*zeros(0,n)',2);
xlabel('matrix size (n)'); ylabel('time [sec]');

你们也有这种情况吗?

编辑 #2:

顺便说一下,不需要空矩阵乘法来获得这种效果.一个可以简单地做:

Incidentally, empty matrix multiplication is not needed to get this effect. One can simply do:

z(n,n)=0;

where n> 一些阈值矩阵大小在上图中看到,并获得 精确 效率曲线与空矩阵乘法(再次使用 timeit).

where n> some threshold matrix size seen in the previous graph, and get the exact efficiency profile as with empty matrix multiplication (again using timeit).

这是一个提高代码效率的例子:

Here's an example where it improves efficiency of a code:

n = 1e4;
clear z1
tic
z1 = zeros( n ); 
for cc = 1 : n
    z1(:,cc)=cc;
end
toc % Elapsed time is 0.445780 seconds.

%%
clear z0
tic
z0 = zeros(n,0)*zeros(0,n);
for cc = 1 : n
    z0(:,cc)=cc;
end
toc % Elapsed time is 0.297953 seconds.

然而,使用 z(n,n)=0; 会产生与 zeros(n) 情况类似的结果.

However, using z(n,n)=0; instead yields similar results to the zeros(n) case.

推荐答案

这很奇怪,我看到 f 比您看到的要快,而 g 比您看到的要慢.但它们对我来说都是相同的.也许是不同版本的 MATLAB ?

This is strange, I am seeing f being faster while g being slower than what you are seeing. But both of them are identical for me. Perhaps a different version of MATLAB ?

>> g = @() zeros(1000, 0) * zeros(0, 1000);
>> f = @() zeros(1000)
f =     
    @()zeros(1000)
>> timeit(f)  
ans =    
   8.5019e-04
>> timeit(f)  
ans =    
   8.4627e-04
>> timeit(g)  
ans =    
   8.4627e-04

编辑你能不能在 f 和 g 的末尾加上 + 1,看看你得到的时间是多少.

EDIT can you add + 1 for the end of f and g, and see what times you are getting.

EDIT Jan 6, 2013, 7:42 EST

我正在远程使用机器,对于低质量的图表(不得不盲目生成)感到抱歉.

I am using a machine remotely, so sorry about the low quality graphs (had to generate them blind).

机器配置:

i7 920.2.653 GHz.Linux.12 GB 内存.8MB 缓存.

i7 920. 2.653 GHz. Linux. 12 GB RAM. 8MB cache.

看起来甚至我可以访问的机器也显示出这种行为,除了更大的尺寸(1979 年到 2073 年之间).我现在没有理由认为空矩阵乘法在更大的尺寸下更快.

It looks like even the machine I have access to shows this behavior, except at a larger size (somewhere between 1979 and 2073). There is no reason I can think of right now for the empty matrix multiplication to be faster at larger sizes.

我会在回来之前做更多调查.

I will be investigating a little bit more before coming back.

编辑 2013 年 1 月 11 日

在@EitanT 发帖后,我想做更多的挖掘.我编写了一些 C 代码来查看 matlab 如何创建零矩阵.这是我使用的 C++ 代码.

After @EitanT's post, I wanted to do a little bit more of digging. I wrote some C code to see how matlab may be creating a zeros matrix. Here is the c++ code that I used.

int main(int argc, char **argv)
{
    for (int i = 1975; i <= 2100; i+=25) {
    timer::start();
    double *foo = (double *)malloc(i * i * sizeof(double));
    for (int k = 0; k < i * i; k++) foo[k]  = 0;
    double mftime = timer::stop();
    free(foo);

    timer::start();
    double *bar = (double *)malloc(i * i * sizeof(double));
    memset(bar, 0, i * i * sizeof(double));
    double mmtime = timer::stop();
    free(bar);

    timer::start();
    double *baz = (double *)calloc(i * i, sizeof(double));
    double catime = timer::stop();
    free(baz);

    printf("%d, %lf, %lf, %lf
", i, mftime, mmtime, catime);
    }
}

这是结果.

$ ./test
1975, 0.013812, 0.013578, 0.003321
2000, 0.014144, 0.013879, 0.003408
2025, 0.014396, 0.014219, 0.003490
2050, 0.014732, 0.013784, 0.000043
2075, 0.015022, 0.014122, 0.000045
2100, 0.014606, 0.014480, 0.000045

如您所见,calloc(第 4 列)似乎是最快的方法.在 2025 年到 2050 年之间,它的速度也明显加快(我认为它会在 2048 年左右?).

As you can see calloc (4th column) seems to be the fastest method. It is also getting significantly faster between 2025 and 2050 (I'd assume it would at around 2048 ?).

现在我回到 matlab 来检查相同的情况.这是结果.

Now I went back to matlab to check for the same. Here are the results.

>> test
1975, 0.003296, 0.003297
2000, 0.003377, 0.003385
2025, 0.003465, 0.003464
2050, 0.015987, 0.000019
2075, 0.016373, 0.000019
2100, 0.016762, 0.000020

看起来 f() 和 g() 都在使用较小尺寸的 calloc (<2048 ?).但是在更大的尺寸 f() (zeros(m, n)) 开始使用 malloc + memset,而 g() (zeros(m, 0) * zeros(0, n)) 继续使用 calloc.

It looks like both f() and g() are using calloc at smaller sizes (<2048 ?). But at larger sizes f() (zeros(m, n)) starts to use malloc + memset, while g() (zeros(m, 0) * zeros(0, n)) keeps using calloc.

所以分歧解释如下

  • zeros(..) 开始在更大的尺寸上使用不同的(更慢的?)方案.
  • calloc 的行为也有些出乎意料,从而提高了性能.
  • zeros(..) begins to use a different (slower ?) scheme at larger sizes.
  • calloc also behaves somewhat unexpectedly, leading to an improvement in performance.

这是 Linux 上的行为.有人可以在不同的机器(可能还有不同的操作系统)上做同样的实验,看看实验是否成立?

This is the behavior on Linux. Can someone do the same experiment on a different machine (and perhaps a different OS) and see if the experiment holds ?

这篇关于通过空矩阵乘法初始化数组的更快方法?(MATLAB)的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆