slower to mix logical variables with double?
Problem Description
I have 0-1 valued vectors that I need to do some matrix operations on. They are not very sparse (only half of the values are 0), but storing them as logical instead of double uses an eighth of the memory: 1 byte per logical versus 8 per double-precision float.
Would it be any slower to do matrix multiplications of a logical vector and a double matrix than to use both as double? See my preliminary results below:
>> x = [0 1 0 1 0 1 0 1]; A = rand(numel(x)); xl = logical(x);
>> tic; for k = 1:10000; x * A * x'; end; toc %'
Elapsed time is 0.017682 seconds.
>> tic; for k = 1:10000; xl * A * xl'; end; toc %'
Elapsed time is 0.026810 seconds.
>> xs = sparse(x);
>> tic; for k = 1:10000; xs * A * xs'; end; toc %'
Elapsed time is 0.039566 seconds.
It seems that using logical representation is much slower (and sparse is even slower). Can someone explain why? Is it type casting time? Is it a limitation of the CPU/FPU instruction set?
EDIT: My system is MATLAB R2012b on Mac OS X 10.8.3 , Intel Core i7 3.4 GHz
EDIT2: A few comments show that this is only a problem on Mac OS X. I would like to compile results from diverse architectures and OSes if possible.
EDIT3: My actual problem requires computation with a huge portion of all possible binary vectors of length m, where m can be too large for 8 * m * 2^m bytes to fit in memory.
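To make that budget concrete, here is a small sketch (in Python rather than MATLAB, so the arithmetic is easy to check) of the total bytes needed to hold all 2^m binary vectors of length m, densely, at 8 bytes per double versus 1 byte per logical; array header overhead is ignored:

```python
# Total bytes to hold ALL 2**m binary vectors of length m, densely,
# at a given per-element size (8 for double, 1 for logical).
# Array header overhead is ignored -- this is only a rough budget.
def all_vectors_bytes(m, bytes_per_element):
    return bytes_per_element * m * 2 ** m

for m in (20, 25, 30):
    print(m, all_vectors_bytes(m, 8), all_vectors_bytes(m, 1))
# m = 25 already needs ~6.25 GiB as doubles but only ~0.78 GiB as logicals
```

The 8x per-element saving is exactly the gap between fitting in RAM and not, which is why the logical representation matters here.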
I'll start by posting a slightly better benchmark. I'm using the TIMEIT function from Steve Eddins to get more accurate timings:
function [t,err] = test_mat_mult()
%# data
N = 4000; sparsity = 0.7; %# adjust size and sparsity of data
x = double(rand(1,N) > sparsity);
xl = logical(x);
xs = sparse(x);
A = randn(N);
%# functions
f = cell(3,1);
f{1} = @() mult_func(x,A);
f{2} = @() mult_func(xl,A);
f{3} = @() mult_func(xs,A);
%# timeit
t = cellfun(@timeit, f);
%# check results
v = cellfun(@feval, f, 'UniformOutput',true);
err = max(abs(v-mean(v))); %# maximum error
end
function v = mult_func(x,A)
v = x * A * x';
end
Here are the results on my machine (WinXP 32-bit, R2013a) with N=4000 and sparsity=0.7:
>> [t,err] = test_mat_mult
t =
0.031212 %# double
0.031970 %# logical
0.071998 %# sparse
err =
7.9581e-13
You can see double is only slightly better than logical on average, while sparse is slower than both, as expected (since its focus is efficient memory usage, not speed).
Now note that MATLAB relies on BLAS implementations optimized for your platform to perform full-matrix multiplication (think DGEMM). In the general case, these include routines for single/double types but not booleans, so a conversion has to happen first, which would explain why it's slower for logical.
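That conversion step can be mimicked in a tiny sketch (Python with naive loops standing in for the real DGEMM call; the up-front widening pass is the assumption being illustrated): a logical input must first be widened to doubles, an extra O(n) pass and temporary that the all-double path never pays.

```python
def quad_form(x, A):
    # x * A * x' for a row vector x and square matrix A
    # (naive loops standing in for the BLAS call MATLAB would make)
    n = len(x)
    y = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
    return sum(x[i] * y[i] for i in range(n))

def quad_form_logical(bits, A):
    # a logical input is first widened to doubles -- the extra pass
    # (and temporary copy) that plausibly explains the slower timings
    x = [float(b) for b in bits]
    return quad_form(x, A)

A = [[1.0, 2.0], [3.0, 4.0]]
print(quad_form([0.0, 1.0], A))             # 4.0
print(quad_form_logical([False, True], A))  # 4.0 -- same result, extra pass
```

The results are identical; only the conversion work differs, which matches the small but consistent gap in the timings above.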
On Intel processors, BLAS/LAPACK routines are provided by the Intel MKL Library. Not sure about AMD, but I think it uses the equivalent ACML:
>> internal.matlab.language.versionPlugins.blas
ans =
Intel(R) Math Kernel Library Version 10.3.11 Product Build 20120606 for 32-bit applications
Of course the sparse case is a different story. (I know MATLAB uses the SuiteSparse package for many of its sparse operations, but I'm not sure whether that applies here.)
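As a rough back-of-the-envelope for why sparse buys nothing at this density, assume a CSC-like layout paying one 8-byte double value plus one 8-byte index per stored nonzero (the exact layout, and ignoring the column-pointer array, are assumptions):

```python
# Approximate storage for a length-m vector: dense double vs. a sparse
# layout paying one 8-byte value + one 8-byte index per stored nonzero.
# (Assumed layout; real MATLAB sparse also keeps column pointers.)
def dense_double_bytes(m):
    return 8 * m

def sparse_double_bytes(m, density, index_bytes=8):
    nnz = int(m * density)
    return (8 + index_bytes) * nnz

m = 1_000_000
print(dense_double_bytes(m))         # 8000000
print(sparse_double_bytes(m, 0.5))   # 8000000 -- no saving at 50% fill
print(sparse_double_bytes(m, 0.05))  # 800000  -- wins when truly sparse
```

At the roughly 50% fill used in the question, such a layout saves nothing over dense doubles (and is far larger than the 1-byte-per-element logical), consistent with sparse being a memory tool for genuinely sparse data rather than a speed tool here.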