Slower to mix logical variables with double?

Problem Description

I have 0-1 valued vectors that I need to do some matrix operations on. They are not very sparse (only half of the values are 0) but saving them as a logical variable instead of double saves 8 times the memory: 1 byte for a logical, and 8 for double floating point.
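
For example, whos shows the size difference directly (for an illustrative vector of one million elements):

>> x = double(rand(1,1e6) > 0.5); xl = logical(x);
>> whos x xl    % Bytes column: 8,000,000 for the double x vs 1,000,000 for the logical xl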

Would it be any slower to do matrix multiplications of a logical vector and a double matrix than to use both as double? See my preliminary results below:

>> x = [0 1 0 1 0 1 0 1]; A = rand(numel(x)); xl = logical(x);
>> tic; for k = 1:10000; x * A * x'; end; toc %'
Elapsed time is 0.017682 seconds.
>> tic; for k = 1:10000; xl * A * xl'; end; toc %'
Elapsed time is 0.026810 seconds.
>> xs = sparse(x);
>> tic; for k = 1:10000; xs * A * xs'; end; toc %'
Elapsed time is 0.039566 seconds.

It seems that using the logical representation is much slower (and sparse is even slower). Can someone explain why? Is it type-casting time? Is it a limitation of the CPU/FPU instruction set?

EDIT: My system is MATLAB R2012b on Mac OS X 10.8.3, Intel Core i7 3.4 GHz.

EDIT2: A few comments show that this is only a problem on Mac OS X. I would like to compile results from diverse architectures and operating systems if possible.

EDIT3: My actual problem requires computation with a huge portion of all possible binary vectors of length m, where m can be too large for 8 * m * 2^m to fit in memory.
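
For a sense of scale, here is one hypothetical way to process the vectors in batches while keeping them as logical; the batch size, the dec2bin encoding, and the placeholder matrix A below are illustrative choices only:

% rough memory estimate for holding all 2^m vectors at once
m = 25;
bytes_double  = 8 * m * 2^m;    % ~6.7e9 bytes as double
bytes_logical = 1 * m * 2^m;    % ~8.4e8 bytes as logical (8x smaller)

% hypothetical batched enumeration
A = randn(m);                        % placeholder for the actual matrix
batch = 1e5;                         % number of vectors handled per iteration
for first = 0:batch:2^m - 1
    idx = first : min(first + batch, 2^m) - 1;
    V = dec2bin(idx, m) == '1';      % logical matrix, one binary vector per row
    q = sum((V * A) .* V, 2);        % q(i) equals V(i,:) * A * V(i,:)'
    % ... accumulate or store q here ...
end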

Solution

I'll start by posting a slightly better benchmark. I'm using the TIMEIT function from Steve Eddins to get more accurate timings:

function [t,err] = test_mat_mult()
    %# data
    N = 4000; sparsity = 0.7;    %# adjust size and sparsity of data
    x = double(rand(1,N) > sparsity);
    xl = logical(x);
    xs = sparse(x);
    A = randn(N);

    %# functions
    f = cell(3,1);
    f{1} = @() mult_func(x,A);
    f{2} = @() mult_func(xl,A);
    f{3} = @() mult_func(xs,A);

    %# timeit
    t = cellfun(@timeit, f);

    %# check results
    v = cellfun(@feval, f, 'UniformOutput',true);
    err = max(abs(v-mean(v)));  %# maximum error
end

function v = mult_func(x,A)
    v = x * A * x';
end

Here are the results on my machine (WinXP 32-bit, R2013a) with N=4000 and sparsity=0.7:

>> [t,err] = test_mat_mult
t =
     0.031212    %# double
     0.031970    %# logical
     0.071998    %# sparse
err =
   7.9581e-13

You can see that double is only slightly better than logical on average, while sparse is slower than both, as expected (since its focus is efficient memory usage, not speed).
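
For a rough idea of the storage trade-off at this density, whos on the three representations (created as in the benchmark above) shows the actual byte counts; sparse stores indices in addition to the nonzero values, so for data this dense the savings are limited:

>> whos x xl xs    %# compare the Bytes column for the three representations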


Now note that MATLAB relies on a BLAS implementation optimized for your platform to perform full-matrix multiplication (think DGEMM). In the general case this includes routines for single/double types but not for booleans, so a conversion has to happen first, which would explain why it is slower for logical.
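
If the conversion is indeed what costs the extra time, doing it explicitly should cost roughly as much as the logical case itself; a quick, illustrative check with TIMEIT (assuming xl and A from the benchmark above are in the workspace):

>> f_cast = @() (double(xl) * A) * double(xl)';
>> timeit(f_cast)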

On Intel processors, the BLAS/LAPACK routines are provided by the Intel MKL library. I'm not sure about AMD, but I think it uses the equivalent ACML:

>> internal.matlab.language.versionPlugins.blas
ans =
Intel(R) Math Kernel Library Version 10.3.11 Product Build 20120606 for 32-bit applications
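
As a side note, calling version with the '-blas' (and '-lapack') option should report the same library information without relying on the internal package above:

>> version('-blas')
>> version('-lapack')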

Of course, the sparse case is a different story. (I know MATLAB uses the SuiteSparse package for many of its sparse operations, but I'm not sure which routines are involved here.)
