How much faster is implicit expansion compared with bsxfun?
As commented by Steve Eddins, implicit expansion (introduced in Matlab R2016b) is faster than bsxfun
for small array sizes, and has similar speed for large arrays:
In R2016b, implicit expansion works as fast or faster than bsxfun in most cases. The best performance gains for implicit expansion are with small matrix and array sizes. For large matrix sizes, implicit expansion tends to be roughly the same speed as bsxfun.
Also, the dimension along which expansion takes place may have an influence:
When there is an expansion in the first dimension, the operators might not be quite as fast as bsxfun.
(Thanks to @Poelie and @rayryeng for letting me know about this!)
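For readers unfamiliar with the two syntaxes, here is a minimal sketch of the equivalence being benchmarked (the array sizes are arbitrary, chosen only for illustration):

```matlab
% Both approaches broadcast the column vector y across the columns of x
x = randn(4,4);            % N-by-N array
y = randn(4,1);            % N-by-1 array
r1 = bsxfun(@plus, x, y);  % explicit broadcasting via bsxfun
r2 = x + y;                % implicit expansion (R2016b and later)
isequal(r1, r2)            % the two results are the same 4-by-4 array
```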
Two questions naturally arise:
- How much faster is implicit expansion compared with bsxfun?
- For what array sizes or shapes is the difference significant?
To measure the difference in speed, some tests have been done. The tests consider two different operations:
- addition
- power
and four different shapes of the arrays to be operated on:
- N×N array with N×1 array
- N×N×N×N array with N×1×N array
- N×N array with 1×N array
- N×N×N×N array with 1×N×N array
For each of the eight combinations of operation and array shape, the same operation is done with implicit expansion and with bsxfun. Several values of N are used, to cover the range from small to large arrays. timeit is used for reliable timing.
The benchmarking code is given at the end of this answer. It has been run on Matlab R2016b, Windows 10, with 12 GB RAM.
Results
The following graphs show the results. The horizontal axis is the number of elements of the output array, which is a better measure of size than N.
Tests have also been done with logical operations (instead of arithmetical). The results are not displayed here for brevity, but show a similar trend.
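The logical-operation variant mentioned above can be reproduced by swapping the arithmetic functions for a comparison; a minimal sketch of one such timing (example size only, not one of the benchmarked N values):

```matlab
% Timing a logical operation instead of an arithmetic one
x = randn(1000,1000);
y = randn(1000,1);
t_bsxfun   = timeit(@() bsxfun(@lt, x, y));  % elementwise x < y via bsxfun
t_implicit = timeit(@() x < y);              % same comparison, implicit expansion
```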
Conclusions
According to the graphs:
- The results confirm that implicit expansion is faster for small arrays, and has a speed similar to bsxfun for large arrays.
- Expanding along the first or along other dimensions doesn't seem to have a large influence, at least in the considered cases.
- For small arrays the difference can be a factor of ten or more. Note, however, that timeit is not accurate for small sizes because the code runs too fast (in fact, it issues a warning for such small sizes).
- The two speeds become roughly equal when the number of elements of the output reaches about 1e5. This value may be system-dependent.
Since the speed improvement is only significant when the arrays are small, which is a situation in which either approach is very fast anyway, using implicit expansion or bsxfun seems to be mainly a matter of taste, readability, or backward compatibility.
Benchmarking code
clear
% NxN, Nx1, addition / power
N1 = 2.^(4:1:12);
t1_bsxfun_add = NaN(size(N1));
t1_implicit_add = NaN(size(N1));
t1_bsxfun_pow = NaN(size(N1));
t1_implicit_pow = NaN(size(N1));
for k = 1:numel(N1)
N = N1(k);
x = randn(N,N);
y = randn(N,1);
% y = randn(1,N); % use this line or the preceding one
t1_bsxfun_add(k) = timeit(@() bsxfun(@plus, x, y));
t1_implicit_add(k) = timeit(@() x+y);
t1_bsxfun_pow(k) = timeit(@() bsxfun(@power, x, y));
t1_implicit_pow(k) = timeit(@() x.^y);
end
% NxNxNxN, Nx1xN, addition / power
N2 = round(sqrt(N1));
t2_bsxfun_add = NaN(size(N2));
t2_implicit_add = NaN(size(N2));
t2_bsxfun_pow = NaN(size(N2));
t2_implicit_pow = NaN(size(N2));
for k = 1:numel(N2)
N = N2(k);
x = randn(N,N,N,N);
y = randn(N,1,N);
% y = randn(1,N,N); % use this line or the preceding one
t2_bsxfun_add(k) = timeit(@() bsxfun(@plus, x, y));
t2_implicit_add(k) = timeit(@() x+y);
t2_bsxfun_pow(k) = timeit(@() bsxfun(@power, x, y));
t2_implicit_pow(k) = timeit(@() x.^y);
end
% Plots
figure
colors = get(gca,'ColorOrder');
subplot(121)
title('N\times{}N, N\times{}1')
% title('N\times{}N, 1\times{}N') % this or the preceding
set(gca,'XScale', 'log', 'YScale', 'log')
hold on
grid on
loglog(N1.^2, t1_bsxfun_add, 's-', 'color', colors(1,:))
loglog(N1.^2, t1_implicit_add, 's-', 'color', colors(2,:))
loglog(N1.^2, t1_bsxfun_pow, '^-', 'color', colors(1,:))
loglog(N1.^2, t1_implicit_pow, '^-', 'color', colors(2,:))
legend('Addition, bsxfun', 'Addition, implicit', 'Power, bsxfun', 'Power, implicit')
subplot(122)
title('N\times{}N\times{}N{}\times{}N, N\times{}1\times{}N')
% title('N\times{}N\times{}N{}\times{}N, 1\times{}N\times{}N') % this or the preceding
set(gca,'XScale', 'log', 'YScale', 'log')
hold on
grid on
loglog(N2.^4, t2_bsxfun_add, 's-', 'color', colors(1,:))
loglog(N2.^4, t2_implicit_add, 's-', 'color', colors(2,:))
loglog(N2.^4, t2_bsxfun_pow, '^-', 'color', colors(1,:))
loglog(N2.^4, t2_implicit_pow, '^-', 'color', colors(2,:))
legend('Addition, bsxfun', 'Addition, implicit', 'Power, bsxfun', 'Power, implicit')