How to write vectorized functions in MATLAB


Question


I am just learning MATLAB and I find it hard to understand the performance factors of loops vs vectorized functions.

In my previous question, "Nested for loops extremely slow in MATLAB (preallocated)", I realized that using a vectorized function instead of 4 nested loops made a 7x difference in running time.

In that example, instead of looping through all dimensions of a 4-dimensional array and calculating the median of each vector, it was much cleaner and faster to just call median(stack, n), where n is the working dimension of the median function.

But median is just a very easy example and I was just lucky that it had this dimension parameter implemented.

My question is: how do you write a function yourself that works as efficiently as one that already has this dimension argument implemented?

For example you have a function my_median_1D which only works on a 1-D vector and returns a number.

How do you write a function my_median_nD which acts like MATLAB's median, by taking an n-dimensional array and a "working dimension" parameter?

Update

I found the code that median uses for arrays of higher dimension:

% In all other cases, use linear indexing to determine exact location
% of medians.  Use linear indices to extract medians, then reshape at
% end to appropriate size.
cumSize = cumprod(s);
total = cumSize(end);            % Equivalent to NUMEL(x)
numMedians = total / nCompare;

numConseq = cumSize(dim - 1);    % Number of consecutive indices
increment = cumSize(dim);        % Gap between runs of indices
ixMedians = 1;

y = repmat(x(1),numMedians,1);   % Preallocate appropriate type

% Nested FOR loop tracks down medians by their indices.
for seqIndex = 1:increment:total
  for consIndex = half*numConseq:(half+1)*numConseq-1
    absIndex = seqIndex + consIndex;
    y(ixMedians) = x(absIndex);
    ixMedians = ixMedians + 1;
  end
end

% Average in second value if n is even
if 2*half == nCompare
  ixMedians = 1;
  for seqIndex = 1:increment:total
    for consIndex = (half-1)*numConseq:half*numConseq-1
      absIndex = seqIndex + consIndex;
      y(ixMedians) = meanof(x(absIndex),y(ixMedians));
      ixMedians = ixMedians + 1;
    end
  end
end

% Check last indices for NaN
ixMedians = 1;
for seqIndex = 1:increment:total
  for consIndex = (nCompare-1)*numConseq:nCompare*numConseq-1
    absIndex = seqIndex + consIndex;
    if isnan(x(absIndex))
      y(ixMedians) = NaN;
    end
    ixMedians = ixMedians + 1;
  end
end

Could you explain why this code is so efficient compared to the simple nested loops? It has nested loops just like the other function.

I don't understand how it can be 7x faster, or why it is so complicated.

Update 2

I realized that median was not a good example, as it is a complicated function in itself that requires sorting the array or other neat tricks. I re-did the tests with mean instead and the results are even crazier: 19 seconds vs 0.12 seconds. That means the built-in way for sum is about 160 times faster than the nested loops.

It is really hard for me to understand how an industry-leading language can have such an extreme performance difference based on programming style, but I see the points mentioned in the answers below.

Solution

Update 2 (to address your updated question)

MATLAB is optimized to work well with arrays. Once you get used to it, it is actually really nice to type just one line and have MATLAB do the full 4D looping itself without having to worry about it. MATLAB is often used for prototyping and one-off calculations, so it makes sense to save time for the person coding, at the cost of giving up some of C[++|#]'s flexibility.

This is why MATLAB internally does some loops really well - often by coding them as a compiled function.
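To make the comparison concrete, here is a minimal sketch of how such a timing test might be set up. The array size and variable names are assumptions chosen for illustration, not the asker's actual data, and the measured ratio will depend heavily on the MATLAB version and machine.

```matlab
% Hypothetical test data: a small 4-D array (sizes chosen arbitrarily).
stack = rand(20, 20, 20, 30);
[n1, n2, n3, n4] = size(stack);

% Loop version: average over the 4th dimension, one element at a time.
tic;
loopMean = zeros(n1, n2, n3);              % preallocate the result
for a = 1:n1
    for b = 1:n2
        for c = 1:n3
            acc = 0;
            for d = 1:n4
                acc = acc + stack(a, b, c, d);
            end
            loopMean(a, b, c) = acc / n4;
        end
    end
end
tLoop = toc;

% Built-in version: one call, with the working dimension as an argument.
tic;
vecMean = mean(stack, 4);
tVec = toc;

fprintf('loops: %.4f s, built-in: %.4f s\n', tLoop, tVec);
```

Both versions compute the same values; only the place where the looping happens differs (interpreted MATLAB code versus compiled code inside mean).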

The code snippet you give doesn't really contain the relevant line of code which does the main work, namely

% Sort along given dimension
x = sort(x,dim);

In other words, the code you show only needs to access the median values by their correct index in the now-sorted multi-dimensional array x (which doesn't take much time). The actual work accessing all array elements was done by sort, which is a built-in (i.e. compiled and highly optimized) function.
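The same idea (let a built-in touch every element, then pick out results by index) can be sketched for a home-made median along an arbitrary dimension. This is not MATLAB's actual implementation; it is a simplified illustration that assumes a non-empty numeric array, and the variable names are hypothetical. It moves the working dimension to the front, sorts each column with sort, and reads the middle row(s).

```matlab
x   = rand(4, 5, 6);      % hypothetical input array
dim = 2;                  % working dimension

% Record the size, padding with ones in case dim exceeds ndims(x).
nd = max(ndims(x), dim);
sz = size(x);
sz(end+1:nd) = 1;

% Bring the working dimension to the front and flatten the rest into columns.
order = [dim, 1:dim-1, dim+1:nd];
n     = sz(dim);
cols  = reshape(permute(x, order), n, []);

% The compiled built-in does the heavy lifting: sort every column at once.
cols = sort(cols, 1);

% Pick the middle element (odd n) or average the two middle ones (even n).
half = floor(n / 2);
if mod(n, 2) == 1
    med = cols(half + 1, :);
else
    med = (cols(half, :) + cols(half + 1, :)) / 2;
end
med(any(isnan(cols), 1)) = NaN;   % any NaN in a column makes its median NaN

% Restore the original dimension order, with size 1 along dim.
szOut    = sz(order);
szOut(1) = 1;
y = ipermute(reshape(med, szOut), order);
```

There is no explicit loop over elements at all: permute, reshape, sort and indexing each operate on the whole array, which is why this style stays close to the speed of the built-in.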

Original answer (about how to build your own fast functions working on arrays)

There are actually quite a few built-ins that take a dimension parameter: min(stack, [], n), max(stack, [], n), mean(stack, n), std(stack, [], n), median(stack, n), sum(stack, n), and so on. In addition, other built-in functions such as exp() and sin() automatically work on each element of the whole array (i.e. sin(stack) automatically does the four nested loops for you if stack is 4D). Taken together, this means you can build up a lot of the functions you might need just by relying on the existing built-ins.
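As a small illustration of combining an element-wise operation with a dimension-aware built-in, the sketch below computes a root-mean-square along a chosen dimension. The helper name rmsAlong and the test array are assumptions made up for this example.

```matlab
stack = rand(3, 4, 5, 6);                  % hypothetical 4-D data
n     = 3;                                 % working dimension

% Element-wise square, mean along dimension d, element-wise square root.
rmsAlong = @(a, d) sqrt(mean(a.^2, d));

r = rmsAlong(stack, n);                    % size is 3-by-4-by-1-by-6
```

The dimension handling comes entirely from mean; the anonymous function only chains built-ins together.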

If this is not enough for a particular case, you should have a look at repmat, bsxfun, arrayfun and accumarray, which are very powerful functions for doing things "the MATLAB way". Just search on SO for questions (or rather answers) using one of these; I learned a lot about MATLAB's strong points that way.
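A few toy calls may help show what these functions are for. The data below is made up for illustration. Note that in MATLAB R2016b and later, implicit expansion lets you write A - mean(A, 1) directly, so bsxfun is mainly needed for older releases.

```matlab
% bsxfun: apply a binary operation with singleton expansion.
A = [1 2; 3 4; 5 9];
centered = bsxfun(@minus, A, mean(A, 1));   % subtract each column's mean

% accumarray: accumulate values into bins given by group indices.
groups    = [1; 2; 1; 3; 2];
values    = [10; 20; 30; 40; 50];
groupSums = accumarray(groups, values);     % sum of values per group

% arrayfun: call a function once per element (convenient, not always fast).
counts = arrayfun(@(k) sum(groups == k), 1:3);
```

Here centered is [-2 -3; 0 -1; 2 4], groupSums is [40; 70; 40] and counts is [2 2 1]. arrayfun is essentially a loop in disguise, so it mostly buys brevity, whereas bsxfun and accumarray run compiled code over the whole array.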

As an example, say you wanted to implement the p-norm of stack along dimension n, you could write

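function result = pnorm(stack, p, n)
% p-norm of stack along dimension n.
% abs() follows the usual definition of the p-norm; the outer power is
% element-wise (.^) because sum(..., n) returns an array, not a scalar.
result = sum(abs(stack).^p, n).^(1/p);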

... where you effectively reuse the "which-dimension-capability" of sum.
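A quick check of this idea, written as an anonymous function so that it runs as a plain script. The input values are made up for illustration. The outer power is written with the element-wise operator .^ because ^ is the matrix power and would fail on a non-square or N-D result.

```matlab
% Same computation as an anonymous function (hypothetical test values).
pnorm = @(stack, p, n) sum(abs(stack).^p, n).^(1/p);

stack = cat(3, [3 0; 0 6], [4 0; 0 8]);    % 2-by-2-by-2 array

r2 = pnorm(stack, 2, 3);   % Euclidean norm along dimension 3
r1 = pnorm(stack, 1, 3);   % sum of absolute values along dimension 3
```

For this input, r2 is [5 0; 0 10] up to rounding (from the 3-4-5 and 6-8-10 triples) and r1 is [7 0; 0 14].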

Update

As Max points out in the comments, also have a look at the colon operator (:), which is a very powerful tool for selecting elements from an array (or even changing its shape, which is more generally done with reshape).
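A few typical uses of the colon operator and reshape, on a made-up array:

```matlab
A = reshape(1:24, 2, 3, 4);   % hypothetical 2-by-3-by-4 array holding 1..24

v     = A(:);                 % all elements as one column (column-major order)
slice = A(:, 2, :);           % everything at index 2 of dimension 2
B     = reshape(A, 6, 4);     % same data, viewed as a 6-by-4 matrix
total = sum(A(:));            % reduce over all elements at once
```

Here slice has size 2-by-1-by-4 and total is 300. Neither A(:) nor reshape changes the order of the elements in memory; they only change how the data is indexed.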

In general, have a look at the section Array Operations in the help - it contains repmat et al. mentioned above, but also cumsum and some more obscure helper functions which you should use as building blocks.
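As one example of using such a helper as a building block, a running mean along a dimension can be sketched with cumsum. The data and the variable name runMean are invented for this illustration.

```matlab
x = [1 2 3 4; 10 20 30 40];               % hypothetical data, one series per row

% Cumulative sum along dimension 2, divided by the number of terms so far.
runMean = bsxfun(@rdivide, cumsum(x, 2), 1:size(x, 2));
```

For this input, runMean is [1 1.5 2 2.5; 10 15 20 25], computed without any explicit loop.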
