`accumarray` makes anomalous calls to its function argument

Problem description

Short version:

The function passed as the fourth argument to accumarray sometimes gets called with arguments that are not consistent with the specifications encoded in the first argument to accumarray.

As a result, functions used as arguments to accumarray must test for what are, in effect, anomalous conditions.

The question is: how can a one-expression anonymous function test for such anomalous conditions? And more generally: how can one write anonymous functions that are robust to accumarray's undocumented behavior?


Full version:

The code below is a drastically distilled version of a problem that ate up most of my workday today.

First some definitions:

idxs = [1:3 1:3 1:3]';

vals0 = [1   4 6   3 5 7   6 Inf 2]';
vals1 = [1 Inf 6   3 5 7   6   4 2]';

anon = @(x) max(x(~isinf(x)));

Note that vals1 is obtained from vals0 by swapping elements 2 and 8. The anonymous function anon computes the maximum among the non-infinite elements of its input.

Given these definitions, the two calls below

accumarray(idxs, vals0, [], anon)
accumarray(idxs, vals1, [], anon)

which differ only in their second argument (vals0 vs vals1), should produce identical results, since the difference between vals0 and vals1 affects only the ordering of the values in the argument to one of the calls to anon, and the result of this function is insensitive to the ordering of elements in its argument.

As it turns out, the first of these two expressions evaluates normally and produces the right result [1]:

>> accumarray(idxs, vals0, [], anon)
ans =
     6
     5
     7

The second one, however, fails with:

>> accumarray(idxs, vals1, [], anon)
Error using accumarray
The function '@(x)max(x(~isinf(x)))' returned a non-scalar value.

To troubleshoot this problem, all I could come up with [2] was to write a separate function (in its own file, of course, "the MATLAB way")

function out = kluge(x)
    global ncalls;
    ncalls = ncalls + 1;
    y = ~isinf(x);
    if any(y)
        out = max(x(y));
    else
        {ncalls x}
        out = NaN;
    end
end

...and ran the following:

>> global ncalls;
>> ncalls = int8(0); accumarray(idxs, vals0, [], @kluge)
ans =
     6
     5
     7
>> ncalls = int8(0); accumarray(idxs, vals1, [], @kluge)
ans = 
    [2]    [Inf]

ans =
     6
     5
     7

As one can see from the output of the last call to accumarray above, the argument to the second call to the kluge callback was the array [Inf]. This tells me beyond any doubt that accumarray is not behaving as documented [3] (since idxs specifies no arrays of length 1 to be passed to accumarray's function argument).

In fact, from this and other tests I determined that, contrary to what I expected, the function passed to accumarray is called more than max(idxs) (= 3) times; in the expressions involving kluge above it's called 5 times.

The problem here is that if one cannot rely on how accumarray's function argument will actually be called, then the only way to make this function argument robust is to include in it a lot of extra code to perform the necessary checks. This almost certainly will require that the function have multiple statements, which rules out anonymous functions. (E.g. the function kluge above is more robust than anon, but I don't know how to fit it into an anonymous function.) Not being able to use anonymous functions with accumarray greatly reduces its utility.

So my question is:

How can one specify anonymous functions that are robust arguments to accumarray?


[1] I have removed blank lines from MATLAB's typical over-padding in all the MATLAB output shown in this post.
[2] I welcome comments with any other troubleshooting suggestions you may have; troubleshooting this problem was a lot harder than it should be.
[3] In particular, see items number 1 through 5 right after the line "The function processes the input as follows:".

Solution

Short answer

The fourth input argument of accumarray, anon in this case, must return a scalar for any input.

Long answer (and discussion about index sorting)

Consider the output when the indexes are sorted:

>> [idxsSorted, sortInds] = sort(idxs);
>> accumarray(idxsSorted, vals0(sortInds), [], anon)
ans =
     6
     5
     7
>> accumarray(idxsSorted, vals1(sortInds), [], anon)
ans =
     6
     5
     7

Now, all the documentation has to say about this is the following:

If the subscripts in subs are not sorted, fun should not depend on the order of the values in its input data.

How does this relate to the trouble with anon? It is a clue: when the subscripts are sorted, anon is called with the complete set of values for a given index rather than with a subset/subarray, as Luis Mendo suggested.


Consider how accumarray would work for a non-sorted list of indexes and values:

>> [idxs vals0 vals1]
ans =
     1     1     1
     2     4   Inf
     3     6     6
     1     3     3
     2     5     5
     3     7     7
     1     6     6
     2   Inf     4
     3     2     2

For both vals0 and vals1, the Inf belongs to the set where idxs equals 2. Since idxs is not sorted, accumarray does not at first process all values for idxs=2 in one shot. The actual algorithm (implementation) is opaque, but it seems to start by assuming that idxs is sorted, processing each single-valued block of the first argument. This can be verified by putting a breakpoint in fun, the function referenced by the fourth input argument. When it encounters a 1 in idxs for the second time, it seems to start over, but with subsequent calls to fun containing all the values for a given index. Presumably accumarray calls some implementation of unique to fully segment idxs (incidentally, order is not preserved). As kjo suggests, this is the point where accumarray actually processes the inputs as described in the documentation, following steps 1-5 there ("Find out how many unique indices there are..."). As a result, it fails for vals1, where the first-pass call is anon(Inf), but not for vals0, where the first-pass call is anon(4).
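As an alternative to a breakpoint, the final grouping described above can be inspected by having the callback wrap its input in a cell, a pattern shown in the accumarray documentation. This is only a sketch: the names groups and groupSizes are illustrative, and the order of the values inside each cell is not guaranteed when the subscripts are unsorted.

```matlab
idxs  = [1:3 1:3 1:3]';
vals1 = [1 Inf 6   3 5 7   6   4 2]';

% The callback returns a 1-by-1 cell for any input, so accumarray always
% receives a scalar (cell) result, even for a one-element first-pass call.
groups = accumarray(idxs, vals1, [], @(x) {x});

% Number of values that ended up in each group (one entry per index).
groupSizes = cellfun(@numel, groups);

% Values collected for index 2, sorted because their order is unspecified.
group2 = sort(groups{2});
```

Each cell of groups holds the complete set of values for one index, which is what the callback sees in the documented processing steps.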

However, even if accumarray followed those steps exactly on the first go, anon would not necessarily be robust: a complete subarray of values could contain only Infs (consider that anon([Inf Inf Inf]) returns an empty matrix too). It is a requirement, although an understated one, that fun must return a scalar. What is not clear from the documentation is that it must return a scalar for any input, not just for the inputs one would expect based on the high-level description of the algorithm.
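The empty-result claim above can be checked by calling anon directly, without accumarray. The variable names in this small sketch are illustrative only.

```matlab
anon = @(x) max(x(~isinf(x)));

onlyInf = anon(Inf);              % nothing finite remains after filtering
allInf  = anon([Inf Inf Inf]');   % same for a whole group of Infs
mixed   = anon([4 5 Inf]');       % finite values present

% isempty is true for the first two results; only the third is a scalar.
emptyFlags  = [isempty(onlyInf), isempty(allInf), isempty(mixed)];
scalarFlags = [isscalar(onlyInf), isscalar(allInf), isscalar(mixed)];
```

Whenever filtering removes every element, max receives an empty array and returns an empty result, which is what accumarray rejects as a non-scalar value.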


Workaround:

anon = @(x) max([x(~isinf(x));-Inf]);
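As a quick check of the workaround above, the sketch below reruns both calls from the question. The name robustAnon is illustrative. The vertical concatenation assumes the callback receives column vectors (or empties), as it does here because the values are passed as a column.

```matlab
idxs  = [1:3 1:3 1:3]';
vals0 = [1   4 6   3 5 7   6 Inf 2]';
vals1 = [1 Inf 6   3 5 7   6   4 2]';

% Appending -Inf guarantees max always sees at least one element,
% so the callback returns a scalar even when no finite value is left.
robustAnon = @(x) max([x(~isinf(x)); -Inf]);

r0 = accumarray(idxs, vals0, [], robustAnon);
r1 = accumarray(idxs, vals1, [], robustAnon);

% A group with no finite values yields the sentinel -Inf.
sentinel = robustAnon([Inf Inf Inf]');
```

Both r0 and r1 come out as [6; 5; 7]. A group that contains only Infs would be reported as -Inf, so callers who prefer NaN for that case would need to post-process the result.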
