Finding indices in Python lists efficiently (in comparison to MATLAB)

Problem description

I am having difficulty finding an efficient way to find indices in Python lists. All the solutions I have tested so far are slower than the 'find' function in MATLAB. I have only just started using Python, so I am not very experienced.

In MATLAB I would use the following:

a = linspace(0, 1000, 1000); % monotonically increasing vector
b = 1000 * rand(1, 100); % 100 points I want to find in a
for i = 1 : numel(b)
    indices(i) = find(b(i) <= a, 1); % find the first index where b(i) <= a
end

If I use MATLAB's arrayfun() I can speed this process up a little. In Python I tried several possibilities. I used

import numpy
indices = []
for i in range(len(b)):
    tmp = numpy.where(b[i] <= a)  # all positions where b[i] <= a
    indices.append(tmp[0][0])     # keep only the first one

This takes a lot of time, especially if a is large. If b is sorted, then I can use

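indices = []
curr_idx = 0
for i in range(len(a)):
    # a[i] may be the first match for several consecutive entries of b
    while curr_idx < len(b) and b[curr_idx] <= a[i]:
        indices.append(i)
        curr_idx += 1
    if curr_idx >= len(b):
        break  # every entry of b has been located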

This is much quicker than the numpy.where() solution because I only have to search through the list a once, but it is still slower than the MATLAB solution.

Could anyone suggest a better / more efficient solution? Thanks in advance.

Recommended answer

Try numpy.searchsorted:

>>> import numpy as np
>>> a = np.array([0, 1, 2, 3, 4, 5, 6, 7])
>>> b = np.array([1, 2, 4, 3, 1, 0, 2, 9])
>>> # sorting b "into" a
>>> np.searchsorted(a, b, side='right') - 1
array([1, 2, 4, 3, 1, 0, 2, 7])

You might have to apply some special treatment for values in b that lie outside the range of a, such as the 9 in the example above. Despite that, this should be faster than any loop-based method.
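One way to handle the out-of-range case, and to reproduce the question's `find(b(i) <= a, 1)` semantics directly, is sketched below. With `side='left'`, `numpy.searchsorted` returns the first 0-based position `i` with `b <= a[i]`, assuming `a` is sorted in ascending order. The helper name `first_index_at_or_above` and the `-1` sentinel for values with no match are illustrative choices, not part of the original answer.

```python
import numpy as np

def first_index_at_or_above(a, b):
    """For each value in b, return the first 0-based index i with b <= a[i].

    Assumes a is sorted in ascending order. Values of b larger than a[-1]
    have no match; they are reported as -1 (an arbitrary sentinel).
    """
    a = np.asarray(a)
    b = np.asarray(b)
    idx = np.searchsorted(a, b, side='left')
    # searchsorted returns len(a) when b exceeds every element of a
    return np.where(idx == len(a), -1, idx)

a = np.arange(8)                        # sorted reference vector 0..7
b = np.array([1, 2.5, 0, 7, 9])         # values to locate; 9 is out of range
indices = first_index_at_or_above(a, b)
print(indices)                          # [ 1  3  0  7 -1]
```

Note that these indices are 0-based, so they are one less than what MATLAB's `find` would report for the same data.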

As an aside: similarly, histc in MATLAB will be much faster than the loop.

If you want to get the index where b is closest to a, you should be able to use the same code, simply with a modified a:

>>> a_mod = 0.5 * (a[:-1] + a[1:])  # take the centers between the elements in a
>>> np.searchsorted(a_mod, np.array([0.9, 2.1, 4.2, 2.9, 1.1]), side='right')
array([1, 2, 4, 3, 1])

Note that you can drop the -1 since a_mod has one element fewer than a.
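As a rough check of the midpoint trick, the sketch below wraps it in a helper (the name `nearest_index` is made up for illustration) and compares it against a brute-force `argmin` over all distances. It assumes `a` is sorted in ascending order and has at least two elements. Values outside the range of `a` fall onto the first or last index here, so no extra special-casing is needed.

```python
import numpy as np

def nearest_index(a, b):
    """Return, for each value in b, the index of the closest element of a.

    Assumes a is sorted in ascending order with at least two elements.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    midpoints = 0.5 * (a[:-1] + a[1:])   # centers between neighbours in a
    # the number of midpoints <= b is the index of the nearest element
    return np.searchsorted(midpoints, b, side='right')

a = np.arange(8)
b = np.array([0.9, 2.1, 4.2, 2.9, 1.1, -3.0, 9.0])
nearest = nearest_index(a, b)

# Brute-force reference: O(len(a) * len(b)) distance table, for checking only
brute_force = np.abs(a[None, :] - b[:, None]).argmin(axis=1)
print(nearest)                           # [1 2 4 3 1 0 7]
```

If a value of b lies exactly on a midpoint, `side='right'` picks the upper neighbour while `argmin` picks the lower one, so the two approaches can differ on exact ties.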
