Finding indices in Python lists efficiently (in comparison to MATLAB)
Problem description
I am having difficulty finding an efficient way to find indices in Python lists. All the solutions I have tested so far are slower than the 'find' function in MATLAB. I have only just started using Python, so I am not very experienced.
In MATLAB I would use the following:
a = linspace(0, 1000, 1000); % monotonically increasing vector
b = 1000 * rand(1, 100); % 100 points I want to find in a
for i = 1 : numel(b)
    indices(i) = find(b(i) <= a, 1); % find the first index where b(i) <= a
end
If I use MATLAB's arrayfun() I can speed this process up a little bit. In Python I tried several possibilities. I used
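import numpy
indices = []
for i in xrange(0, len(b)):  # xrange is Python 2; use range in Python 3
    tmp = numpy.where(b[i] <= a)  # indices of all elements of a that are >= b[i]
    indices.append(tmp[0][0])  # keep only the first match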
which takes a lot of time, especially if a is quite big. If b is sorted, then I can use
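# fragment from inside a function; assumes indices = [] and curr_idx = 0 were set beforehand
for i in xrange(0, len(a)):  # scan a once
    if b[curr_idx] <= a[i]:
        indices.append(i)
        curr_idx += 1
        if curr_idx >= len(b):
            # every value of b has been placed
            return indices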
This is much quicker than the numpy.where() solution because I only have to search through the list a once, but this is still slower than the MATLAB solution.
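For reference, the single-pass idea can be written as a self-contained function. This is only a sketch, assuming both a and b are sorted in ascending order; the name merge_find is hypothetical. It uses an inner while loop instead of the if, so that several values of b can map to the same index of a, and values of b larger than the last element of a are simply left out of the result.

```python
def merge_find(a, b):
    """For each b[j], the first index i with b[j] <= a[i].

    Assumes a and b are both sorted ascending. Values of b that are
    larger than a[-1] have no such index and are omitted.
    """
    indices = []
    curr_idx = 0
    for i in range(len(a)):
        # several values of b may be at or below the same a[i]
        while curr_idx < len(b) and b[curr_idx] <= a[i]:
            indices.append(i)
            curr_idx += 1
        if curr_idx >= len(b):
            break  # every value of b has been placed
    return indices


print(merge_find([0, 1, 2, 3], [0.2, 0.4, 2.5]))
```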
Could anyone suggest a better / more efficient solution? Thanks in advance.
Recommended answer
Try numpy.searchsorted:
>>> import numpy as np
>>> a = np.array([0, 1, 2, 3, 4, 5, 6, 7])
>>> b = np.array([1, 2, 4, 3, 1, 0, 2, 9])
>>> # sorting b "into" a
>>> np.searchsorted(a, b, side='right') - 1
array([1, 2, 4, 3, 1, 0, 2, 7])
You might have to apply some special treatment to values in b that are outside the range of a, such as the 9 in the above example. Even so, this should be faster than any loop-based method.
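The loop in the question looks for the first index i with b <= a[i]. As far as I can tell that corresponds to side='left' without the -1, so here is a minimal sketch of that variant, assuming a is sorted in ascending order. The helper name first_index_at_least and the -1 sentinel for out-of-range values are my own assumptions, not part of NumPy.

```python
import numpy as np


def first_index_at_least(a, b, missing=-1):
    """For each value in b, return the first index i with b <= a[i].

    a must be sorted ascending. Values of b larger than a[-1] have no
    such index and get `missing` (a hypothetical sentinel).
    """
    a = np.asarray(a)
    b = np.asarray(b)
    idx = np.searchsorted(a, b, side='left')
    # searchsorted returns len(a) when a value exceeds every element of a
    return np.where(idx < len(a), idx, missing)


a = np.linspace(0, 1000, 1000)   # monotonically increasing vector
b = 1000 * np.random.rand(100)   # 100 points to locate in a
indices = first_index_at_least(a, b)
print(indices[:10])
```

Note that the result is 0-based, whereas MATLAB's find returns 1-based indices.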
As an aside: similarly, histc in MATLAB will be much faster than the loop.
If you want to get the index where b is closest to a, you should be able to use the same code, simply with a modified a:
>>> a_mod = 0.5 * (a[:-1] + a[1:])  # take the centers between the elements in a
>>> np.searchsorted(a_mod, np.array([0.9, 2.1, 4.2, 2.9, 1.1]), side='right')
array([1, 2, 4, 3, 1])
Note that you can drop the -1 since a_mod has one element fewer than a.
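To convince yourself that the midpoint trick really returns the nearest element, you can compare it with a brute-force search. This is just a sketch on the small example array; the variable names nearest and reference are my own.

```python
import numpy as np

a = np.array([0, 1, 2, 3, 4, 5, 6, 7])
b = np.array([0.9, 2.1, 4.2, 2.9, 1.1, -3.0, 9.0])

# midpoints between neighbouring elements of a
a_mod = 0.5 * (a[:-1] + a[1:])
nearest = np.searchsorted(a_mod, b, side='right')

# brute-force reference: distance from every b to every a, then argmin
reference = np.abs(b[:, None] - a[None, :]).argmin(axis=1)

print(nearest)
print(reference)
```

Values outside the range of a (the -3.0 and 9.0 here) end up at the first and last index, so no special treatment is needed in this variant. A value that lies exactly on a midpoint goes to the upper neighbour with side='right', while argmin picks the lower one, so the two can differ on exact ties.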