How to implement this array algorithm in a more efficient way?
Problem description
Assume I have n = 3 lists of the same length, for example:
R1 = [7,5,8,6,0,6,7]
R2 = [8,0,2,2,0,2,2]
R3 = [1,7,5,9,0,9,9]
I need to find the first index t that satisfies the following n = 3 conditions for a period p = 2.
Edit: the period p is the number of consecutive "boxes" (positions) that must satisfy the conditions.
R1[t] >= 5, R1[t+1] >= 5. Here t + p - 1 = t + 1, so we only need to check two boxes, t and t+1. If p were equal to 3, we would need to check t, t+1 and t+2. Note that the threshold is the same for every box of a given list: for R1 we always test whether the value is at least 5, at every index.
R2[t] >= 2, R2[t+1] >= 2
R3[t] >= 9, R3[t+1] >= 9
In total there are 3 * p conditions.
Here the t I am looking for is 5 (indexing starts from 0).
The basic way to do this is to loop over all the indexes with a for loop. If the conditions hold at some index t, we store it in a local variable temp and verify that the conditions still hold for every element whose index is between t+1 and t+p-1. If, while checking, we find an index that does not satisfy a condition, we discard temp and keep going.
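For reference, the loop described above could be sketched roughly as follows. This is only an illustration: the function name first_index_loop is made up, and instead of storing temp it counts the length of the current streak, which comes to the same thing in a single pass.

```python
R1 = [7, 5, 8, 6, 0, 6, 7]
R2 = [8, 0, 2, 2, 0, 2, 2]
R3 = [1, 7, 5, 9, 0, 9, 9]


def first_index_loop(rows, thresholds, p):
    """Return the first t such that every row meets its threshold at t..t+p-1, or None."""
    run = 0  # number of consecutive indices (ending at i) where all conditions hold
    for i in range(len(rows[0])):
        if all(row[i] >= thr for row, thr in zip(rows, thresholds)):
            run += 1
            if run == p:
                return i - p + 1  # start of the streak of length p
        else:
            run = 0  # a condition failed, so forget the current streak
    return None


print(first_index_loop([R1, R2, R3], [5, 2, 9], 2))
```

With the lists above and p = 2 this prints 5.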
What is the most efficient way to do this in Python if I have large lists (say, 10000 elements)? Is there a more efficient way than the for loop?
Since all your conditions use the same comparison (>=), we can leverage this.
This solution will work for any number of conditions and any size of analysis window, and no for loop is used.
You have an array:
>>> import numpy as np
>>> R = np.array([R1, R2, R3]).T
>>> R
array([[7, 8, 1],
       [5, 0, 7],
       [8, 2, 5],
       [6, 2, 9],
       [0, 0, 0],
       [6, 2, 9],
       [7, 2, 9]])
and you have thresholds:
>>> thresholds = [5, 2, 9]
So you can check where the conditions are met:
>>> R >= thresholds
array([[ True,  True, False],
       [ True, False, False],
       [ True,  True, False],
       [ True,  True,  True],
       [False, False, False],
       [ True,  True,  True],
       [ True,  True,  True]])
And where they are all met at the same time:
>>> R_cond = np.all(R >= thresholds, axis=1)
>>> R_cond
array([False, False, False, True, False, True, True])
From there, you want the conditions to be met over a whole window.
We'll use the fact that booleans can be summed (True counts as 1), and a convolution with a window of ones to compute the sliding sum:
>>> win_size = 2
>>> R_conv = np.convolve(R_cond, np.ones(win_size), mode="valid")
>>> R_conv
array([0., 0., 1., 1., 1., 2.])
The resulting array will have values equal to win_size at the indices where all conditions are met over the whole window.
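If the convolution step feels opaque, here is a small check (just a sketch, with the R_cond values copied from above) that it really is a sliding sum: entry k of the result is the number of True values in R_cond[k:k + win_size].

```python
import numpy as np

R_cond = np.array([False, False, False, True, False, True, True])
win_size = 2

# Sliding sum via convolution, as in the answer.
conv = np.convolve(R_cond, np.ones(win_size), mode="valid")

# The same sums written out one window at a time (slow, for illustration only).
explicit = np.array(
    [R_cond[k:k + win_size].sum() for k in range(len(R_cond) - win_size + 1)]
)

print(conv)
print(explicit)
print(np.array_equal(conv, explicit))
```

Both arrays hold the same counts, so an entry equal to win_size means every position in that window satisfied all the conditions.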
So let's retrieve the first of those indices:
>>> index = np.where(R_conv == win_size)[0][0]
>>> index
5
If no such index exists, this will raise an IndexError; I'm leaving that for you to handle.
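One possible way to handle the missing case, as a sketch only: return None instead of raising. The helper name first_window_index is hypothetical, and it assumes win_size >= 1 and that the per-index condition array has already been computed.

```python
import numpy as np


def first_window_index(cond, win_size):
    """Return the first start index of win_size consecutive True values, or None."""
    cond = np.asarray(cond, dtype=bool)
    if win_size > cond.size:
        return None  # the window does not even fit in the data
    # Sliding count of True values over each window of length win_size.
    sums = np.convolve(cond.astype(int), np.ones(win_size, dtype=int), mode="valid")
    hits = np.flatnonzero(sums == win_size)
    return int(hits[0]) if hits.size else None


R_cond = [False, False, False, True, False, True, True]
print(first_window_index(R_cond, 2))
print(first_window_index(R_cond, 3))
```

Here the first call finds index 5, and the second returns None because there is no run of three consecutive True values.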
So, as a one-liner function, it gives:
def idx_conditions(arr, thresholds, win_size, condition):
    return np.where(
        np.convolve(
            np.all(condition(arr, thresholds), axis=1),
            np.ones(win_size),
            mode="valid"
        )
        == win_size
    )[0][0]
I added the condition as an argument to the function, to be more general.
>>> from operator import ge
>>> idx_conditions(R, thresholds, win_size, ge)
5
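To illustrate what passing the condition as an argument buys you, here is a self-contained sketch that repeats the function and calls it with a different window size and a different operator. The extra calls are illustrative examples, not part of the original question.

```python
from operator import ge, lt

import numpy as np


def idx_conditions(arr, thresholds, win_size, condition):
    # Same function as above, repeated so this snippet runs on its own.
    return np.where(
        np.convolve(
            np.all(condition(arr, thresholds), axis=1),
            np.ones(win_size),
            mode="valid",
        )
        == win_size
    )[0][0]


R = np.array([
    [7, 5, 8, 6, 0, 6, 7],
    [8, 0, 2, 2, 0, 2, 2],
    [1, 7, 5, 9, 0, 9, 9],
]).T
thresholds = [5, 2, 9]

print(idx_conditions(R, thresholds, 2, ge))  # original example
print(idx_conditions(R, thresholds, 1, ge))  # window of a single box
print(idx_conditions(R, thresholds, 1, lt))  # every value strictly below its threshold
```

Any function that broadcasts over arr and thresholds and returns a boolean array (such as those in the operator module) should work as the condition.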