How to implement this array algorithm in a more efficient way?

Problem description

Assume I have n = 3 lists of the same length, for example:

R1 = [7,5,8,6,0,6,7]

R2 = [8,0,2,2,0,2,2]

R3 = [1,7,5,9,0,9,9]

I need to find the first index t that satisfies the following n = 3 conditions over a period p = 2. Edit: the period p is the number of consecutive "boxes" (indices) that must satisfy the conditions.

  • R1[t] >= 5, R1[t+1] >= 5. Here t + p - 1 = t + 1, so we only need to check the two boxes t and t+1. If p were equal to 3, we would need to check t, t+1 and t+2. Note that the threshold is the same for every box of a given list: for R1 we always test whether the value is greater than or equal to 5.
  • R2[t] >= 2, R2[t+1] >= 2
  • R3[t] >= 9, R3[t+1] >= 9

In total there are 3 * p conditions.

Here the t I am looking for is 5 (indexing starts from 0).

The basic way to do this is to loop over all the indices with a for loop. When the conditions hold at some index t, we store it in a local variable temp and check that they still hold for every index between t+1 and t+p-1. If, while checking, we find an index that fails a condition, we discard temp and keep going.
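
For reference, here is a minimal sketch of that loop. The function name `first_index_loop` and its parameters are illustrative and do not come from the question, and returning None when no window qualifies is an assumption about the desired behavior.

```python
def first_index_loop(rows, thresholds, p):
    """Return the first t such that row[k] >= threshold for every list
    and every k in t .. t+p-1, or None if no such t exists."""
    n = len(rows[0])
    run_start = None  # start of the current run of indices meeting all conditions
    for k in range(n):
        if all(row[k] >= thr for row, thr in zip(rows, thresholds)):
            if run_start is None:
                run_start = k  # remember the candidate (the "temp" of the description)
            if k - run_start + 1 >= p:
                return run_start
        else:
            run_start = None  # a condition failed: forget the candidate
    return None


R1 = [7, 5, 8, 6, 0, 6, 7]
R2 = [8, 0, 2, 2, 0, 2, 2]
R3 = [1, 7, 5, 9, 0, 9, 9]
result = first_index_loop([R1, R2, R3], [5, 2, 9], 2)
```

This version makes a single pass, so it is already linear in the list length; what remains is the per-element overhead of the Python interpreter.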

What is the most efficient way to do this in Python if I have large lists (say, 10000 elements)? Is there a more efficient way than the for loop?

Solution

Since all your conditions use the same comparison (>=), we can leverage this.

This solution will work for any number of conditions and any size of analysis window, and no for loop is used.

You have an array:

>>> import numpy as np
>>> R = np.array([R1, R2, R3]).T
>>> R
array([[7, 8, 1],
       [5, 0, 7],
       [8, 2, 5],
       [6, 2, 9],
       [0, 0, 0],
       [6, 2, 9],
       [7, 2, 9]])

and you have thresholds:

>>> thresholds = [5, 2, 9]

So you can check where the conditions are met:

>>> R >= thresholds
array([[ True,  True, False],
       [ True, False, False],
       [ True,  True, False],
       [ True,  True,  True],
       [False, False, False],
       [ True,  True,  True],
       [ True,  True,  True]])

And where they are all met at the same time:

>>> R_cond = np.all(R >= thresholds, axis=1)
>>> R_cond
array([False, False, False,  True, False,  True,  True])

From there, you want the conditions to be met for a given window.

We'll use the fact that booleans can be summed (True counts as 1, False as 0), and use a convolution with a window of ones to apply the window:

>>> win_size = 2
>>> R_conv = np.convolve(R_cond, np.ones(win_size), mode="valid")
>>> R_conv
array([0., 0., 1., 1., 1., 2.])

The resulting array will have values equal to win_size at the indices where all conditions are met on the window range.
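
To see why comparing against win_size works, the sketch below recomputes the window sums explicitly and checks that they match the convolution. The variable names are illustrative, and `sliding_window_view` (which requires NumPy 1.20 or later) is shown only for comparison; it is not the method used in this answer.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

R_cond = np.array([False, False, False, True, False, True, True])
win_size = 2

# Convolving with a window of ones sums each run of win_size consecutive flags.
conv_sums = np.convolve(R_cond, np.ones(win_size), mode="valid")

# The same sums computed explicitly, one window per row.
windows = sliding_window_view(R_cond, win_size)
explicit_sums = windows.sum(axis=1)

# A window sums to win_size exactly when every flag in it is True.
full_windows = windows.all(axis=1)
```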

So let's retrieve the first of those indices:

>>> index = np.where(R_conv == win_size)[0][0]
>>> index
5

If no such index exists, this raises an IndexError; I'm letting you handle that.
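
One possible way to handle the missing case is to check whether any window matched before indexing. This is only a sketch: the helper name `first_full_window` and the choice to return None are assumptions, not part of the original answer.

```python
import numpy as np


def first_full_window(R_cond, win_size):
    """Return the first index where win_size consecutive flags are True, or None."""
    R_conv = np.convolve(R_cond, np.ones(win_size), mode="valid")
    hits = np.flatnonzero(R_conv == win_size)
    # An empty hits array means no window qualifies, so avoid indexing into it.
    return int(hits[0]) if hits.size else None


found = first_full_window(np.array([False, False, False, True, False, True, True]), 2)
missing = first_full_window(np.array([True, False, True, False]), 2)
```

Here `found` is the index from the example above, while `missing` is None because no two consecutive flags are True.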

So, wrapped up as a single-expression function, it gives:

def idx_conditions(arr, thresholds, win_size, condition):
    return np.where(
        np.convolve(
            np.all(condition(arr, thresholds), axis=1),
            np.ones(win_size),
            mode="valid"
        )
        == win_size
    )[0][0]

I added the condition as an argument to the function, to be more general.

>>> from operator import ge
>>> idx_conditions(R, thresholds, win_size, ge)
5
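
Because the comparison is passed in, the same function should cover other elementwise conditions as well. The sketch below repeats the function so that it runs on its own and tries `operator.le`; the second set of thresholds and the window size of 3 are made up for illustration and do not come from the question.

```python
from operator import ge, le

import numpy as np


def idx_conditions(arr, thresholds, win_size, condition):
    # Same function as above, repeated so this snippet is self-contained.
    return np.where(
        np.convolve(
            np.all(condition(arr, thresholds), axis=1),
            np.ones(win_size),
            mode="valid",
        )
        == win_size
    )[0][0]


R = np.array([
    [7, 5, 8, 6, 0, 6, 7],
    [8, 0, 2, 2, 0, 2, 2],
    [1, 7, 5, 9, 0, 9, 9],
]).T

at_least = idx_conditions(R, [5, 2, 9], 2, ge)  # the example from the answer
at_most = idx_conditions(R, [6, 2, 9], 3, le)   # illustrative: all values <= thresholds
```

With `le`, rows 3, 4 and 5 are the first three consecutive rows whose values are all at or below the thresholds, so `at_most` is 3.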
