Pandas: Count the first consecutive True values

Question

I am trying to implement a function that counts the first run of consecutive True values in a Pandas Series that has already been masked with the condition I want, e.g.:

[True, True, True, False, True, False, True, True, True, True]

I want the above input to give a result of 3, i.e., there are 3 True occurrences in a row from the beginning of the series.

I am aware that a big for loop would do the job, but is there a vectorized / Pandas-centric way to go about it?

Thanks a lot.

Recommended answer

Question:

Find the first run of consecutive Trues.
Consider a

import numpy as np

a = np.array([True, True, True, False, True, False, True, True, True, True])

Answer 1
numpy: Use np.logical_and.accumulate on the negation of a and take the negation of that to make a mask that eliminates the first series of Falses, if they exist. Then append a False at the end to ensure we have a non-True minimum. Finally, use np.argmin to locate the first minimum value. If it is found at position 3, that indicates 3 True values before it.

np.argmin(np.append(a[~np.logical_and.accumulate(~a)], False))

3
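To make the one-liner easier to follow, here is the same computation split into named steps. This is only an illustrative sketch: the function name count_first_run and the intermediate variable names are hypothetical and do not come from the answer.

```python
import numpy as np

def count_first_run(a):
    """Length of the first run of Trues, computed step by step."""
    a = np.asarray(a, dtype=bool)
    # True for every leading False, False from the first True onward
    leading_false = np.logical_and.accumulate(~a)
    # Drop the leading Falses so the array starts at the first True
    trimmed = a[~leading_false]
    # Append a False sentinel so argmin always has a False to find,
    # even when trimmed is all True (or empty)
    padded = np.append(trimmed, False)
    # Index of the first False equals the number of Trues before it
    return int(np.argmin(padded))

a = np.array([True, True, True, False, True, False, True, True, True, True])
print(count_first_run(a))  # 3
```

With leading Falses, e.g. [False, False, True, True, False, True], the trimmed array becomes [True, True, False, True] and the result is 2.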

Answer 2
numba.njit

I'd like to use numba so I can loop and make sure I short-circuit as soon as possible. This is a problem that is likely to be answered early in the array, so there is no need to evaluate the entire array for no reason.

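from numba import njit

@njit
def first_true(a):
    true_started = False  # becomes True once the first True has been seen
    c = 0                 # length of the first run of Trues so far
    for j in a:
        if true_started and not j:
            # first False after the run: stop early
            return c
        c += j
        true_started = true_started or j
    return c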

first_true(a)

3
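The short-circuit idea does not depend on numba itself. As a minimal sketch, here is a plain-Python version that also reports how many elements it examined, so the early exit is visible. The name first_true_py and the extra return value are hypothetical additions for illustration. Without @njit this is likely to be much slower on large arrays, so it is meant only to show the control flow.

```python
import numpy as np

def first_true_py(a):
    """Return (run_length, elements_examined) for the first run of Trues."""
    started = False  # set once the first True has been seen
    count = 0        # length of the first run so far
    for examined, value in enumerate(a, start=1):
        if started and not value:
            # The run has ended, so the rest of the array is never touched
            return count, examined
        if value:
            started = True
            count += 1
    # Reached the end without seeing a False after the run
    return count, len(a)

a = np.array([True, True, True, False, True, False, True, True, True, True])
print(first_true_py(a))  # (3, 4)
```

Only 4 of the 10 elements are examined here. For a pandas Series s holding the mask, s.to_numpy() yields an array that can be passed in the same way.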

Answer 3
numpy: a smarter use of argmin and argmax. I surround a with False, then use argmax to find the first True, and from that point on use argmin to find the first False after that.
Note: @Divakar made an improvement on this answer that eliminates the use of np.concatenate and uses if/then/else instead. That cut the running time of this already very fast solution by a factor of 3!

def first_true2(a):
    a = np.concatenate([[False], a, [False]])
    return np.argmin(a[np.argmax(a):])

first_true2(a)

3
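The source of the if/then/else variant mentioned in the note is not reproduced here. The following is only a guess at the general shape such a variant could take, not @Divakar's actual code: it avoids np.concatenate by checking the two edge cases (no True at all, no False after the first True) explicitly. The name first_true_ifelse is hypothetical.

```python
import numpy as np

def first_true_ifelse(a):
    """First-run length via argmax/argmin, with edge cases handled by branches."""
    a = np.asarray(a, dtype=bool)
    if a.size == 0:
        return 0
    start = np.argmax(a)      # index of the first True (0 if there is none)
    if not a[start]:
        return 0              # the array contains no True at all
    rest = a[start:]          # a view, so no copy is made
    stop = np.argmin(rest)    # index of the first False (0 if there is none)
    if rest[stop]:
        return int(rest.size) # no False after the first True
    return int(stop)

a = np.array([True, True, True, False, True, False, True, True, True, True])
print(first_true_ifelse(a))  # 3
```

The branches replace the two False sentinels that first_true2 adds, so the input never has to be copied into a padded array.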

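How fast are these answers?
See @Divakar's answer for the source code of the other functions being timed. For each input below, the timings are listed in the same order as the %timeit calls.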

%timeit first_true(a)
%timeit np.argmin(np.append(a[~np.logical_and.accumulate(~a)], False))
%timeit np.diff(np.flatnonzero(np.diff(np.r_[0,a,0])))[0]
%timeit first_True_island_len(a)
%timeit first_true2(a)
%timeit first_True_island_len_IFELSE(a)


a = np.array([True, True, True, False, True, False, True, True, True, True])    
1000000 loops, best of 3: 353 ns per loop
100000 loops, best of 3: 8.32 µs per loop
10000 loops, best of 3: 27.4 µs per loop
100000 loops, best of 3: 5.48 µs per loop
100000 loops, best of 3: 5.38 µs per loop
1000000 loops, best of 3: 1.35 µs per loop

a = np.array([False] * 100000 + [True] * 10000)
10000 loops, best of 3: 112 µs per loop
10000 loops, best of 3: 127 µs per loop
1000 loops, best of 3: 513 µs per loop
10000 loops, best of 3: 110 µs per loop
100000 loops, best of 3: 13.9 µs per loop
100000 loops, best of 3: 4.55 µs per loop

a = np.array([False] * 100000 + [True])
10000 loops, best of 3: 102 µs per loop
10000 loops, best of 3: 115 µs per loop
1000 loops, best of 3: 472 µs per loop
10000 loops, best of 3: 108 µs per loop
100000 loops, best of 3: 14 µs per loop
100000 loops, best of 3: 4.45 µs per loop
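To rerun a comparison like this outside IPython, the standard timeit module can be used. The sketch below times only the three NumPy variants whose code appears in this post. The wrapper names accumulate_version and diff_version, the input labels, and the loop counts are assumptions made for illustration, and absolute numbers will differ from machine to machine.

```python
import timeit
import numpy as np

def accumulate_version(a):
    return np.argmin(np.append(a[~np.logical_and.accumulate(~a)], False))

def first_true2(a):
    a = np.concatenate([[False], a, [False]])
    return np.argmin(a[np.argmax(a):])

def diff_version(a):
    return np.diff(np.flatnonzero(np.diff(np.r_[0, a, 0])))[0]

functions = [accumulate_version, first_true2, diff_version]
inputs = {
    "short": np.array([True, True, True, False, True, False, True, True, True, True]),
    "long run after Falses": np.array([False] * 100000 + [True] * 10000),
    "single True after Falses": np.array([False] * 100000 + [True]),
}

results = {}
for label, arr in inputs.items():
    for fn in functions:
        # Best of 3 repeats, 100 calls each, reported as seconds per call
        best = min(timeit.repeat(lambda: fn(arr), number=100, repeat=3)) / 100
        results[(label, fn.__name__)] = best
        print(f"{label:26s} {fn.__name__:20s} {best * 1e6:10.2f} µs per call")
```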
