Pandas: Count the first consecutive True values
Question
I am trying to implement a function that counts the first run of consecutive True values in a Pandas Series that has already been masked with the condition I want, e.g.:
[True, True, True, False, True, False, True, True, True, True]
I want the above input to give a result of 3, i.e., there are 3 consecutive True values at the beginning of the series.
I am aware that a big for loop would do the job, but is there a vectorized / Pandas-centric way to do it?
Thanks a lot.
Recommended answer
Problem: find the length of the first run of consecutive True values.
Consider a:
import numpy as np
a = np.array([True, True, True, False, True, False, True, True, True, True])
Answer 1: numpy
Use np.logical_and.accumulate on the negation of a, then negate the result, to build a mask that drops a leading run of False values if there is one. Then append a False at the end so the array is guaranteed to contain a non-True minimum. Finally, use np.argmin to locate the first minimum. If it finds position 3, that means there are 3 True values before it.
np.argmin(np.append(a[~np.logical_and.accumulate(~a)], False))
3
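The one-liner above packs several steps together. As a rough sketch, it can be unrolled as follows; the function name first_run_length_steps and the intermediate variable names are made up here for illustration and are not part of the original answer.

```python
import numpy as np

def first_run_length_steps(a):
    """Unrolled version of the Answer 1 one-liner (illustrative names)."""
    # True for as long as we are still inside a leading run of False values.
    leading_false = np.logical_and.accumulate(~a)
    # Drop that leading run, so the remainder starts at the first True (if any).
    trimmed = a[~leading_false]
    # Append a False sentinel so argmin always has a False to find.
    padded = np.append(trimmed, False)
    # Index of the first False equals the length of the first run of True values.
    return int(np.argmin(padded))

a = np.array([True, True, True, False, True, False, True, True, True, True])
print(first_run_length_steps(a))  # 3
```

The sentinel matters when the array ends in a run of True values: without it there would be no False for argmin to find, and it would return 0.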
Answer 2: numba.njit
I'd like to use numba so I can loop and short-circuit as soon as the answer is known. This is a problem that is likely to be answered early in the array, so there is no need to evaluate things along the entire array for no reason.
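from numba import njit

@njit
def first_true(a):
    true_started = False  # becomes True once the first True has been seen
    c = 0                 # number of True values counted so far
    for i, j in enumerate(a):
        if true_started and not j:
            # First False after the run of True values: stop early.
            return c
        else:
            c += j
            true_started = true_started or j
    return c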
first_true(a)
3
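If numba is not available, the same short-circuiting idea can be sketched with the standard library alone. This is not part of the original answer and will be much slower than the compiled version on large arrays; the name first_true_py is hypothetical.

```python
from itertools import dropwhile, takewhile

def first_true_py(values):
    """Length of the first run of truthy values, stopping as early as possible."""
    # Lazily skip any leading falsy values.
    rest = dropwhile(lambda v: not v, values)
    # Count truthy values until the first falsy one; nothing after it is read.
    return sum(1 for _ in takewhile(bool, rest))

mask = [True, True, True, False, True, False, True, True, True, True]
print(first_true_py(mask))  # 3
```

Because dropwhile and takewhile are lazy, iteration ends at the first False that follows the run, which mirrors the early return in the numba loop.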
Answer 3: numpy
A smarter use of argmin and argmax. I pad a with False on both sides, use argmax to find the first True, and then, from that point on, use argmin to find the first False after it.
Note: @Divakar made an improvement on this answer that eliminates the use of np.concatenate and uses if/then/else instead. That cut the running time of this already very fast solution by a factor of 3!
def first_true2(a):
    a = np.concatenate([[False], a, [False]])
    return np.argmin(a[np.argmax(a):])
first_true2(a)
3
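The note above mentions replacing np.concatenate with if/then/else, but that code is not shown here. The following is only a guess at what such a variant could look like, not @Divakar's actual implementation; the name first_true_ifelse is hypothetical.

```python
import numpy as np

def first_true_ifelse(a):
    """First run of True values without padding the array (illustrative sketch)."""
    if a.size == 0:
        return 0
    start = int(a.argmax())   # index of the first True, or 0 if there is none
    if not a[start]:          # no True anywhere in the array
        return 0
    tail = a[start:]          # a view, so no copy is made
    stop = int(tail.argmin()) # index of the first False in the tail, or 0 if none
    if tail[stop]:            # no False after the run: it extends to the end
        return int(tail.size)
    return stop

a = np.array([True, True, True, False, True, False, True, True, True, True])
print(first_true_ifelse(a))  # 3
```

The explicit branches handle the two cases that the False padding in first_true2 takes care of implicitly: an array with no True at all, and a run of True values that reaches the end of the array.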
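How fast are these answers?
See @Divakar's answer for the source code of the other functions being timed. For each test array below, the six timings are listed in the same order as the six %timeit statements.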
%timeit first_true(a)
%timeit np.argmin(np.append(a[~np.logical_and.accumulate(~a)], False))
%timeit np.diff(np.flatnonzero(np.diff(np.r_[0,a,0])))[0]
%timeit first_True_island_len(a)
%timeit first_true2(a)
%timeit first_True_island_len_IFELSE(a)
a = np.array([True, True, True, False, True, False, True, True, True, True])
1000000 loops, best of 3: 353 ns per loop
100000 loops, best of 3: 8.32 µs per loop
10000 loops, best of 3: 27.4 µs per loop
100000 loops, best of 3: 5.48 µs per loop
100000 loops, best of 3: 5.38 µs per loop
1000000 loops, best of 3: 1.35 µs per loop
a = np.array([False] * 100000 + [True] * 10000)
10000 loops, best of 3: 112 µs per loop
10000 loops, best of 3: 127 µs per loop
1000 loops, best of 3: 513 µs per loop
10000 loops, best of 3: 110 µs per loop
100000 loops, best of 3: 13.9 µs per loop
100000 loops, best of 3: 4.55 µs per loop
a = np.array([False] * 100000 + [True])
10000 loops, best of 3: 102 µs per loop
10000 loops, best of 3: 115 µs per loop
1000 loops, best of 3: 472 µs per loop
10000 loops, best of 3: 108 µs per loop
100000 loops, best of 3: 14 µs per loop
100000 loops, best of 3: 4.45 µs per loop
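The %timeit lines above are IPython magics. Outside IPython, a comparable measurement can be sketched with the standard timeit module. The helper best_time_us and its default repeat counts are assumptions made for this example, and the numbers it prints will differ from the ones reported above.

```python
import timeit
import numpy as np

def first_true2(a):
    # Same approach as Answer 3: pad with False, then argmax followed by argmin.
    a = np.concatenate([[False], a, [False]])
    return np.argmin(a[np.argmax(a):])

def best_time_us(func, arr, number=1000, repeat=3):
    """Best per-call time in microseconds over `repeat` runs of `number` calls."""
    runs = timeit.repeat(lambda: func(arr), number=number, repeat=repeat)
    return min(runs) / number * 1e6

a = np.array([True, True, True, False, True, False, True, True, True, True])
print(f"first_true2: {best_time_us(first_true2, a):.2f} us per call")
```

Taking the minimum over several repeats follows the "best of 3" convention used in the timings above.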