Efficient pandas/numpy function for time since change

Views: 96 | Published: 2020/5/18 20:19:53 | Tags: python-3.x pandas numpy series


Question

Given a Series, I would like to efficiently compute how many observations have passed since the value last changed. Here is a simple example:

ser = pd.Series([1.2,1.2,1.2,1.2,2,2,2,4,3])

print(ser)

0    1.2
1    1.2
2    1.2
3    1.2
4    2.0
5    2.0
6    2.0
7    4.0
8    3.0

I would like to apply a function to ser which would result in:

0    0
1    1
2    2
3    3
4    0
5    1
6    2
7    0
8    0

As I am dealing with large series, I would prefer a fast solution that does not involve looping. Thanks.

Edit: If possible, I would like the function to also work for series with identical values (which would just result in a series of integers incremented by 1).

Recommended answer

Here's one NumPy approach:

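import numpy as np

def array_cumcount(a):
    # Positions where a value differs from its predecessor (start of each new run)
    idx = np.flatnonzero(a[1:] != a[:-1]) + 1
    # Step by 1 everywhere; the first element starts the count at 0
    shift_arr = np.ones(a.size, dtype=int)
    shift_arr[0] = 0

    if len(idx) >= 1:
        # At each run start, offset by the previous run's length
        # so that the running sum drops back to 0
        shift_arr[idx[0]] = -idx[0] + 1
        shift_arr[idx[1:]] = -idx[1:] + idx[:-1] + 1
    return shift_arr.cumsum()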

Sample run:

In [583]: ser = pd.Series([1.2,1.2,1.2,1.2,2,2,2,4,3,3,3,3])

In [584]: array_cumcount(ser.values)
Out[584]: array([0, 1, 2, 3, 0, 1, 2, 0, 0, 1, 2, 3])
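To check the vectorized function against the behaviour described in the question, including the constant-series case from the edit, it can be compared with a plain loop. The sketch below repeats array_cumcount so that it runs on its own. loop_cumcount is a hypothetical reference helper written for this check, not part of the original answer.

```python
import numpy as np
import pandas as pd


def array_cumcount(a):
    # Repeated from the answer so this snippet is self-contained.
    idx = np.flatnonzero(a[1:] != a[:-1]) + 1
    shift_arr = np.ones(a.size, dtype=int)
    shift_arr[0] = 0
    if len(idx) >= 1:
        shift_arr[idx[0]] = -idx[0] + 1
        shift_arr[idx[1:]] = -idx[1:] + idx[:-1] + 1
    return shift_arr.cumsum()


def loop_cumcount(a):
    # Hypothetical reference: explicit loop, slow but easy to verify by eye.
    out = np.zeros(len(a), dtype=int)
    for i in range(1, len(a)):
        out[i] = out[i - 1] + 1 if a[i] == a[i - 1] else 0
    return out


ser = pd.Series([1.2, 1.2, 1.2, 1.2, 2, 2, 2, 4, 3])
constant = pd.Series([5.0] * 6)  # identical values, as in the question's edit

result = array_cumcount(ser.values)
constant_result = array_cumcount(constant.values)

print(result)           # [0 1 2 3 0 1 2 0 0]
print(constant_result)  # [0 1 2 3 4 5]
print(np.array_equal(result, loop_cumcount(ser.values)))  # True
```

Note that the function indexes shift_arr[0] unconditionally, so an empty input would need a guard before calling it.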

Runtime test:

In [601]: ser = pd.Series(np.random.randint(0,3,(10000)))

# @Psidom's soln
In [602]: %timeit ser.groupby(ser).cumcount()
1000 loops, best of 3: 729 µs per loop

In [603]: %timeit array_cumcount(ser.values)
10000 loops, best of 3: 85.3 µs per loop

In [604]: ser = pd.Series(np.random.randint(0,3,(1000000)))

# @Psidom's soln
In [605]: %timeit ser.groupby(ser).cumcount()
10 loops, best of 3: 30.1 ms per loop

In [606]: %timeit array_cumcount(ser.values)
100 loops, best of 3: 11.7 ms per loop
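One caveat when reading the timings: ser.groupby(ser) groups by value, so its counts keep increasing when a value reappears later in the series, whereas array_cumcount restarts at every change. A pandas-only variant that should match the NumPy function can group by a run label instead. The sketch below is an illustration only, and run_cumcount is a hypothetical name, not something from the original thread.

```python
import pandas as pd


def run_cumcount(ser):
    # Hypothetical helper: label each run of equal consecutive values,
    # then count positions within each run.
    run_id = ser.ne(ser.shift()).cumsum()
    return ser.groupby(run_id).cumcount()


ser = pd.Series([1, 1, 2, 1, 1])

by_run = run_cumcount(ser)              # restarts at every change
by_value = ser.groupby(ser).cumcount()  # continues when a value reappears

print(by_run.tolist())    # [0, 1, 0, 0, 1]
print(by_value.tolist())  # [0, 1, 0, 2, 3]
```

On inputs where values recur, as in the random-integer benchmark above, the two approaches therefore return different results, so the timings compare speed rather than identical outputs.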
