Pandas: calculate the length of consecutive equal values from a grouped dataframe
Question
I want to do what was done in the answer to "Calculating the number of specific consecutive equal values in a vectorized way in pandas", but using a grouped dataframe instead of a series.
So, given a dataframe with several columns:
A B C
------------
x x 0
x x 5
x x 2
x x 0
x x 0
x x 3
x x 0
y x 1
y x 10
y x 0
y x 5
y x 0
y x 0
I want to group by columns A and B, then find the runs of consecutive zeros in C. After that, I'd like to count how many times each run length occurs. So I want output like this:
A B num_consecutive_zeros count
---------------------------------------
x x 1 2
x x 2 1
y x 1 1
y x 2 1
I don't know how to adapt the answer from the linked question to work with a grouped dataframe.
Recommended answer
In the code below, count_consecutive_zeros() uses NumPy functions and the pandas value_counts() routine to compute the result for a single group. groupby().apply(count_consecutive_zeros) then calls count_consecutive_zeros() for every group, and reset_index() converts the resulting MultiIndex into columns:
import pandas as pd
import numpy as np
from io import BytesIO
text = """A B C
x x 0
x x 5
x x 2
x x 0
x x 0
x x 3
x x 0
y x 1
y x 10
y x 0
y x 5
y x 0
y x 0"""
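
# Parse the whitespace-separated sample data
# (sep=r"\s+" replaces the deprecated delim_whitespace=True).
df = pd.read_csv(BytesIO(text.encode()), sep=r"\s+")


def count_consecutive_zeros(s):
    # Pad the "is zero" mask with 0 on both sides. In the difference,
    # +1 marks the start of a run of zeros and -1 marks the position
    # just past its end.
    v = np.diff(np.r_[0, s.values == 0, 0])
    run_lengths = np.where(v == -1)[0] - np.where(v == 1)[0]
    # Count how often each run length occurs, ordered by run length
    # (Series.value_counts replaces the deprecated pd.value_counts).
    counts = pd.Series(run_lengths).value_counts().sort_index()
    counts.index.name = "num_consecutive_zeros"
    counts.name = "count"
    return counts


# Passing name="count" keeps the column label stable even when apply()
# renames the combined Series after the selected column C.
result = (
    df.groupby(["A", "B"]).C
    .apply(count_consecutive_zeros)
    .reset_index(name="count")
)
print(result)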
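To see why the np.diff step in the answer finds runs of zeros, it can help to run it on one group's values in isolation. The following is only an illustrative sketch: the helper name zero_run_lengths is made up for this example and is not part of the original answer, and the inputs are the C values of the two groups from the question.

```python
import numpy as np


def zero_run_lengths(values):
    """Return the length of each run of consecutive zeros (hypothetical helper)."""
    is_zero = np.asarray(values) == 0
    # Pad with 0 (i.e. "not zero") on both sides so that runs touching
    # either edge of the array still get a start and an end marker.
    padded = np.r_[0, is_zero, 0]
    step = np.diff(padded)
    starts = np.where(step == 1)[0]   # index where a run of zeros begins
    ends = np.where(step == -1)[0]    # index just past where a run ends
    return ends - starts


lengths_x = zero_run_lengths([0, 5, 2, 0, 0, 3, 0])  # group (x, x)
lengths_y = zero_run_lengths([1, 10, 0, 5, 0, 0])    # group (y, x)
print(lengths_x)  # [1 2 1]
print(lengths_y)  # [1 2]
```

For group (x, x) the run lengths are 1, 2 and 1, so length 1 occurs twice and length 2 once, which matches the counts requested in the question.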