Group values in intervals

Views: 79 Published: 2020/5/24 3:43:03 Tags: python pandas dataframe intervals pandas-groupby

Question

I have a pandas Series containing zeros and ones:

import pandas as pd

df1 = pd.Series([0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0])
df1
Out[3]: 
0         0
1         0
2         0
3         0
4         0
5         1
6         1
7         1
8         0
9         0
10        0

I would like to create a DataFrame df2 that contains the start and end index of each interval of consecutive identical values, together with the value of that interval. In this case df2 should be:

df2
Out[5]: 
   Start  End  Value
0      0    4      0
1      5    7      1
2      8   10      0

My attempt was:

from operator import itemgetter
from itertools import groupby

a=[next(group) for key, group in groupby(enumerate(df1), key=itemgetter(1))]   
df2 = pd.DataFrame(a,columns=['Start','Value'])

but I don't know how to get the 'End' indices.

Recommended answer

You can group by a helper Series built by comparing df1 with its shifted copy (shift) and taking the cumulative sum (cumsum) of the result, so that each run of equal values gets its own label.

Then apply a custom function to each group and reshape the result with unstack.

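# Label each run of consecutive equal values: 1, 1, ..., 2, 2, ..., 3, ...
s = df1.ne(df1.shift()).cumsum()
# Parentheses allow the method chain to span several lines.
df2 = (df1.groupby(s)
          .apply(lambda x: pd.Series([x.index[0], x.index[-1], x.iat[0]],
                                     index=['Start','End','Value']))
          .unstack()
          .reset_index(drop=True))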
print (df2)
   Start  End  Value
0      0    4      0
1      5    7      1
2      8   10      0
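
To see why this grouping key works, it can help to look at the intermediate steps. The following is only an illustrative sketch using the same df1 as in the question; the name `changed` is introduced here for clarity and does not appear in the original answer.

```python
import pandas as pd

df1 = pd.Series([0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0])

# True wherever a value differs from the previous one. The first row is
# always True because it is compared with the NaN produced by shift().
changed = df1.ne(df1.shift())

# The cumulative sum of the booleans increases by one at every change,
# so each run of equal values ends up with its own integer label.
s = changed.cumsum()

print(changed.tolist())
print(s.tolist())
```

Grouping by `s` rather than by the values themselves keeps the two separate runs of zeros apart, which a plain `groupby` on the values would merge.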

Another solution aggregates with agg using first and last, but it needs more code to shape the result into the desired output.

s = df1.ne(df1.shift()).cumsum()
d = {'first':'Start','last':'End'}
df2 = df1.reset_index(name='Value') \
         .groupby([s, 'Value'])['index'] \
         .agg(['first','last'])  \
         .reset_index(level=0, drop=True) \
         .reset_index() \
         .rename(columns=d) \
         .reindex(columns=['Start','End','Value'])
print (df2)
   Start  End  Value
0      0    4      0
1      5    7      1
2      8   10      0
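
On newer pandas versions the same aggregation can probably be written more compactly with named aggregation. This is a sketch that is not part of the original answer, and it assumes pandas 0.25 or later, where the keyword form of agg is available.

```python
import pandas as pd

df1 = pd.Series([0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0])

# Same run labels as in the answer above.
s = df1.ne(df1.shift()).cumsum()

# Turn the Series into a frame with columns 'index' and 'Value', then
# name each output column directly in agg (assumes pandas >= 0.25).
df2 = (df1.reset_index(name='Value')
          .groupby(s)
          .agg(Start=('index', 'first'),
               End=('index', 'last'),
               Value=('Value', 'first'))
          .reset_index(drop=True))

print(df2)
```

Because the output columns are named and ordered inside agg, the rename and reindex steps are no longer needed.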
