Group values in intervals
Question
I have a pandas Series containing zeros and ones:
import pandas as pd
df1 = pd.Series([0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0])
df1
Out[3]:
0 0
1 0
2 0
3 0
4 0
5 1
6 1
7 1
8 0
9 0
10 0
I would like to create a DataFrame df2 that contains the start and end index of each interval of consecutive equal values, together with the value of that interval. In this case df2 should be:
df2
Out[5]:
Start End Value
0 0 4 0
1 5 7 1
2 8 10 0
My attempt was:
from operator import itemgetter
from itertools import groupby
a=[next(group) for key, group in groupby(enumerate(df1), key=itemgetter(1))]
df2 = pd.DataFrame(a,columns=['Start','Value'])
but I don't know how to get the 'End' indices.
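Recommended answer
You can groupby on a helper Series that labels each run of equal values: compare df1 with its shifted copy (shift), which flags every position where the value changes, then take the cumsum of those flags. Then apply a custom function to each group and reshape the result with unstack.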
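To make the grouping key concrete, here is a minimal sketch that prints the helper Series for the sample data. The name `changed` is introduced here only for illustration; it is not part of the original answer.

```python
import pandas as pd

df1 = pd.Series([0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0])

# True wherever a value differs from the previous one. The first row is
# always True because shift() leaves NaN there.
changed = df1.ne(df1.shift())

# The running sum of the flags gives every run of equal values its own label.
s = changed.cumsum()
print(s.tolist())  # [1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3]
```

Rows that share a label belong to the same interval, so grouping by `s` separates the two runs of zeros even though they hold the same value.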
s = df1.ne(df1.shift()).cumsum()
# Parentheses let the method chain continue across lines
df2 = (df1.groupby(s)
          .apply(lambda x: pd.Series([x.index[0], x.index[-1], x.iat[0]],
                                     index=['Start','End','Value']))
          .unstack()
          .reset_index(drop=True))
print (df2)
Start End Value
0 0 4 0
1 5 7 1
2 8 10 0
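For comparison, the itertools attempt from the question can also be completed without pandas grouping. This is a sketch, not part of the original answer. It assumes df1 has the default 0..n-1 index, because `enumerate` yields positions rather than index labels; `rows` and `positions` are illustrative names.

```python
from itertools import groupby
from operator import itemgetter

import pandas as pd

df1 = pd.Series([0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0])

rows = []
# groupby yields one group per run of consecutive equal values.
for value, group in groupby(enumerate(df1), key=itemgetter(1)):
    positions = [pos for pos, _ in group]  # consume the whole run
    rows.append((positions[0], positions[-1], value))

df2 = pd.DataFrame(rows, columns=['Start', 'End', 'Value'])
print(df2)
```

The question's version called `next(group)`, which only reads the first element of each run. Collecting the whole run makes the last position available as well.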
Another solution aggregates with agg, using first and last, but it needs more code to bring the result into the desired output format.
s = df1.ne(df1.shift()).cumsum()
d = {'first':'Start','last':'End'}
df2 = df1.reset_index(name='Value') \
.groupby([s, 'Value'])['index'] \
.agg(['first','last']) \
.reset_index(level=0, drop=True) \
.reset_index() \
.rename(columns=d) \
.reindex(columns=['Start','End','Value'])  # reindex_axis was removed in pandas 1.0
print (df2)
Start End Value
0 0 4 0
1 5 7 1
2 8 10 0
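On newer pandas versions the same aggregation can probably be written more compactly with named aggregation. This is a sketch that is not part of the original answer. It assumes pandas 0.25 or later and a default 0..n-1 index on df1, so that the helper Series `s` still lines up with the frame after `reset_index`.

```python
import pandas as pd

df1 = pd.Series([0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0])

# Label each run of equal values, as in the solutions above.
s = df1.ne(df1.shift()).cumsum()

df2 = (
    df1.reset_index(name='Value')      # columns: 'index', 'Value'
       .groupby(s)
       .agg(Start=('index', 'first'),  # first index of the run
            End=('index', 'last'),     # last index of the run
            Value=('Value', 'first'))  # the value shared by the run
       .reset_index(drop=True)
)
print(df2)
```

Naming the output columns inside `agg` avoids the separate rename and column reordering steps.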