Custom pandas groupby on a list of intervals
Problem description
I have a dataframe df:
     A    B
0   28  abc
1   29  def
2   30  hij
3   31  hij
4   32  abc
5   28  abc
6   28  abc
7   29  def
8   30  hij
9   28  abc
10  29  klm
11  30  nop
12  28  abc
13  29  xyz
df.dtypes
A    object  # A is a string column as well
B    object
dtype: object
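I want to group by the values in this list:
i = np.array([3, 5, 6, 9, 12, 14])
Basically, all rows in df with index 0, 1, 2 are in the first group, rows with index 3, 4 are in the second group, the row with index 5 is in the third group, and so on.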
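The intended mapping from row index to group can be spelled out explicitly. The sketch below is only an illustration and not part of the original question: it uses np.searchsorted (a different technique from pd.cut) to count, for each index, how many boundaries in i are less than or equal to it. The name group_id is made up for this example.

```python
import numpy as np

# Boundaries from the question; each value is the first index NOT in its group.
i = np.array([3, 5, 6, 9, 12, 14])
index = np.arange(14)  # row labels 0..13, standing in for df.index

# side="right" counts the boundaries <= each index, so an index that equals
# a boundary starts the next group.
group_id = np.searchsorted(i, index, side="right")
print(group_id.tolist())
# [0, 0, 0, 1, 1, 2, 3, 3, 3, 4, 4, 4, 5, 5]
```

Rows that share a group_id value are the rows that should be joined together.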
My end goal is this:
          A            B
   28,29,30  abc,def,hij
      31,32      hij,abc
         28          abc
   28,29,30  abc,def,hij
   28,29,30  abc,klm,nop
      28,29      abc,xyz
Solution so far using groupby + pd.cut:
df.groupby(pd.cut(df.index, bins=np.append([0], i)), as_index=False).agg(','.join)
          A            B
0  29,30,31  def,hij,hij
1     32,28      abc,abc
2        28          abc
3  29,30,28  def,hij,abc
4  29,30,28  klm,nop,abc
5        29          xyz
The result is incorrect :-(
How can I do this properly?
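To see where the attempt goes wrong, it can help to look at the bins that pd.cut builds with its default arguments (right=True, include_lowest=False). This is a small diagnostic sketch, not part of the original post; np.arange(14) stands in for df.index and default_bins is a name chosen for the example.

```python
import numpy as np
import pandas as pd

i = np.array([3, 5, 6, 9, 12, 14])
index = np.arange(14)  # stands in for df.index (0..13)

# Defaults: right=True, include_lowest=False, i.e. bins (0, 3], (3, 5], ...
default_bins = pd.cut(index, bins=np.append([0], i))

print(default_bins.categories)      # right-closed intervals
print(default_bins.codes.tolist())  # -1 marks a value that fell into no bin
# [-1, 0, 0, 0, 1, 1, 2, 3, 3, 3, 4, 4, 4, 5]
```

Index 0 lands in no bin at all, and every boundary value (3, 5, 6, ...) is assigned to the bin that ends at it instead of the one that starts at it, which is the off-by-one shift visible in the incorrect output.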
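Solution
You are very close, but use include_lowest=True and right=False in pd.cut: you want index 0 to fall into the first bin, and you don't want the last element (the right edge) of each bin to be included. With right=False each bin is closed on the left and open on the right, e.g. [0, 3), i.e.
idx = pd.cut(df.index, bins=np.append([0], i), include_lowest=True, right=False)
df.groupby(idx, as_index=False).agg(','.join)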
          A            B
   28,29,30  abc,def,hij
      31,32      hij,abc
         28          abc
   28,29,30  abc,def,hij
   28,29,30  abc,klm,nop
      28,29      abc,xyz
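For anyone who wants to reproduce this end to end, here is a self-contained sketch. The DataFrame is rebuilt by hand from the table in the question, assuming both columns hold strings as df.dtypes suggests. Two details differ from the answer's exact call and are choices made for this sketch only: observed=True is passed to groupby, and the interval index is dropped with reset_index(drop=True) instead of using as_index=False, so the result has a plain 0..5 index.

```python
import numpy as np
import pandas as pd

# Rebuilt from the question's table; both columns are assumed to be strings.
df = pd.DataFrame({
    "A": ["28", "29", "30", "31", "32", "28", "28",
          "29", "30", "28", "29", "30", "28", "29"],
    "B": ["abc", "def", "hij", "hij", "abc", "abc", "abc",
          "def", "hij", "abc", "klm", "nop", "abc", "xyz"],
})
i = np.array([3, 5, 6, 9, 12, 14])

# Left-closed, right-open bins: [0, 3), [3, 5), [5, 6), [6, 9), [9, 12), [12, 14)
idx = pd.cut(df.index, bins=np.append([0], i), include_lowest=True, right=False)

# Join the strings of each column within every bin, then drop the interval index.
result = df.groupby(idx, observed=True).agg(",".join).reset_index(drop=True)
print(result)
```

Printing idx.categories before grouping is a quick way to confirm that the bin edges are the ones you expect.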