Custom pandas groupby on a list of intervals

Views: 180  Posted: 2018/5/30 14:14:32  Tags: python, pandas, group-by, pandas-groupby

Problem description

I have a dataframe df:

         A    B
    0   28  abc
    1   29  def
    2   30  hij
    3   31  hij
    4   32  abc
    5   28  abc
    6   28  abc
    7   29  def
    8   30  hij
    9   28  abc
    10  29  klm
    11  30  nop
    12  28  abc
    13  29  xyz

    df.dtypes
    A    object    # A is a string column as well
    B    object
    dtype: object

I want to use the values in this array as group boundaries for the groupby:

    import numpy as np
    i = np.array([3, 5, 6, 9, 12, 14])

Basically, each value in i is the exclusive end of a group, and the first group starts at 0. So all rows in df with index 0, 1, 2 are in the first group, rows with index 3, 4 are in the second group, the row with index 5 is in the third group, and so on.

My end goal is this:

           A            B
    28,29,30  abc,def,hij
       31,32      hij,abc
          28          abc
    28,29,30  abc,def,hij
    28,29,30  abc,klm,nop
       28,29      abc,xyz
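
The grouping rule and target output above can also be expressed without pd.cut. The following is a minimal sketch that assumes the frame has the default RangeIndex shown in the question. Each value of i is an exclusive upper bound, so np.searchsorted with side="right" returns, for every row index, how many boundaries are less than or equal to it. That count is exactly the group number. This swaps in np.searchsorted as an alternative technique; it is not the approach used in the question or in the answer.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question (A is stored as strings).
df = pd.DataFrame({
    "A": ["28", "29", "30", "31", "32", "28", "28",
          "29", "30", "28", "29", "30", "28", "29"],
    "B": ["abc", "def", "hij", "hij", "abc", "abc", "abc",
          "def", "hij", "abc", "klm", "nop", "abc", "xyz"],
})
i = np.array([3, 5, 6, 9, 12, 14])

# A row's group number is the count of boundaries <= its index.
# side="right" makes an index equal to a boundary (e.g. 3) count that
# boundary, so it starts the next group instead of closing the previous one.
labels = np.searchsorted(i, df.index.to_numpy(), side="right")
print(labels.tolist())
# → [0, 0, 0, 1, 1, 2, 3, 3, 3, 4, 4, 4, 5, 5]

# Join the string values of each column within each group.
result = df.groupby(labels).agg(",".join)
print(result)
```

Unlike pd.cut, this sketch would put any index at or beyond the last boundary (14 here) into an extra group of its own rather than dropping it.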

Solution so far using groupby + pd.cut:

    df.groupby(pd.cut(df.index, bins=np.append([0], i)), as_index=False).agg(','.join)

              A            B
    0  29,30,31  def,hij,hij
    1     32,28      abc,abc
    2        28          abc
    3  29,30,28  def,hij,abc
    4  29,30,28  klm,nop,abc
    5        29          xyz

The result is incorrect :-(

How can I do this properly?
Solution
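You are very close, but pass include_lowest=True and right=False to pd.cut.

By default, pd.cut builds intervals that are closed on the right: (0, 3], (3, 5], ..., (12, 14]. Index 0 therefore falls outside every bin and is dropped. Each boundary index (3, 5, 6, 9, 12) is also pulled into the earlier group.

With right=False, the intervals become [0, 3), [3, 5), ..., [12, 14). Index 0 now lands in the first bin, and each bin's right edge starts the next group instead.

include_lowest=True only changes the first interval when right=True, so it is harmless but redundant here. The fixed code is: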

    idx = pd.cut(df.index, bins=np.append([0], i), include_lowest=True, right=False)
    df.groupby(idx, as_index=False).agg(','.join)

           A            B
    28,29,30  abc,def,hij
       31,32      hij,abc
          28          abc
    28,29,30  abc,def,hij
    28,29,30  abc,klm,nop
       28,29      abc,xyz
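
For reference, here is a self-contained sketch of the answer's approach. It rebuilds the example frame from the question, with A stored as strings as the dtypes show. It prints the bin codes from the default right=True next to those from right=False, so the off-by-one shift is visible. A code of -1 means the row fell outside every bin.

This sketch departs from the original answer in two small ways. It uses the default as_index=True followed by reset_index(drop=True) instead of as_index=False. It also passes observed=True to silence the deprecation warning that newer pandas versions emit for categorical groupers.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question (A is stored as strings).
df = pd.DataFrame({
    "A": ["28", "29", "30", "31", "32", "28", "28",
          "29", "30", "28", "29", "30", "28", "29"],
    "B": ["abc", "def", "hij", "hij", "abc", "abc", "abc",
          "def", "hij", "abc", "klm", "nop", "abc", "xyz"],
})
i = np.array([3, 5, 6, 9, 12, 14])
bins = np.append([0], i)  # [0, 3, 5, 6, 9, 12, 14]

# Default right=True: (0, 3], (3, 5], ... so index 0 gets no bin (code -1).
wrong = pd.cut(df.index, bins=bins)
# right=False: [0, 3), [3, 5), ... so every index 0..13 gets a bin.
idx = pd.cut(df.index, bins=bins, include_lowest=True, right=False)

print(wrong.codes.tolist())
# → [-1, 0, 0, 0, 1, 1, 2, 3, 3, 3, 4, 4, 4, 5]
print(idx.codes.tolist())
# → [0, 0, 0, 1, 1, 2, 3, 3, 3, 4, 4, 4, 5, 5]

# Group by the interval labels, join strings per column, drop the interval index.
result = (
    df.groupby(idx, observed=True)
    .agg(",".join)
    .reset_index(drop=True)
)
print(result)
```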
