Pandas group by unique counts as new column

Question

I want to add a new column col to my pandas data frame, calculated as:

select count(distinct ITEM) as col
from base_data
where STOCK > 0 
group by DEPT, CLAS, DATE;

What I am doing

assort_size = base_data[(base_data['STOCK'] > 0)]\
.groupby(['DEPT','CLAS','DATE'])['ITEM']\
.transform('nunique')

Basically, for each DEPT, CLAS, DATE combination I want the number of distinct items that are in stock. I then want to merge this result with the parent data frame, but the result comes out as a pandas.core.series.Series, so I cannot append it back with axis=1 (the row counts differ, e.g. 1.6 M vs 1.4 M, because the rows with STOCK <= 0 were filtered out). I also don't have the DEPT, CLAS, DATE columns to join on. What can I do here to get a data frame with the group-by columns?

Is there a better way to create the new column directly in the parent pandas data frame (base_data) than creating a new object like assort_size?

Recommended answer

You can filter with boolean indexing first, then use groupby with nunique, and finally join the result back to the original data frame:

import pandas as pd

base_data = pd.DataFrame({"DEPT": ["a", "a", "b", "b"],
                   "CLAS":['d','d','d','d'],
                   "STOCK": [-1, 1, 2,2],
                   "DATE":pd.to_datetime(['2001-10-10','2001-10-10',
                                          '2001-10-10','2001-10-10']),
                   "ITEM":[1,2,3,4]})

print (base_data)
  CLAS       DATE DEPT  ITEM  STOCK
0    d 2001-10-10    a     1     -1
1    d 2001-10-10    a     2      1
2    d 2001-10-10    b     3      2
3    d 2001-10-10    b     4      2

assort_size = (base_data[base_data['STOCK'] > 0]
               .groupby(['DEPT','CLAS','DATE'])['ITEM']
               .nunique()
               .rename('n_item'))
print (assort_size)
DEPT  CLAS  DATE      
a     d     2001-10-10    1
b     d     2001-10-10    2
Name: n_item, dtype: int64
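
The grouped result above is a Series whose index holds DEPT, CLAS and DATE. If a data frame with the group-by columns as ordinary columns is wanted (the second part of the question), one possible sketch is to call reset_index and then merge. The names counts and merged are illustrative and not part of the original answer.

```python
import pandas as pd

# Same sample data as in the answer above.
base_data = pd.DataFrame({
    "DEPT": ["a", "a", "b", "b"],
    "CLAS": ["d", "d", "d", "d"],
    "STOCK": [-1, 1, 2, 2],
    "DATE": pd.to_datetime(["2001-10-10"] * 4),
    "ITEM": [1, 2, 3, 4],
})

# Count distinct in-stock items per group, then turn the index levels
# (DEPT, CLAS, DATE) back into regular columns.
counts = (
    base_data[base_data["STOCK"] > 0]
    .groupby(["DEPT", "CLAS", "DATE"])["ITEM"]
    .nunique()
    .reset_index(name="n_item")
)

# A left merge keeps every row of base_data, including rows with STOCK <= 0.
merged = base_data.merge(counts, on=["DEPT", "CLAS", "DATE"], how="left")
print(merged)
```

With how="left", a group that has no in-stock rows at all would get NaN in n_item, the same as with join.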

print (base_data.join(assort_size, on=['DEPT','CLAS','DATE']))
  CLAS       DATE DEPT  ITEM  STOCK  n_item
0    d 2001-10-10    a     1     -1       1
1    d 2001-10-10    a     2      1       1
2    d 2001-10-10    b     3      2       2
3    d 2001-10-10    b     4      2       2
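
For the last part of the question (creating the column directly in base_data without an intermediate object such as assort_size), one alternative that is not part of the original answer is to mask ITEM instead of filtering rows, so that transform returns one value per original row. This is a minimal sketch; in_stock_item and keys are illustrative names.

```python
import pandas as pd

# Same sample data as in the answer above.
base_data = pd.DataFrame({
    "DEPT": ["a", "a", "b", "b"],
    "CLAS": ["d", "d", "d", "d"],
    "STOCK": [-1, 1, 2, 2],
    "DATE": pd.to_datetime(["2001-10-10"] * 4),
    "ITEM": [1, 2, 3, 4],
})

# Replace ITEM with NaN where STOCK <= 0 instead of dropping those rows;
# nunique ignores NaN by default, so only in-stock items are counted.
in_stock_item = base_data["ITEM"].where(base_data["STOCK"] > 0)

# Group the masked Series by the key columns and broadcast the count
# back to every row, keeping the original index and row count.
keys = [base_data["DEPT"], base_data["CLAS"], base_data["DATE"]]
base_data["n_item"] = in_stock_item.groupby(keys).transform("nunique")
print(base_data)
```

One difference from the join approach: a group with no in-stock items gets 0 here, whereas join leaves NaN for groups that are missing from assort_size.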
