Pandas group by unique counts as new column
Question
I want to add a new column, col, to my pandas DataFrame, computed the way this SQL query would compute it:
select count(distinct ITEM) as col
from base_data
where STOCK > 0
group by DEPT, CLAS, DATE;
What I am doing:
assort_size = base_data[(base_data['STOCK'] > 0)]\
.groupby(['DEPT','CLAS','DATE'])['ITEM']\
.transform('nunique')
Basically, for each DEPT, CLAS, DATE combination I want the number of distinct items that are in stock. I then want to merge this result back into the parent data frame, but the result comes out as a pandas.core.series.Series, so I cannot append it back with axis=1 (the row counts differ, e.g. 1.6 M vs. 1.4 M). I also don't have the DEPT, CLAS, DATE columns to join on. What can I do here to get a DataFrame that includes the group-by columns?
Is there a better way to create the new column directly in the parent pandas DataFrame (base_data) than creating a new object, as I do with assort_size?
Recommended answer
You can use boolean indexing first, then groupby with nunique, and finally join:
import pandas as pd

# Small sample frame: one row (ITEM 1) is out of stock
base_data = pd.DataFrame({"DEPT": ["a", "a", "b", "b"],
                          "CLAS": ['d', 'd', 'd', 'd'],
                          "STOCK": [-1, 1, 2, 2],
                          "DATE": pd.to_datetime(['2001-10-10', '2001-10-10',
                                                  '2001-10-10', '2001-10-10']),
                          "ITEM": [1, 2, 3, 4]})
print (base_data)
CLAS DATE DEPT ITEM STOCK
0 d 2001-10-10 a 1 -1
1 d 2001-10-10 a 2 1
2 d 2001-10-10 b 3 2
3 d 2001-10-10 b 4 2
assort_size = base_data[(base_data['STOCK'] > 0)]\
.groupby(['DEPT','CLAS','DATE'])['ITEM'].nunique().rename('n_item')
print (assort_size)
DEPT CLAS DATE
a d 2001-10-10 1
b d 2001-10-10 2
Name: n_item, dtype: int64
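The result above is a Series whose group keys live in its index. If a DataFrame with DEPT, CLAS and DATE as ordinary columns is preferred, as the question asks, one possible sketch is to call reset_index on the Series and then left-merge it onto the parent frame. The names keys, counts and merged are illustrative and not part of the original answer.

```python
import pandas as pd

# Same toy data as in the answer.
base_data = pd.DataFrame({
    "DEPT": ["a", "a", "b", "b"],
    "CLAS": ["d", "d", "d", "d"],
    "STOCK": [-1, 1, 2, 2],
    "DATE": pd.to_datetime(["2001-10-10"] * 4),
    "ITEM": [1, 2, 3, 4],
})

keys = ["DEPT", "CLAS", "DATE"]  # illustrative helper name

# Distinct in-stock items per group, with the keys turned back into columns.
counts = (
    base_data[base_data["STOCK"] > 0]
    .groupby(keys)["ITEM"]
    .nunique()
    .reset_index(name="n_item")
)

# A left merge keeps every row of base_data, including rows with STOCK <= 0.
merged = base_data.merge(counts, on=keys, how="left")
print(merged)
```

This should give the same n_item values as joining on the index; which form to use is mostly a matter of whether the per-group table is needed on its own.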
print (base_data.join(assort_size, on=['DEPT','CLAS','DATE']))
CLAS DATE DEPT ITEM STOCK n_item
0 d 2001-10-10 a 1 -1 1
1 d 2001-10-10 a 2 1 1
2 d 2001-10-10 b 3 2 2
3 d 2001-10-10 b 4 2 2
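For the second part of the question (creating the column directly in base_data without an intermediate object), a possible alternative is to mask out-of-stock items instead of filtering rows, then use transform. This is a sketch that is not from the original answer; it relies on nunique ignoring NaN by default, and in_stock_item is an illustrative name.

```python
import pandas as pd

# Same toy data as in the answer.
base_data = pd.DataFrame({
    "DEPT": ["a", "a", "b", "b"],
    "CLAS": ["d", "d", "d", "d"],
    "STOCK": [-1, 1, 2, 2],
    "DATE": pd.to_datetime(["2001-10-10"] * 4),
    "ITEM": [1, 2, 3, 4],
})

# Replace ITEM with NaN where STOCK <= 0; the row count stays unchanged.
in_stock_item = base_data["ITEM"].where(base_data["STOCK"] > 0)

# transform returns one value per original row, so it can be assigned directly.
base_data["n_item"] = in_stock_item.groupby(
    [base_data["DEPT"], base_data["CLAS"], base_data["DATE"]]
).transform("nunique")

print(base_data)
```

Because no rows are dropped, the row-count mismatch described in the question does not arise. One difference from the join approach: a group with no in-stock rows gets 0 here, whereas join leaves NaN for it.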