How to Show Only Columns with Values in Pandas Groupby
Problem Description
Hello data scientists and pandas experts,
I need some help, as I can't get my data organized properly. Here is my data:
import pandas as pd
from pandas import Timestamp

df_dict = [
    {'Date': Timestamp('2014-01-03 00:00:00'), 'Store': 'store1', 'employee': 'emp1', 'duties': 'opening'},
    {'Date': Timestamp('2014-01-03 00:00:00'), 'Store': 'store1', 'employee': 'emp2', 'duties': 'deli'},
    {'Date': Timestamp('2014-01-03 00:00:00'), 'Store': 'store1', 'employee': 'emp3', 'duties': 'cashier'},
    {'Date': Timestamp('2014-01-03 00:00:00'), 'Store': 'store1', 'employee': 'emp2', 'duties': 'closing'},
    {'Date': Timestamp('2014-01-03 00:00:00'), 'Store': 'store2', 'employee': 'emp1', 'duties': 'closing'},
    {'Date': Timestamp('2014-01-03 00:00:00'), 'Store': 'store2', 'employee': 'emp4', 'duties': 'opening'},
    {'Date': Timestamp('2014-01-03 00:00:00'), 'Store': 'store2', 'employee': 'emp4', 'duties': 'cashier'},
    {'Date': Timestamp('2014-01-03 00:00:00'), 'Store': 'store2', 'employee': 'emp5', 'duties': 'deli'},
    {'Date': Timestamp('2014-01-03 00:00:00'), 'Store': 'store3', 'employee': 'emp2', 'duties': 'closing'},
    {'Date': Timestamp('2014-01-03 00:00:00'), 'Store': 'store3', 'employee': 'emp6', 'duties': 'opening'},
    {'Date': Timestamp('2014-01-03 00:00:00'), 'Store': 'store3', 'employee': 'emp7', 'duties': 'cashier'},
    {'Date': Timestamp('2014-01-03 00:00:00'), 'Store': 'store3', 'employee': 'emp6', 'duties': 'deli'},
    {'Date': Timestamp('2014-01-04 00:00:00'), 'Store': 'store1', 'employee': 'emp1', 'duties': 'opening'},
    {'Date': Timestamp('2014-01-04 00:00:00'), 'Store': 'store1', 'employee': 'emp2', 'duties': 'deli'},
    {'Date': Timestamp('2014-01-04 00:00:00'), 'Store': 'store1', 'employee': 'emp3', 'duties': 'cashier'},
    {'Date': Timestamp('2014-01-04 00:00:00'), 'Store': 'store1', 'employee': 'emp2', 'duties': 'closing'},
    {'Date': Timestamp('2014-01-04 00:00:00'), 'Store': 'store2', 'employee': 'emp1', 'duties': 'closing'},
    {'Date': Timestamp('2014-01-04 00:00:00'), 'Store': 'store2', 'employee': 'emp4', 'duties': 'opening'},
    {'Date': Timestamp('2014-01-04 00:00:00'), 'Store': 'store2', 'employee': 'emp4', 'duties': 'cashier'},
    {'Date': Timestamp('2014-01-04 00:00:00'), 'Store': 'store2', 'employee': 'emp5', 'duties': 'deli'},
    {'Date': Timestamp('2014-01-04 00:00:00'), 'Store': 'store3', 'employee': 'emp2', 'duties': 'closing'},
    {'Date': Timestamp('2014-01-04 00:00:00'), 'Store': 'store3', 'employee': 'emp6', 'duties': 'opening'},
    {'Date': Timestamp('2014-01-04 00:00:00'), 'Store': 'store3', 'employee': 'emp7', 'duties': 'cashier'},
    {'Date': Timestamp('2014-01-04 00:00:00'), 'Store': 'store3', 'employee': 'emp6', 'duties': 'deli'},
    {'Date': Timestamp('2014-01-10 00:00:00'), 'Store': 'store1', 'employee': 'emp1', 'duties': 'opening'},
    {'Date': Timestamp('2014-01-10 00:00:00'), 'Store': 'store1', 'employee': 'emp2', 'duties': 'deli'},
    {'Date': Timestamp('2014-01-10 00:00:00'), 'Store': 'store1', 'employee': 'emp3', 'duties': 'cashier'},
    {'Date': Timestamp('2014-01-10 00:00:00'), 'Store': 'store1', 'employee': 'emp2', 'duties': 'closing'},
    {'Date': Timestamp('2014-01-10 00:00:00'), 'Store': 'store2', 'employee': 'emp1', 'duties': 'closing'},
    {'Date': Timestamp('2014-01-10 00:00:00'), 'Store': 'store2', 'employee': 'emp4', 'duties': 'opening'},
    {'Date': Timestamp('2014-01-10 00:00:00'), 'Store': 'store2', 'employee': 'emp4', 'duties': 'cashier'},
    {'Date': Timestamp('2014-01-10 00:00:00'), 'Store': 'store2', 'employee': 'emp5', 'duties': 'deli'},
    {'Date': Timestamp('2014-01-10 00:00:00'), 'Store': 'store3', 'employee': 'emp2', 'duties': 'closing'},
    {'Date': Timestamp('2014-01-10 00:00:00'), 'Store': 'store3', 'employee': 'emp6', 'duties': 'opening'},
    {'Date': Timestamp('2014-01-10 00:00:00'), 'Store': 'store3', 'employee': 'emp7', 'duties': 'cashier'},
    {'Date': Timestamp('2014-01-10 00:00:00'), 'Store': 'store3', 'employee': 'emp6', 'duties': 'deli'}]

# Build the DataFrame used in the groupby calls below
df = pd.DataFrame(df_dict)
I want my output organized as follows:
            store1          store2          store3
Week        emp1 emp2 emp3  emp1 emp4 emp5  emp2 emp6 emp7
2013-12-30  2    4    2     2    4    2     2    4    2
2014-01-06  1    1    1     1    1    1     2    1    1
So I tried the following groupby expression:
df_group = (df.groupby([pd.Grouper(key='Date', freq='W-MON'), 'Store', 'employee'])['duties']
            .count()
            .unstack(level=1)
            .unstack(level=1)
            .reset_index())
However, it shows every employee under every store instead of only the employees who work at that particular store. For example:
Store       store1
Week        emp1 emp2 emp3 emp4 emp5 emp6 emp7
2013-12-30  2    4    2    NaN  NaN  NaN  NaN
2014-01-06  1    1    1    NaN  NaN  NaN  NaN
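The NaN columns come from unstacking one level at a time. A minimal sketch, assuming a hypothetical three-row frame (not the question's full data): the first `unstack(level=1)` pivots Store into columns, so the remaining index is (Date, employee) and every employee ends up paired with every store, with NaN where the pair never occurs. Unstacking both levels in one call only creates the (Store, employee) pairs that exist, and `dropna(axis=1, how="all")` is another way to clean up the chained result.

```python
import pandas as pd

# Hypothetical toy frame: emp1 works only at store1, emp4 only at store2
df = pd.DataFrame({
    "Date": pd.to_datetime(["2014-01-03", "2014-01-03", "2014-01-04"]),
    "Store": ["store1", "store2", "store2"],
    "employee": ["emp1", "emp4", "emp4"],
    "duties": ["opening", "opening", "cashier"],
})

counts = df.groupby(
    [pd.Grouper(key="Date", freq="W-MON"), "Store", "employee"]
)["duties"].count()

# Two chained unstacks: Store becomes columns first, then employee is crossed
# with every store, producing all-NaN columns for pairs that never occur
chained = counts.unstack(level=1).unstack(level=1)

# One unstack over both levels: only the observed (Store, employee) pairs
single = counts.unstack(level=[1, 2])

# Dropping columns that are entirely NaN recovers the same set of columns
cleaned = chained.dropna(axis=1, how="all")

print(chained.columns.tolist())
print(single.columns.tolist())
print(cleaned.columns.tolist())
```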
So how can I get my desired outcome? Basically, I want to filter out the employees who do not work at that store.
Is groupby the right tool for this, or should I consider some other method?
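As one alternative to groupby plus unstack, `pd.crosstab` counts rows per combination directly. A sketch, assuming a hypothetical four-row subset of `df_dict`: the `Week` key labels each row with the Monday that starts its week (`to_period("W-SUN")` gives Monday-to-Sunday periods), which matches the week labels in the desired output above. Passing two column keys builds a (Store, employee) column MultiIndex containing only pairs that actually occur, and crosstab fills missing counts with 0.

```python
import pandas as pd

# Hypothetical subset of the question's rows (same columns as df_dict)
df = pd.DataFrame({
    "Date": pd.to_datetime(["2014-01-03", "2014-01-03", "2014-01-04", "2014-01-10"]),
    "Store": ["store1", "store2", "store2", "store1"],
    "employee": ["emp1", "emp4", "emp4", "emp1"],
    "duties": ["opening", "opening", "cashier", "opening"],
})

# Label each row with the Monday that starts its Monday-to-Sunday week
week = df["Date"].dt.to_period("W-SUN").dt.start_time.rename("Week")

# Count rows per (Week, Store, employee); Store and employee become columns
table = pd.crosstab(week, [df["Store"], df["employee"]])
print(table)
```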
Thank you in advance for your help and consideration.
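Recommended Answer
Try unstacking levels 1 and 2 (Store and employee) in a single call with unstack(level=[1, 2]), so that only the (Store, employee) combinations that actually occur become columns: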
df_out = (df.groupby([pd.Grouper(key='Date', freq='W-MON'), 'Store', 'employee'])['duties']
.count()
.unstack(level=[1, 2])
)
print(df_out)
Prints:
Store store1 store2 store3
employee emp1 emp2 emp3 emp1 emp4 emp5 emp2 emp6 emp7
Date
2014-01-06 2 4 2 2 4 2 2 4 2
2014-01-13 1 2 1 1 2 1 1 2 1
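The desired output in the question labels each week by the Monday it starts on (2013-12-30), while `freq='W-MON'` with default settings labels it by the Monday it ends on (2014-01-06), as in the output above. A minimal sketch, assuming a hypothetical three-row subset of the data: passing `closed="left"` and `label="left"` to `pd.Grouper` makes each bin run from one Monday up to, but not including, the next, and names it after the starting Monday.

```python
import pandas as pd

# Hypothetical subset: emp1 at store1 on two days of one week, one day of the next
df = pd.DataFrame({
    "Date": pd.to_datetime(["2014-01-03", "2014-01-04", "2014-01-10"]),
    "Store": ["store1", "store1", "store1"],
    "employee": ["emp1", "emp1", "emp1"],
    "duties": ["opening", "deli", "opening"],
})

# Bins are [Monday, next Monday) and are labeled by their starting Monday
weekly = pd.Grouper(key="Date", freq="W-MON", closed="left", label="left")

df_out = (df.groupby([weekly, "Store", "employee"])["duties"]
            .count()
            .unstack(level=[1, 2]))
print(df_out)
```

Applied to the full `df`, the same Grouper should produce the 2013-12-30 and 2014-01-06 row labels shown in the question.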