在 pandas GroupBy中将缺少的组键作为NaN包括在内 [英] Include missing group keys as NaN in pandas GroupBy output

查看：121 发布时间：2020/5/24 3:48:04 python pandas group-by pandas-groupby

本文介绍了在 pandas GroupBy中将缺少的组键作为NaN包括在内的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

我的熊猫有一个数据框.

I have a dataframe in pandas.

test_df = pd.DataFrame({'date': ['2018-12-28', '2018-12-28', '2018-12-29', '2018-12-29', '2018-12-30', '2018-12-30'],
                       'transaction': ['aa', 'bb', 'cc', 'aa', 'bb', 'bb'],
                       'ccy': ['USD', 'EUR', 'EUR', 'USD', 'USD', 'USD'],
                       'amt': np.random.random(6)})

test_df:

date         transaction  ccy       amt
2018-12-28   aa           USD  0.323439
2018-12-28   bb           EUR  0.048948
2018-12-29   cc           EUR  0.793263
2018-12-29   aa           USD  0.013865
2018-12-30   bb           USD  0.658571
2018-12-30   bb           USD  0.224951

以下代码为我提供了此输出.

The following code is giving me this output.

grouper = test_df.groupby([pd.Grouper('date'), 'transaction', 'ccy'])
grp_transactions = grouper['amt'].sum().unstack()

输出:

ccy                          EUR       USD
date       transaction                    
2018-12-28 aa                NaN  0.323439
           bb           0.048948       NaN
2018-12-29 aa                NaN  0.013865
           cc           0.793263       NaN
2018-12-30 bb                NaN  0.883523

我相信这是可以预期的，因为groupby函数将根据上述顺序对列中的值进行分组，进行相应的求和，并且不会为不在DF中的交易创建新行.

I believe this is expected as the groupby function will group values in the columns based on the order above, sum accordingly, and not create new rows for transactions that are not in the DF.

如果使用groupby在特定的一天没有完成交易，熊猫中是否可以包含NaN值? IE.如果我的DF没有事务:cc于2018年12月28日，则两个ccy的输出应均为NaN.

Is there a way in pandas to include NaN values if a transaction is not done on a particular day when using groupby? ie. Output should be NaN for both ccy if my DF does not have transaction: cc on 28/12/2018.

预期输出:

ccy                          EUR       USD
date       transaction                    
2018-12-28 aa                NaN  0.323439
           bb           0.048948       NaN
           cc                NaN       NaN
2018-12-29 aa                NaN  0.013865
           bb                NaN       NaN
           cc           0.793263       NaN
2018-12-30 aa                NaN       NaN
           bb                NaN  0.883523
           cc                NaN       NaN

任何帮助将不胜感激.谢谢！

Any help would be appreciated. Thanks!

推荐答案

如果在分组之前将交易"转换为分类列，这很容易

This is easy if you convert "transaction" to a categorical column before grouping,

df.transaction = pd.Categorical(df.transaction)
df.groupby(['date', 'transaction', 'ccy']).sum().unstack(2)

                             amt          
ccy                          EUR       USD
date       transaction                    
2018-12-28 aa                NaN  0.404488
           bb           0.459295       NaN
           cc                NaN       NaN
2018-12-29 aa                NaN  0.439354
           bb                NaN       NaN
           cc           0.429269       NaN
2018-12-30 aa                NaN       NaN
           bb                NaN  1.542451
           cc                NaN       NaN

输出中缺少的类别由NaN表示.在执行数字聚合时通常可以做到这一点.

Missing categories in the output are represented by NaNs. This is usually possible when performing numeric aggregation.

如果您不想修改df，则可以这样做:

If you don't want to modify df, this will do:

u = pd.Series(pd.Categorical(df.transaction), name='transaction')
df.groupby(['date', u, 'ccy']).sum().unstack(2)

                             amt          
ccy                          EUR       USD
date       transaction                    
2018-12-28 aa                NaN  0.429134
           bb           0.852355       NaN
           cc                NaN       NaN
2018-12-29 aa                NaN  0.541576
           bb                NaN       NaN
           cc           0.994095       NaN
2018-12-30 aa                NaN       NaN
           bb                NaN  0.744587
           cc                NaN       NaN

这篇关于在 pandas GroupBy中将缺少的组键作为NaN包括在内的文章就介绍到这了，希望我们推荐的答案对大家有所帮助，也希望大家多多支持IT屋！

查看全文

在 pandas GroupBy中将缺少的组键作为NaN包括在内 [英] Include missing group keys as NaN in pandas GroupBy output

问题描述

推荐答案

相关文章

Python最新文章

热门教程

热门工具

登录关闭

在 pandas GroupBy中将缺少的组键作为NaN包括在内 [英] Include missing group keys as NaN in pandas GroupBy output

问题描述

推荐答案

相关文章

Python最新文章

热门教程

热门工具

登录 关闭

登录关闭