Identifying differences between groups in pandas dataframe

Problem description

I have a pandas dataframe indexed by date and ID. I would like to:

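  1. Identify the IDs that were added and deleted between dates
  2. Add each ID, together with the date of its addition or deletion, to another dataframe.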

 

date        ID   value
12/31/2010  13  -0.124409
             9   0.555959
             1  -0.705634
             2  -3.123603
             4   0.725009
1/31/2011   13   0.471078
             9   0.276006
             1  -0.468463
            22   1.076821
            11   0.668599

Desired output:

date        ID  flag
1/31/2011   22  addition
1/31/2011   11  addition
1/31/2011   2   deletion
1/31/2011   4   deletion

I have tried the approach from "Diff between two dataframes in pandas", but I cannot get it to work on a grouped dataframe. I am unsure how to loop over each group and compare it to the previous group.

Recommended answer

I created a helper function that shifts the first level of a pandas.MultiIndex, so that each date's IDs are relabeled with the following date. Taking the set difference between this shifted index and the original index then identifies the additions and deletions.

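import pandas as pd

def shift_level(idx):
    # Map each date in level 0 to the next date; the last date maps to None.
    level = idx.levels[0]
    mapping = dict(zip(level[:-1], level[1:]))
    idx = idx.set_levels(level.map(mapping.get), level=0)
    # Drop the rows of the last date, which has no successor.
    return idx[idx.get_level_values(0).notna()].remove_unused_levels()

idx = df.index
fidx = shift_level(idx)  # previous date's IDs, relabeled with the current date

# IDs carried forward from the previous date but absent now were deleted.
deletions = fidx.difference(idx)
# IDs present now (skipping the first date) but not carried forward were added.
additions = idx[idx.codes[0] > 0].difference(fidx)

pd.concat([pd.Series('+', index=additions),
           pd.Series('-', index=deletions)]).rename('flag').reset_index()

        date  ID flag
0 2011-01-31  11    +
1 2011-01-31  22    +
2 2011-01-31   2    -
3 2011-01-31   4    -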
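
The question also asks how to loop over each group and compare it with the previous one. A minimal sketch of that explicit approach follows, assuming the frame is indexed by (date, ID) with parsed dates as in the sample data. The helper name flag_changes is hypothetical and is not part of the answer above; it uses plain Python set differences rather than index shifting.

```python
import pandas as pd

# Sample data from the question, rebuilt as a frame indexed by (date, ID).
df = pd.DataFrame({
    "date": pd.to_datetime(["2010-12-31"] * 5 + ["2011-01-31"] * 5),
    "ID": [13, 9, 1, 2, 4, 13, 9, 1, 22, 11],
    "value": [-0.124409, 0.555959, -0.705634, -3.123603, 0.725009,
              0.471078, 0.276006, -0.468463, 1.076821, 0.668599],
}).set_index(["date", "ID"])


def flag_changes(frame):
    """Hypothetical helper: compare each date's IDs with the previous date's IDs."""
    # One set of IDs per date; groupby yields the dates in sorted order.
    ids_by_date = {
        date: set(ids)
        for date, ids in frame.reset_index().groupby("date")["ID"]
    }
    dates = list(ids_by_date)
    rows = []
    for prev_date, cur_date in zip(dates[:-1], dates[1:]):
        prev_ids, cur_ids = ids_by_date[prev_date], ids_by_date[cur_date]
        # In the current date but not the previous one: added.
        rows += [(cur_date, i, "addition") for i in sorted(cur_ids - prev_ids)]
        # In the previous date but not the current one: deleted.
        rows += [(cur_date, i, "deletion") for i in sorted(prev_ids - cur_ids)]
    return pd.DataFrame(rows, columns=["date", "ID", "flag"])


changes = flag_changes(df)
print(changes)
```

On the sample data this flags IDs 11 and 22 as additions and IDs 2 and 4 as deletions, all dated 2011-01-31, which matches the desired output apart from row order.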
