Is there an "ungroup by" operation opposite to .groupby in pandas?

Problem description

Suppose we start from this simple table, stored in a pandas dataframe:

    name  age  family
0   john    1       1
1  jason   36       1
2   jane   32       1
3   jack   26       2
4  james   30       2

Then I do

group_df = df.groupby('family')
group_df = group_df.aggregate({'name': name_join, 'age': 'mean'})  # pd.np was removed in pandas 2.0

where name_join is a simple aggregating function for the names:

def name_join(list_names, concat='-'):
    return concat.join(list_names)

the output is:

        age             name
family                      
1        23  john-jason-jane
2        28       jack-james

Now the question.

Is there a quick, efficient way to get to the following from the aggregated table?

    name  age  family
0   john   23       1
1  jason   23       1
2   jane   23       1
3   jack   28       2
4  james   28       2

(Note: the numbers are just examples; I don't care about the information I lose by averaging in this specific example.)

The way I thought I could do it does not look too efficient:

  1. create empty dataframe
  2. from every line in group_df, separate the names
  3. return a dataframe with as many rows as there are names in the starting row
  4. append the output to the empty dataframe
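
For reference, the four steps above could be sketched roughly as follows. This is only an illustration: the aggregated table is rebuilt by hand, the names `pieces` and `slow_result` are made up for the example, and the per-row frames are collected in a list and concatenated once, since `DataFrame.append` no longer exists in pandas 2.0 and later.

```python
import pandas as pd

# Aggregated table from above, rebuilt by hand for illustration.
group_df = pd.DataFrame(
    {"age": [23, 28], "name": ["john-jason-jane", "jack-james"]},
    index=pd.Index([1, 2], name="family"),
)

pieces = []  # hypothetical name: one small dataframe per aggregated row
for family, row in group_df.iterrows():
    names = row["name"].split("-")  # step 2: separate the names
    # step 3: a dataframe with one row per name, repeating age and family
    pieces.append(pd.DataFrame({"name": names, "age": row["age"], "family": family}))

# step 4: combine everything (a single concat instead of repeated appends)
slow_result = pd.concat(pieces, ignore_index=True)
print(slow_result)
```

This loops over the rows in Python, which is why it does not scale well to large tables.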

Solution

It may not be helpful to think of the operation as the "opposite" of groupby.

You are splitting a string into pieces, and maintaining each piece's association with 'family'. This old answer of mine does the job.

Just set 'family' as the index column first, refer to the link above, and then reset_index() at the end to get your desired result.
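
The linked answer is not reproduced here, so the following is only a sketch of the same idea, not necessarily the method it used. It assumes pandas 0.25 or later (for `DataFrame.explode`) and rebuilds the aggregated table by hand; the name `ungrouped` is made up for the example.

```python
import pandas as pd

# Aggregated table from the question, with 'family' already as the index.
group_df = pd.DataFrame(
    {"age": [23, 28], "name": ["john-jason-jane", "jack-james"]},
    index=pd.Index([1, 2], name="family"),
)

ungrouped = (
    group_df
    # Split each joined string into a list of names.
    .assign(name=group_df["name"].str.split("-"))
    # Give every list element its own row; the 'family' index and 'age' are repeated.
    .explode("name")
    # Turn 'family' back into a column and restore the original column order.
    .reset_index()[["name", "age", "family"]]
)
print(ungrouped)
```

Because 'family' is the index while the strings are split, each name keeps its association with the right family, and `reset_index()` at the end produces a plain 0..n-1 row index.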
