Keep other columns when doing groupby


Question

I'm using groupby on a pandas dataframe to drop all rows that don't have the minimum of a specific column. Something like this:

df1 = df.groupby("item", as_index=False)["diff"].min()

However, if I have more than those two columns, the other columns (e.g. otherstuff in my example) get dropped. Can I keep those columns using groupby, or am I going to have to find a different way to drop the rows?

My data looks like:

    item    diff   otherstuff
   0   1       2            1
   1   1       1            2
   2   1       3            7
   3   2      -1            0
   4   2       1            3
   5   2       4            9
   6   2      -6            2
   7   3       0            0
   8   3       2            9

and should end up like:

    item   diff  otherstuff
   0   1      1           2
   1   2     -6           2
   2   3      0           0

but what I'm getting is:

    item   diff
   0   1      1
   1   2     -6
   2   3      0

I've been looking through the documentation and can't find anything. I tried:

df1 = df.groupby(["item", "otherstuff"], as_index=False)["diff"].min()

df1 = df.groupby("item", as_index=False)["diff"].min()["otherstuff"]

df1 = df.groupby("item", as_index=False)["otherstuff", "diff"].min()

But none of those work (I realized with the last one that the syntax is meant for aggregating after a group is created).

Solution

Method #1: use idxmin() to get the indices of the elements of minimum diff, and then select those:

>>> df.loc[df.groupby("item")["diff"].idxmin()]
   item  diff  otherstuff
1     1     1           2
6     2    -6           2
7     3     0           0

[3 rows x 3 columns]
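
For anyone who wants to run Method #1 end to end, here is a minimal sketch that rebuilds the sample data from the question. The variable names min_labels and result_idxmin are only illustrative, and the sketch assumes the frame has unique index labels (here the default 0..8), so that each label returned by idxmin() picks exactly one row.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "item": [1, 1, 1, 2, 2, 2, 2, 3, 3],
    "diff": [2, 1, 3, -1, 1, 4, -6, 0, 2],
    "otherstuff": [1, 2, 7, 0, 3, 9, 2, 0, 9],
})

# For each item, idxmin() gives the index label of the row with the smallest diff.
min_labels = df.groupby("item")["diff"].idxmin()

# Selecting those labels with .loc returns the complete rows, so no column is lost.
result_idxmin = df.loc[min_labels]
print(result_idxmin)
```

If a group contains several rows that tie for the minimum, idxmin() reports only the first of them, so this approach always yields one row per item.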

Method #2: sort by diff, and then take the first element in each item group:

>>> df.sort_values("diff").groupby("item", as_index=False).first()
   item  diff  otherstuff
0     1     1           2
1     2    -6           2
2     3     0           0

[3 rows x 3 columns]
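
Method #2 can be sketched as a runnable script in the same way. One caveat worth knowing: groupby(...).first() takes the first non-null value of each column separately, so if the data contained missing values, the result could combine values from different rows. The drop_duplicates variant below is not part of the answer above; it is shown as a possible alternative that always keeps whole rows. The names by_diff, result_first and result_dedup are illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "item": [1, 1, 1, 2, 2, 2, 2, 3, 3],
    "diff": [2, 1, 3, -1, 1, 4, -6, 0, 2],
    "otherstuff": [1, 2, 7, 0, 3, 9, 2, 0, 9],
})

# Sort so that, within each item, the row with the smallest diff comes first.
by_diff = df.sort_values("diff")

# Method #2: take the first entry per item; the result gets a fresh 0-based index.
result_first = by_diff.groupby("item", as_index=False).first()
print(result_first)

# Alternative (an assumption, not from the answer): keep the first whole row
# per item, which preserves the original row labels.
result_dedup = by_diff.drop_duplicates(subset="item").sort_values("item")
print(result_dedup)
```

On the sample data, which has no missing values, both variants select the same rows; they differ only in the index they return.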

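Note that the resulting indices are different even though the row content is the same: Method #1 keeps the original row labels (1, 6, 7), while Method #2 returns a new index starting at 0.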
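
If the index difference matters, for example when comparing or concatenating the two results, one option is to discard the original labels from the idxmin() result with reset_index(drop=True). This is a small sketch on the question's sample data; the names via_idxmin, via_sort and via_idxmin_reset are illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "item": [1, 1, 1, 2, 2, 2, 2, 3, 3],
    "diff": [2, 1, 3, -1, 1, 4, -6, 0, 2],
    "otherstuff": [1, 2, 7, 0, 3, 9, 2, 0, 9],
})

via_idxmin = df.loc[df.groupby("item")["diff"].idxmin()]
via_sort = df.sort_values("diff").groupby("item", as_index=False).first()

print(via_idxmin.index.tolist())  # original row labels
print(via_sort.index.tolist())    # fresh 0-based index

# Dropping the original labels gives both results the same 0-based index.
via_idxmin_reset = via_idxmin.reset_index(drop=True)
print(via_idxmin_reset)
```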
