Select top n rows based on another column

Question

I have a database like the following:

I would like to obtain a pandas DataFrame filtered to the 2 rows per date with the highest population. The output should look like this:

I know that pandas offers a method called nlargest: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.nlargest.html

But I don't think it works for this use case. Is there any workaround?
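For reference, a small sketch of why plain `DataFrame.nlargest` falls short here: it ranks the whole frame at once and ignores the dates. The data below is hypothetical, since the question's own table is not reproduced.

```python
import pandas as pd

# Hypothetical sample data (not the question's actual table)
df = pd.DataFrame({
    "Date": ["2019-12-31", "2019-12-31", "2020-01-01", "2020-01-01"],
    "country": ["A", "B", "A", "C"],
    "population": [100, 10, 200, 3500],
})

# nlargest ranks all rows together, so it returns the 2 largest rows
# overall rather than the 2 largest rows for each date.
top_overall = df.nlargest(2, columns="population")
print(top_overall)
```

Here both returned rows come from 2020-01-01, and 2019-12-31 is missing entirely, which is why a per-date grouping step is needed.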

Thanks a lot!

Recommended answer

I have mimicked your DataFrame as shown below and provided a way to get the desired result. I hope it helps.

>>> df
        Date country  population
0 2019-12-31       A         100
1 2019-12-31       B          10
2 2019-12-31       C        1000
3 2020-01-01       A         200
4 2020-01-01       B          20
5 2020-01-01       C        3500
6 2020-01-01       D          12
7 2020-02-01       D        2000
8 2020-02-01       E          54
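To reproduce the examples, the frame above can be rebuilt roughly as follows. Storing `Date` as `datetime64` is an assumption; the answer does not say whether the dates were strings or datetimes, and the commands below behave the same either way.

```python
import pandas as pd

# Rebuild the mock DataFrame shown above (Date dtype is an assumption)
df = pd.DataFrame({
    "Date": pd.to_datetime([
        "2019-12-31", "2019-12-31", "2019-12-31",
        "2020-01-01", "2020-01-01", "2020-01-01", "2020-01-01",
        "2020-02-01", "2020-02-01",
    ]),
    "country": ["A", "B", "C", "A", "B", "C", "D", "D", "E"],
    "population": [100, 10, 1000, 200, 20, 3500, 12, 2000, 54],
})
print(df)
```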

Your Desired Solution:

You can use the nlargest method together with the set_index and groupby methods.

This is what you will get:

>>> df.set_index('country').groupby('Date')['population'].nlargest(2)
Date        country
2019-12-31  C          1000
            A           100
2020-01-01  C          3500
            A           200
2020-02-01  D          2000
            E            54
Name: population, dtype: int64

Now, since you want the DataFrame back in its original flat layout, reset the index, which gives you the following:

>>> df.set_index('country').groupby('Date')['population'].nlargest(2).reset_index()
        Date country  population
0 2019-12-31       C        1000
1 2019-12-31       A         100
2 2020-01-01       C        3500
3 2020-01-01       A         200
4 2020-02-01       D        2000
5 2020-02-01       E          54
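If the real DataFrame has more columns than `country` and `population`, the `set_index('country')` version keeps only those. A variant that is not part of the original answer: use the original row labels that `groupby(...).nlargest` carries in the second index level to pull complete rows from `df`. Dates are plain ISO strings here for brevity, which is an assumption; they group and sort the same way as datetimes.

```python
import pandas as pd

df = pd.DataFrame({
    "Date": ["2019-12-31"] * 3 + ["2020-01-01"] * 4 + ["2020-02-01"] * 2,
    "country": ["A", "B", "C", "A", "B", "C", "D", "D", "E"],
    "population": [100, 10, 1000, 200, 20, 3500, 12, 2000, 54],
})

# The result of groupby(...)['population'].nlargest(2) has a MultiIndex of
# (Date, original row label); level 1 holds the labels of the selected rows.
top_labels = df.groupby("Date")["population"].nlargest(2).index.get_level_values(1)

# Select those rows from the original frame so every column is kept,
# then renumber the index from 0.
result = df.loc[top_labels].reset_index(drop=True)
print(result)
```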

Another approach:

Use groupby with apply to call nlargest on each group, then call reset_index with drop=True and level=[0, 1] to drop both index levels (the Date group key and the original row labels):

>>> # level=['Date', 1] (level name plus position) works the same as level=[0, 1]
>>> df.groupby('Date').apply(lambda p: p.nlargest(2, columns='population')).reset_index(level=[0, 1], drop=True)
        Date country  population
0 2019-12-31       C        1000
1 2019-12-31       A         100
2 2020-01-01       C        3500
3 2020-01-01       A         200
4 2020-02-01       D        2000
5 2020-02-01       E          54
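A third option, swapped in here and not from the original answer, avoids `groupby.apply` altogether (pandas 2.2 and later warn when an applied function operates on the grouping columns): sort by date and by population in descending order, then keep the first 2 rows of each date with `groupby(...).head(2)`. This sketch does not treat ties specially.

```python
import pandas as pd

df = pd.DataFrame({
    "Date": ["2019-12-31"] * 3 + ["2020-01-01"] * 4 + ["2020-02-01"] * 2,
    "country": ["A", "B", "C", "A", "B", "C", "D", "D", "E"],
    "population": [100, 10, 1000, 200, 20, 3500, 12, 2000, 54],
})

n = 2  # number of rows to keep per date
result = (
    df.sort_values(["Date", "population"], ascending=[True, False])
    .groupby("Date")
    .head(n)                  # first n rows of each date in sorted order
    .reset_index(drop=True)   # renumber rows 0..k-1
)
print(result)
```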
