Apply different functions to different columns with a single pandas groupby command

Question

My data is stored in df. I have multiple users per group. I want to group df by group and apply different functions to different columns. The twist is that I would like to assign custom names to the new columns during this process.

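import numpy as np
import pandas as pd

np.random.seed(123)  # fixed seed so the scores are reproducible
df = pd.DataFrame({"user": range(4), "group": [1, 1, 2, 2],
                   "crop": ["2018-01-01", "2018-01-01", "2018-03-01", "2018-03-01"],
                   "score": np.random.randint(400, 1000, 4)})
df["crop"] = pd.to_datetime(df["crop"])  # parse the date strings as datetimes
print(df)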
   user  group        crop  score
0     0      1  2018-01-01    910
1     1      1  2018-01-01    765
2     2      2  2018-03-01    782
3     3      2  2018-03-01    722

I want to get the mean of score, and the min and max values of crop grouped by group and assign custom names to each new column. The desired output should look like this:

  group  mean_score    min_crop    max_crop
0     1       837.5  2018-01-01  2018-01-01
1     2       752.0  2018-03-01  2018-03-01

I don't know how to do this in a one-liner in Python. In R, I would use data.table and get the following:

df[, list(mean_score = mean(score),
          max_crop   = max(crop),
          min_crop   = min(crop)), by = group]

I know I could group the data and use .agg combined with a dictionary. Is there an alternative way where I can custom-name each column in this process?

Recommended answer

Try creating a function with the required operations using groupby().apply():

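def f(x):
    # x is the sub-DataFrame holding the rows of one group
    d = {}
    d['mean_score'] = x['score'].mean()
    d['min_crop'] = x['crop'].min()
    d['max_crop'] = x['crop'].max()
    # the index fixes the order of the resulting columns
    return pd.Series(d, index=['mean_score', 'min_crop', 'max_crop'])

# reset_index() turns the group index back into a regular column,
# which matches the desired output shown in the question
data = df.groupby('group').apply(f).reset_index()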
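
As an alternative to apply(), pandas 0.25 and later support named aggregation in .agg(), which is closer to the data.table one-liner in the question. The following is a minimal sketch, assuming a reasonably recent pandas version; the scores are hard-coded to the values printed in the question so that it does not depend on the random seed, and the variable name result is only illustrative.

```python
import pandas as pd

# Same data as in the question, with the scores written out explicitly
df = pd.DataFrame({
    "user": range(4),
    "group": [1, 1, 2, 2],
    "crop": pd.to_datetime(["2018-01-01", "2018-01-01", "2018-03-01", "2018-03-01"]),
    "score": [910, 765, 782, 722],
})

# Named aggregation: new_column_name=(source_column, aggregation_function)
# as_index=False keeps "group" as a regular column instead of the index
result = df.groupby("group", as_index=False).agg(
    mean_score=("score", "mean"),
    min_crop=("crop", "min"),
    max_crop=("crop", "max"),
)
print(result)
```

Each keyword argument names an output column, so no renaming step is needed afterwards, and the built-in aggregations usually run faster than a Python function passed to apply().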
