Pandas: aggregating multiple columns with multiple functions

Question

pandas in Python and dplyr in R are both flexible data wrangling tools. For example, in R, with dplyr one can do the following:

custom_func <- function(col1, col2) length(col1) + length(col2)

ChickWeight %>% 
  group_by(Diet) %>% 
  summarise(m_weight = mean(weight), 
            var_time = var(Time), 
            covar = cov(weight, Time),
            odd_stat = custom_func(weight, Time))

Notice how, in one statement:

  • I can aggregate over multiple columns in one line.
  • I can apply different functions over these multiple columns in one line.
  • I can use functions that take into account two columns.
  • I can throw in custom functions for any of these.
  • I can declare new column names for these aggregations.

Is such a pattern also possible in pandas? Note that I am interested in doing this in a short statement (so not creating three different dataframes and then joining them).

Edit

I've noticed the question got downvoted. If somebody could mention why the post was downvoted I might have the opportunity to improve the question.

Solution

With pandas groupby.apply() you can run several functions in a single groupby aggregation. The statistics used here (mean(), var() and cov()) are pandas built-ins, so scipy is not needed for them. A custom function has to reduce each group's data to a single value, for example with an aggregate such as sum() or mean():

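import pandas as pd

def customfct(x, y):
    # Custom statistic: mean of the element-wise ratio of two columns
    # (a zero in y produces inf for that element)
    data = x / y
    return data.mean()

def f(group):
    # Return one Series per group so that each group collapses to one row;
    # the keys become the new column names
    return pd.Series({
        'm_weight': group['weight'].mean(),
        'var_time': group['Time'].var(),
        'cov': group['weight'].cov(group['Time']),
        'odd_stat': customfct(group['weight'], group['Time']),
    })

# df is assumed to hold the ChickWeight data (columns weight, Time, Diet)
aggdf = df.groupby('Diet').apply(f)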
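
To try the pattern without the ChickWeight data, here is a minimal self-contained sketch. The toy DataFrame and the names custom_func and summarise are assumptions made up for illustration; custom_func mirrors the R helper from the question. The second part shows named aggregation via agg(), which, as far as I know, only covers statistics that depend on a single column.

```python
import pandas as pd

# Hypothetical toy data standing in for ChickWeight (values are made up)
df = pd.DataFrame({
    "Diet":   [1, 1, 1, 2, 2, 2],
    "Time":   [2, 4, 6, 2, 4, 6],
    "weight": [50.0, 60.0, 76.0, 52.0, 70.0, 91.0],
})

def custom_func(col1, col2):
    # Same idea as the R helper in the question: combined length of two columns
    return len(col1) + len(col2)

def summarise(group):
    # One Series per group becomes one row of the result
    return pd.Series({
        "m_weight": group["weight"].mean(),
        "var_time": group["Time"].var(),
        "covar": group["weight"].cov(group["Time"]),
        "odd_stat": custom_func(group["weight"], group["Time"]),
    })

# Selecting the value columns first keeps the grouping key out of each group
result = df.groupby("Diet")[["weight", "Time"]].apply(summarise)
print(result)

# Single-column statistics can also be written with named aggregation
simple = df.groupby("Diet").agg(
    m_weight=("weight", "mean"),
    var_time=("Time", "var"),
)
print(simple)
```

The apply() version is the closer match to the dplyr statement, since each entry in the returned Series can use any number of columns. The agg() version is shorter, but a two-column statistic such as the covariance does not fit its (column, function) form.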
