pandas :具有相同名称的列的均值 [英] Pandas: Mean of columns with the same names

查看:105
本文介绍了 pandas :具有相同名称的列的均值的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个数据列,其中包含诸如:

I have a dataframe with columns like:

['id','name','foo1', 'foo1', 'foo1', 'foo2','foo2', 'foo3']

我想获得一个新的数据框,对具有相同名称的列进行平均:

I would like to get a new dataframe where columns sharing the same name are averaged:

['id','name','foo1', 'foo2','foo3']

此处foo1列是原始数据帧中名为foo1的三个列的平均值,foo2则是名为foo2的两个列的平均值,而foo3只是foo3

Here column foo1 would be the average of the three columns named foo1 in the original dataframe, foo2 would be the average of the two columns named foo2 and foo3 would be just foo3

注意:ID和名称不是数字,我必须保留它们.

Note: id and name are not numeric and I have to keep them.

推荐答案

基本思想是,您可以按列名称进行分组,并且对每个分组进行平均操作.

The basic idea is that you can group by your columns names and do mean operations for each group.

我看到了关于您问题的一些评论,并尝试为您提供实现目标的不同方法. (解决方案(3)是我发现的最好的!)

I saw some comments for your question and tried to give you different ways to achieve the goal. (Solution (3) is the best I found!)

(1)快速解决方案.如果您拥有非常有限的非数字列并且拥有唯一的名称,例如列idname.您可以做的是:

(1) Quick solution. If you have very limited columns that are non-numeric, and own unique names, e.g., columns id and name. What you can do is:

首先设置索引['id', 'name']来保存它们,

First set index ['id', 'name'] to preserve them,

df = df.set_index(['id', 'name']) 

然后在columns上使用DataFrame.groupby功能,设置axis=1(在每列上重复),为每个组应用mean功能.

then use DataFrame.groupby function on columns, set axis=1 (iterate over each column), apply mean function for each group.

df.groupby(by=df.columns, axis=1).mean()

最后,重置索引以恢复['id', 'name']

And finally, reset index to recover ['id', 'name'] columns

df = df.reset_index()

这是示例代码:

In [35]: df = pd.DataFrame([['001', 'a', 1, 10, 100, 1000], ['002', 'b', 2, 20, 200, 2000]], columns=['id', 'name', 'c1', 'c2', 'c2', 'c3'], index=list('AB'))

In [36]: df = df.set_index(['id', 'name'])

In [37]: df = df.groupby(by=df.columns, axis=1).mean()

In [38]: df = df.reset_index()

In [39]: df
Out[39]: 
    id name  c1   c2    c3
0  001    a   1   55  1000
1  002    b   2  110  2000

(2)完整的解决方案.如果您有很多非数字且唯一命名的列,则可以执行以下操作:

(2) Complete solution. If you have lots of columns that are non-numeric and unique named, what you can do is:

首先将您的数据帧转置,

First transpose you dataframe,

df2 = df.transpose()

然后按操作进行分组(在其索引和axis=0上),但是要小心地处理每个组:对于这些数字组,请返回其平均值;对于这些数字组,请返回它们的平均值.对于这些非数字组,返回其第一行:

Then you do group by operations (on its index and axis=0), but carefully handle each groups: for these numeric groups, return their mean value; and for these non-numeric groups, return their first row:

df2 = df2.groupby(by=df2.index, axis=0).apply(lambda g: g.mean() if isinstance(g.iloc[0,0], numbers.Number) else g.iloc[0])

最后,转回:

df = df2.transpose()

以下是代码示例:

In [98]: df = pd.DataFrame([['001', 'a', 1, 10, 100, 1000], ['002', 'b', 2, 20, 200, 2000]], columns=['id', 'name', 'c1', 'c2', 'c2', 'c3'], index=list('AB'))

In [99]: df2 = df.transpose()

In [100]: df2 = df2.groupby(by=df2.index, axis=0).apply(lambda g: g.mean() if isinstance(g.iloc[0,0], numbers.Number) else g.iloc[0])

In [101]: df3 = df2.transpose()

In [102]: df3
Out[102]: 
  c1   c2    c3   id name
A  1   55  1000  001    a
B  2  110  2000  002    b

In [103]: df
Out[103]: 
    id name  c1  c2   c2    c3
A  001    a   1  10  100  1000
B  002    b   2  20  200  2000

您需要import numbers

更多说明:

(3)合而为一!此解决方案是我发现的最佳解决方案:

(3) All in one! This solution is the best I found:

df.groupby(by=df.columns, axis=1).apply(lambda g: g.mean(axis=1) if isinstance(g.iloc[0,0], numbers.Number) else g.iloc[:,0])

我尝试处理未转位组的每个组,即

I tried to handle each group for the un-transposed groups, that is,

df.groupby(by=df.columns, axis=1).apply(gf)

还有

gf = lambda g: g.mean(axis=1) if isinstance(g.iloc[0,0], numbers.Number) else g.iloc[:,0]

我之前失败了,因为我不小心移开轴.您必须为mean函数设置axis=1,并为非数字组返回列.

I failed before, because I do not carefully hand the axis. You must set axis=1 for mean function, and return columns for non-numeric groups.

谢谢!

这篇关于 pandas :具有相同名称的列的均值的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆