Combine duplicated columns within a DataFrame
Question
If I have a DataFrame in which several columns share the same name, is there a way to combine the same-named columns with some function (e.g. sum)?
For example:
In [186]: df["NY-WEB01"].head()
Out[186]:
                     NY-WEB01  NY-WEB01
DateTime
2012-10-18 16:00:00       5.6       2.8
2012-10-18 17:00:00      18.6      12.0
2012-10-18 18:00:00      18.4      12.0
2012-10-18 19:00:00      18.2      12.0
2012-10-18 20:00:00      19.2      12.0
How might I collapse the NY-WEB01 columns (there are many duplicated columns, not just NY-WEB01) by summing, row by row, the columns that share a name?
Recommended answer
I believe this does what you are after:
df.groupby(lambda x: x, axis=1).sum()
Alternatively, this version is between 3% and 15% faster, depending on the length of the DataFrame:
df.groupby(df.columns, axis=1).sum()
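Recent pandas releases deprecate the `axis=1` argument to `groupby`, so the two calls above may warn or fail depending on the installed version. A minimal sketch of an equivalent approach, assuming a small made-up frame (the values and the `NY-WEB02` column are illustrative, not from the question), is to transpose, group on the index labels, and transpose back:

```python
import pandas as pd

# Hypothetical sample data: two columns share the name "NY-WEB01".
idx = pd.DatetimeIndex(
    ["2012-10-18 16:00:00", "2012-10-18 17:00:00"], name="DateTime"
)
df = pd.DataFrame(
    [[5.6, 2.8, 1.0], [18.6, 12.0, 2.0]],
    index=idx,
    columns=["NY-WEB01", "NY-WEB01", "NY-WEB02"],
)

# Transpose so the duplicated column names become index labels,
# group rows that share a label, sum them, then transpose back.
combined = df.T.groupby(level=0).sum().T

print(combined)
```

Each duplicated name ends up as a single column holding the row-wise sum of its originals. The transposes copy the data, so this may be slower on very wide or very long frames.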
To extend this beyond sums, use .agg() (short for .aggregate()):
import numpy
df.groupby(df.columns, axis=1).agg(numpy.max)
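As a hedged illustration of other aggregations, the sketch below uses the transpose-based grouping (which avoids the `axis=1` argument that newer pandas versions deprecate) with a made-up frame; the data and the names `by_name`, `col_max`, and `col_mean` are assumptions for the example only:

```python
import pandas as pd

# Hypothetical sample data with a duplicated column name.
df = pd.DataFrame(
    [[5.6, 2.8, 1.0], [18.6, 12.0, 2.0]],
    columns=["NY-WEB01", "NY-WEB01", "NY-WEB02"],
)

# Group the transposed frame by its index labels (the original column names).
by_name = df.T.groupby(level=0)

# Row-wise maximum across same-named columns.
col_max = by_name.max().T

# Any aggregation name accepted by .agg() works the same way, e.g. the mean.
col_mean = by_name.agg("mean").T

print(col_max)
print(col_mean)
```

Passing the aggregation as a string such as "max" or "mean" lets pandas pick its built-in implementation and does not require importing numpy.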