Average between duplicated rows in R
Question
I have a data frame df with rows that are duplicated in the name column but not in the value column:
name value etc1 etc2
A 9 1 X
A 10 1 X
A 11 1 X
B 2 1 Y
C 40 1 Y
C 50 1 Y
I need to aggregate the duplicated names into one row, taking the mean of the value column. The expected output is as follows:
name value etc1 etc2
A 10 1 X
B 2 1 Y
C 45 1 Y
I have tried df[duplicated(df$name),], but of course this does not give me the mean over the duplicates. I would like to use aggregate(), but the problem is that the FUN argument of this function is applied to all the other columns as well and, among other problems, it cannot compute a mean on character content. Since all the other columns have the same content across the "duplicates", I need them to be carried through as is, just like the name column. Any hints?
Recommended answer
Here is a data.table solution. It is general in the sense that it works even for a data.frame with 60 columns, because I group the data by every variable other than value (see how keys is created below).
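library(data.table)
# Example data from the question
dat <- read.table(text='name value etc1 etc2
A 9 1 X
A 10 1 X
A 11 1 X
B 2 1 Y
C 40 1 Y
C 50 1 Y',header=TRUE)
# Grouping columns: every column whose name does not contain 'value'
keys <- colnames(dat)[!grepl('value',colnames(dat))]
X <- as.data.table(dat)
# Mean of value within each combination of the key columns
X[,list(mm= mean(value)),keys]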
name etc1 etc2 mm
1: A 1 X 10
2: B 1 Y 2
3: C 1 Y 45
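For comparison, here is a minimal base R sketch of the same grouping. It is an alternative to the data.table approach, not part of the original answer. It assumes the formula interface of aggregate(), where the column to average goes on the left and the dot on the right stands for all remaining columns, so FUN is applied to value only and the other columns act as grouping variables. The name res is just an illustrative choice.

```r
# Example data from the question
dat <- read.table(text = 'name value etc1 etc2
A 9 1 X
A 10 1 X
A 11 1 X
B 2 1 Y
C 40 1 Y
C 50 1 Y', header = TRUE)

# value on the left: the only column FUN is applied to.
# "." on the right: every other column is used as a grouping variable.
# Note: the formula interface drops rows containing NA by default.
res <- aggregate(value ~ ., data = dat, FUN = mean)
print(res)
```

Because the grouping columns are taken from the dot, the call does not need to change when the data frame has more identifier columns, as long as value is the only column to be averaged.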
EDIT: extending to more than one value variable
If you have more than one numeric variable whose mean you want to compute, for example if your data look like this
name value etc1 etc2 value1
1 A 9 1 X 2.1763485
2 A 10 1 X -0.7954326
3 A 11 1 X -0.5839844
4 B 2 1 Y -0.5188709
5 C 40 1 Y -0.8300233
6 C 50 1 Y -0.7787496
The above solution can be extended like this:
X[,lapply(.SD,mean),keys]
name etc1 etc2 value value1
1: A 1 X 10 0.2656438
2: B 1 Y 2 -0.5188709
3: C 1 Y 45 -0.8043865
This computes the mean of every variable that is not in the keys list.
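A self-contained sketch of the multi-column case, using the values from the table above. The names value_cols and res are hypothetical, introduced only for this illustration. Passing .SDcols is an optional variation on the answer's call: it states explicitly which columns are averaged instead of relying on everything outside keys being numeric.

```r
library(data.table)

# Data from the extended example, typed in directly
X <- data.table(
  name   = c("A", "A", "A", "B", "C", "C"),
  value  = c(9, 10, 11, 2, 40, 50),
  etc1   = 1L,
  etc2   = c("X", "X", "X", "Y", "Y", "Y"),
  value1 = c(2.1763485, -0.7954326, -0.5839844,
             -0.5188709, -0.8300233, -0.7787496)
)

# Columns to average (hypothetical helper name), and the rest as group keys
value_cols <- c("value", "value1")
keys <- setdiff(names(X), value_cols)

# .SD holds only the columns listed in .SDcols within each group
res <- X[, lapply(.SD, mean), by = keys, .SDcols = value_cols]
print(res)
```

Listing the value columns by name avoids depending on a naming pattern such as 'value', which matters if the numeric columns do not share a common prefix.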