Average between duplicated rows in R

Question

I have a data frame df whose rows are duplicated in the name column but not in the value column:

name    value   etc1    etc2
A       9       1       X
A       10      1       X
A       11      1       X
B       2       1       Y
C       40      1       Y
C       50      1       Y

I need to collapse the duplicated names into a single row each, taking the mean of the value column. The expected output is as follows:

name    value   etc1    etc2
A       10      1       X
B       2       1       Y
C       45      1       Y

I have tried df[duplicated(df$name),], but of course this does not give me the mean over the duplicates. I would like to use aggregate(), but the problem is that its FUN argument is applied to all the other columns as well and, among other problems, it cannot compute a mean on character content. Since all the other columns have the same content across the "duplicates", I need them to be carried over as is, just like the name column. Any hints?

Recommended answer

Here is a data.table solution. It is general in the sense that it will work even for a data.frame with 60 columns, because the data are grouped by every column other than value (see how keys is built below), so those columns are carried through unchanged.

library(data.table)
dat <- read.table(text='name    value   etc1    etc2
A       9       1       X
A       10      1       X
A       11      1       X
B       2       1       Y
C       40      1       Y
C       50      1       Y',header=TRUE)
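# grouping keys: every column whose name does not contain 'value'
keys <- colnames(dat)[!grepl('value',colnames(dat))]
X <- as.data.table(dat)
# one row per unique combination of the keys, with the mean of value
X[,list(mm= mean(value)),by=keys]
   name etc1 etc2 mm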
1:    A    1    X 10
2:    B    1    Y  2
3:    C    1    Y 45
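Since the question asks about aggregate(), the same grouping idea can also be sketched in base R. This is not the answer's method, only an illustration under the same assumption that every column other than value is constant within a name. Passing the non-value columns as the by list means FUN is applied to value alone:

```r
# Same sample data as in the question
dat <- read.table(text = "name value etc1 etc2
A 9 1 X
A 10 1 X
A 11 1 X
B 2 1 Y
C 40 1 Y
C 50 1 Y", header = TRUE)

# Group by every column except 'value'
keys <- setdiff(colnames(dat), "value")

# aggregate() applies FUN only to the columns in the first argument;
# the 'by' columns are carried into the result unchanged
res <- aggregate(dat["value"], by = dat[keys], FUN = mean)
print(res)
```

If one of the other columns does differ within a name, that name ends up split across several rows, because each distinct combination of the key columns forms its own group.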

EDIT: extending to multiple value variables

You may have more than one numeric variable whose mean you want to compute. For example, suppose your data look like this:

  name value etc1 etc2     value1
1    A     9    1    X  2.1763485
2    A    10    1    X -0.7954326
3    A    11    1    X -0.5839844
4    B     2    1    Y -0.5188709
5    C    40    1    Y -0.8300233
6    C    50    1    Y -0.7787496

The solution above can then be extended like this:

X[,lapply(.SD,mean),keys]
   name etc1 etc2 value     value1
1:    A    1    X    10  0.2656438
2:    B    1    Y     2 -0.5188709
3:    C    1    Y    45 -0.8043865

This computes the mean of every column that is not in the keys list.
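The answer does not show how the value1 column was created, so the sketch below simply types in the values from the printed table above to make the extended example reproducible. The data.table() construction is an assumption made for illustration; the grouping call is the one from the answer.

```r
library(data.table)

# Extended sample data; value1 is copied from the printed table,
# not regenerated (the original generating code is not shown)
X <- data.table(
  name   = c("A", "A", "A", "B", "C", "C"),
  value  = c(9, 10, 11, 2, 40, 50),
  etc1   = 1L,
  etc2   = c("X", "X", "X", "Y", "Y", "Y"),
  value1 = c(2.1763485, -0.7954326, -0.5839844,
             -0.5188709, -0.8300233, -0.7787496)
)

# Keys are all columns whose name does not contain 'value',
# so both value and value1 are left to be averaged
keys <- colnames(X)[!grepl("value", colnames(X))]

# .SD holds the non-key columns of each group; mean is applied to each
res <- X[, lapply(.SD, mean), by = keys]
print(res)
```

Because the keys are selected by name pattern, any further column whose name contains "value" would be averaged as well without changing the call.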
