Adding grouped mean values to column in data frame
Question
I'd like to calculate group means in a data frame and create a new column in the original data frame containing those group mean values. (I'm doing a repeatability study and I want the mean value over measurements within an insertion, unit, and channel in a new column so I can subtract it off and calculate residuals.)
My data:
> head(mytestdata, 15)
   Insertion Measurement Unit Channel Value
1          1           1   A5      10  9.41
2          1           1   A5      11  9.51
3          1           1   A5      12 10.59
4          1           1   A5      13  9.45
5          1           2   A5      10  9.42
6          1           2   A5      11  9.03
7          1           2   A5      12 10.62
8          1           2   A5      13  9.39
9          1           3   A5      10  9.38
10         1           3   A5      11  9.87
11         1           3   A5      12 11.34
12         1           3   A5      13  9.59
13         2           1   A5      10 12.10
14         2           1   A5      11 11.28
15         2           1   A5      12 12.95
Specifically, I want to calculate the mean Value per Insertion, Unit, and Channel, and add it to the data frame as meanValue. Then subtract meanValue from Value to get Residual.
It should look like this:
   Insertion Measurement Unit Channel Value meanValue
1          1           1   40      10 11.79     11.56
2          1           1   40      11 11.01     11.38
3          1           1   40      12 10.86     11.19
4          1           1   40      13 10.29     10.91
5          1           2   40      10 11.47     11.56
6          1           2   40      11 11.84     11.38
7          1           2   40      12 11.39     11.19
8          1           2   40      13 11.25     10.91
9          1           3   40      10 11.42     11.56
10         1           3   40      11 11.28     11.38
11         1           3   40      12 11.31     11.19
12         1           3   40      13 11.18     10.91
13         2           1   40      10 10.97     11.55
14         2           1   40      11 11.78     11.87
15         2           1   40      12 11.48     11.25
I know how to get the group means using by, aggregate, etc, which get me a second list or table with the values in it. I'm also confident I could get what I want using some convoluted looping procedures, but I'm looking to stuff them back in the same data frame in an elegant one- or two-line solution, and I figure there's got to be a way to do it but after days of searching I'm not finding it. I don't want a cumbersome solution because I want it to work well when I scale up to lots more data.
Recommended answer
Using data.table:
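library(data.table)
# setDT() converts mytestdata to a data.table by reference (no copy).
# := adds both columns in place; the mean is computed once per
# Insertion/Unit/Channel group and reused for the residual.
setDT(mytestdata)[, c("MeanValue", "Residual") := {
  m <- mean(Value)
  list(m, Value - m)
}, by = list(Insertion, Unit, Channel)]
mytestdata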
#     Insertion Measurement Unit Channel Value MeanValue     Residual
#  1:         1           1   A5      10  9.41  9.403333  0.006666667
#  2:         1           1   A5      11  9.51  9.470000  0.040000000
#  3:         1           1   A5      12 10.59 10.850000 -0.260000000
#  4:         1           1   A5      13  9.45  9.476667 -0.026666667
#  5:         1           2   A5      10  9.42  9.403333  0.016666667
#  6:         1           2   A5      11  9.03  9.470000 -0.440000000
#  7:         1           2   A5      12 10.62 10.850000 -0.230000000
#  8:         1           2   A5      13  9.39  9.476667 -0.086666667
#  9:         1           3   A5      10  9.38  9.403333 -0.023333333
# 10:         1           3   A5      11  9.87  9.470000  0.400000000
# 11:         1           3   A5      12 11.34 10.850000  0.490000000
# 12:         1           3   A5      13  9.59  9.476667  0.113333333
# 13:         2           1   A5      10 12.10 12.100000  0.000000000
# 14:         2           1   A5      11 11.28 11.280000  0.000000000
# 15:         2           1   A5      12 12.95 12.950000  0.000000000