Calculating statistics on subsets of data
Question
Here is a small reproducible example of my data:
> mydata <- structure(list(subject = c(1, 1, 1, 2, 2, 2), time = c(0, 1, 2, 0, 1, 2), measure = c(10, 12, 8, 7, 0, 0)), .Names = c("subject", "time", "measure"), row.names = c(NA, -6L), class = "data.frame")
> mydata
  subject time measure
1       1    0      10
2       1    1      12
3       1    2       8
4       2    0       7
5       2    1       0
6       2    2       0
I would like to generate a new variable containing the mean of measure for that particular subject, so:
subject time measure mn_measure
      1    0      10         10
      1    1      12         10
      1    2       8         10
      2    0       7      2.333
      2    1       0      2.333
      2    2       0      2.333
Is there an easy way to do this, other than looping through all the records programmatically or reshaping to wide format first?
Recommended answer
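Use the base R function ave(), which, despite its confusing name, can calculate a variety of statistics, including the mean. It applies FUN separately within each group defined by the grouping variable(s) and returns a vector the same length as the input, so the result can be stored directly as a new column: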
within(mydata, mean <- ave(measure, subject, FUN = mean))
  subject time measure      mean
1       1    0      10 10.000000
2       1    1      12 10.000000
3       1    2       8 10.000000
4       2    0       7  2.333333
5       2    1       0  2.333333
6       2    2       0  2.333333
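Because FUN accepts any function that reduces a vector to a single value, the same call pattern covers other per-subject statistics. The following is a minimal sketch assuming the example data from the question; the column names max_measure, n_measure and sd_measure are hypothetical names chosen only for illustration. mydata is rebuilt with data.frame(), which gives the same data as the structure() call above, so the snippet runs on its own:

```r
# Same data as in the question, built with data.frame() instead of structure()
mydata <- data.frame(subject = c(1, 1, 1, 2, 2, 2),
                     time    = c(0, 1, 2, 0, 1, 2),
                     measure = c(10, 12, 8, 7, 0, 0))

# FUN can be any function that summarises a vector; ave() recycles the
# single-value result across every row of that subject.
# Column names below are hypothetical, for illustration only.
mydata <- within(mydata, {
  max_measure <- ave(measure, subject, FUN = max)     # largest value per subject
  n_measure   <- ave(measure, subject, FUN = length)  # number of rows per subject
  sd_measure  <- ave(measure, subject, FUN = sd)      # standard deviation per subject
})
mydata
```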
Note that I use within() only to keep the code short. Here is the equivalent without within():
mydata$mean <- ave(mydata$measure, mydata$subject, FUN=mean)
mydata
  subject time measure      mean
1       1    0      10 10.000000
2       1    1      12 10.000000
3       1    2       8 10.000000
4       2    0       7  2.333333
5       2    1       0  2.333333
6       2    2       0  2.333333
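To make clear what ave() computes, the same column can be built in two explicit steps with base R's tapply(): first one mean per subject, then a lookup of each row's subject in that result. This is a sketch for illustration, not necessarily how ave() works internally; mn_measure is the column name from the question, and subject_means is a hypothetical helper variable:

```r
mydata <- data.frame(subject = c(1, 1, 1, 2, 2, 2),
                     time    = c(0, 1, 2, 0, 1, 2),
                     measure = c(10, 12, 8, 7, 0, 0))

# Step 1: one mean per subject, named by subject ("1", "2")
subject_means <- tapply(mydata$measure, mydata$subject, mean)

# Step 2: index by each row's subject to spread the group mean back over
# the rows; as.vector() drops the names/dim attributes left by tapply()
mydata$mn_measure <- as.vector(subject_means[as.character(mydata$subject)])

# The two-step result matches the ave() one-liner
stopifnot(isTRUE(all.equal(mydata$mn_measure,
                           ave(mydata$measure, mydata$subject, FUN = mean))))
mydata
```

ave() folds both steps into a single call, which is why it is the shorter option here.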