Calculating statistics on subsets of data
Problem description
Here is a small reproducible example of my data:
> mydata <- structure(list(subject = c(1, 1, 1, 2, 2, 2), time = c(0, 1, 2, 0, 1, 2), measure = c(10, 12, 8, 7, 0, 0)), .Names = c("subject", "time", "measure"), row.names = c(NA, -6L), class = "data.frame")
> mydata
subject time measure
      1    0      10
      1    1      12
      1    2       8
      2    0       7
      2    1       0
      2    2       0
I would like to generate a new variable containing the mean of measure for that particular subject, so:
subject time measure mn_measure
      1    0      10         10
      1    1      12         10
      1    2       8         10
      2    0       7      2.333
      2    1       0      2.333
      2    2       0      2.333
Is there an easy way to do this, other than looping through all the records programmatically or reshaping to wide format first?
Solution
Use the base R function ave(), which, despite its confusing name, can calculate a variety of statistics, including the mean:
within(mydata, mean <- ave(measure, subject, FUN = mean))
  subject time measure      mean
1       1    0      10 10.000000
2       1    1      12 10.000000
3       1    2       8 10.000000
4       2    0       7  2.333333
5       2    1       0  2.333333
6       2    2       0  2.333333
Note that I use within() just for the sake of shorter code. Here is the equivalent without within():
mydata$mean <- ave(mydata$measure, mydata$subject, FUN = mean)
mydata
  subject time measure      mean
1       1    0      10 10.000000
2       1    1      12 10.000000
3       1    2       8 10.000000
4       2    0       7  2.333333
5       2    1       0  2.333333
6       2    2       0  2.333333
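Since the answer says ave() can calculate a variety of statistics, here is a minimal sketch of what that might look like on the same data. The column name mn_measure comes from the question; max_measure, dev_measure, sd_measure and mydata2 are hypothetical names chosen only for illustration.

```r
# Rebuild the example data from the question.
mydata <- data.frame(
  subject = c(1, 1, 1, 2, 2, 2),
  time    = c(0, 1, 2, 0, 1, 2),
  measure = c(10, 12, 8, 7, 0, 0)
)

# ave() returns a vector as long as its input: each element is replaced
# by the statistic computed over that element's group.
mydata$mn_measure  <- ave(mydata$measure, mydata$subject, FUN = mean)
mydata$max_measure <- ave(mydata$measure, mydata$subject, FUN = max)

# FUN may also return one value per element, here each measure's
# deviation from its subject mean (hypothetical column name).
mydata$dev_measure <- ave(mydata$measure, mydata$subject,
                          FUN = function(x) x - mean(x))

# within() returns a modified copy, so assign the result to keep the column.
mydata2 <- within(mydata, sd_measure <- ave(measure, subject, FUN = sd))
```

Because the result always lines up row by row with the input, it can be assigned straight back as a column, with no loop or reshaping step.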