Calculating statistics on subsets of data


Question

Here is a small reproducible example of my data:

> mydata <- structure(list(subject = c(1, 1, 1, 2, 2, 2), time = c(0, 1, 2, 0, 1, 2), measure = c(10, 12, 8, 7, 0, 0)), .Names = c("subject", "time", "measure"), row.names = c(NA, -6L), class = "data.frame")

> mydata

subject  time  measure
1          0      10
1          1      12
1          2       8
2          0       7
2          1       0
2          2       0

I would like to generate a new variable containing the mean of measure for that particular subject, so:

subject  time  measure  mn_measure
1          0      10      10
1          1      12      10
1          2       8      10
2          0       7      2.333
2          1       0      2.333
2          2       0      2.333

Is there an easy way to do this, other than looping through all the records programmatically or reshaping to wide format first?

Solution

Use the base R function ave(), which, despite its confusing name, can calculate a variety of statistics, including the mean:

within(mydata, mean<-ave(measure, subject, FUN=mean))

  subject time measure      mean
1       1    0      10 10.000000
2       1    1      12 10.000000
3       1    2       8 10.000000
4       2    0       7  2.333333
5       2    1       0  2.333333
6       2    2       0  2.333333
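
Since ave() applies FUN within each group and returns a vector as long as its input, the same pattern should work for other per-subject statistics. A minimal sketch, assuming the same example data; the column names md_measure, max_measure and sd_measure are illustrative and not part of the original answer:

```r
# Rebuild the example data so this block runs on its own.
mydata <- data.frame(
  subject = c(1, 1, 1, 2, 2, 2),
  time    = c(0, 1, 2, 0, 1, 2),
  measure = c(10, 12, 8, 7, 0, 0)
)

# ave() returns one value per row, repeated within each subject.
# FUN must be passed by name, because ave() takes grouping variables via '...'.
mydata$md_measure  <- ave(mydata$measure, mydata$subject, FUN = median)
mydata$max_measure <- ave(mydata$measure, mydata$subject, FUN = max)
mydata$sd_measure  <- ave(mydata$measure, mydata$subject, FUN = sd)

mydata
```

Note that leaving out the FUN = name would make ave() treat the function as another grouping variable, so it is worth spelling it out every time.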


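Note that I use within() only to keep the code shorter. It returns a modified copy of the data frame, so assign the result if you want to keep the new column. Here is the equivalent without within(), which adds the column to mydata directly: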

mydata$mean <- ave(mydata$measure, mydata$subject, FUN=mean)
mydata
  subject time measure      mean
1       1    0      10 10.000000
2       1    1      12 10.000000
3       1    2       8 10.000000
4       2    0       7  2.333333
5       2    1       0  2.333333
6       2    2       0  2.333333
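
To get the column name asked for in the question (mn_measure) and to sanity-check the result, one option is to compare ave() against tapply(), which returns one mean per subject. This is only a sketch under the assumption that the data match the example above; group_means and expanded are hypothetical helper names:

```r
# Rebuild the example data so this block runs on its own.
mydata <- data.frame(
  subject = c(1, 1, 1, 2, 2, 2),
  time    = c(0, 1, 2, 0, 1, 2),
  measure = c(10, 12, 8, 7, 0, 0)
)

# Per-subject mean, stored under the name used in the question.
mydata$mn_measure <- ave(mydata$measure, mydata$subject, FUN = mean)

# Cross-check: tapply() gives one mean per subject, named by subject.
group_means <- tapply(mydata$measure, mydata$subject, mean)

# Expand the per-subject means back to one value per row by name lookup.
expanded <- as.numeric(group_means[as.character(mydata$subject)])

# Both approaches should agree row by row.
stopifnot(isTRUE(all.equal(mydata$mn_measure, expanded)))

mydata
```

If measure could contain missing values, passing FUN = function(x) mean(x, na.rm = TRUE) to ave() would ignore them within each subject.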
