Calculating statistics on subsets of data

Views: 107 | Published: 2017/3/25 22:05:46 | Tags: r, dataframe


Problem description

Here is a small reproducible example of my data:

> mydata <- structure(list(subject = c(1, 1, 1, 2, 2, 2), time = c(0, 1, 2, 0, 1, 2), measure = c(10, 12, 8, 7, 0, 0)), .Names = c("subject", "time", "measure"), row.names = c(NA, -6L), class = "data.frame")

> mydata

subject  time  measure
1          0      10
1          1      12
1          2       8
2          0       7
2          1       0
2          2       0

I would like to generate a new variable containing the mean of measure for that particular subject, so:

subject  time  measure  mn_measure
1          0      10      10
1          1      12      10
1          2       8      10
2          0       7      2.333
2          1       0      2.333
2          2       0      2.333

Is there an easy way to do this, other than looping through all the records programmatically or reshaping to wide format first?

Solution

Use the base R function ave(), which, despite its confusing name, can calculate a variety of statistics, including the mean:

within(mydata, mean <- ave(measure, subject, FUN = mean))

  subject time measure      mean
1       1    0      10 10.000000
2       1    1      12 10.000000
3       1    2       8 10.000000
4       2    0       7  2.333333
5       2    1       0  2.333333
6       2    2       0  2.333333
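
One detail that is easy to miss: within() returns a modified copy and leaves mydata itself unchanged, so the result has to be assigned if the new column should be kept. A minimal sketch, assuming the data from the question is rebuilt with data.frame(); the names `result` and `mn_measure` are illustrative choices (the latter borrowed from the question), not part of the original answer:

```r
# Rebuild the example data from the question
mydata <- data.frame(
  subject = c(1, 1, 1, 2, 2, 2),
  time    = c(0, 1, 2, 0, 1, 2),
  measure = c(10, 12, 8, 7, 0, 0)
)

# within() evaluates the expression inside the data frame and
# returns a new data frame; mydata is not modified in place
result <- within(mydata, mn_measure <- ave(measure, subject, FUN = mean))

# The new column exists only in the returned copy
in_original <- "mn_measure" %in% names(mydata)
in_result   <- "mn_measure" %in% names(result)
print(c(original = in_original, result = in_result))
```

Naming the column mn_measure rather than mean also avoids a column that shares its name with the mean() function.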



Note that I use within() only to keep the code shorter. Here is the equivalent without within():

mydata$mean <- ave(mydata$measure, mydata$subject, FUN=mean)
mydata
  subject time measure      mean
1       1    0      10 10.000000
2       1    1      12 10.000000
3       1    2       8 10.000000
4       2    0       7  2.333333
5       2    1       0  2.333333
6       2    2       0  2.333333
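
Since ave() accepts any function through FUN, the same pattern should carry over to other per-group statistics. The following is a sketch under that assumption; the column names `max_measure`, `med_measure`, and `dev` are made up for illustration:

```r
# Rebuild the example data from the question
mydata <- data.frame(
  subject = c(1, 1, 1, 2, 2, 2),
  time    = c(0, 1, 2, 0, 1, 2),
  measure = c(10, 12, 8, 7, 0, 0)
)

# Per-subject maximum, repeated on every row of that subject
mydata$max_measure <- ave(mydata$measure, mydata$subject, FUN = max)

# Per-subject median
mydata$med_measure <- ave(mydata$measure, mydata$subject, FUN = median)

# Deviation of each measurement from its subject mean;
# FUN defaults to mean, so it can be omitted here
mydata$dev <- mydata$measure - ave(mydata$measure, mydata$subject)

print(mydata)
```

Because ave() always returns a vector as long as its input, the result lines up with the existing rows and can be assigned directly as a new column, with no reshaping or merging.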


                        