A simpler way to achieve a frequency count with mean, sum, length and sd in R

Question

I've been tasked with creating frequency tables with statistical summaries. My goal is to create a data frame that can be exported simply to Excel. Most of this could be done in SQL using stored procedures, but I decided to do it in R. I'm learning R, so I might be doing it the long way. This is a follow-up question to getting-r-frequency-counts-for-all-possible-answers

Given

    Id <- c(1,2,3,4,5,6,7,8,9,10)
    ClassA <- c(1,NA,3,1,1,2,1,4,5,3)
    ClassB <- c(2,1,1,3,3,2,1,1,3,3)
    R <- c(1,2,3,NA,9,2,4,5,6,7)
    S <- c(3,7,NA,9,5,8,7,NA,7,6)
    df <- data.frame(Id,ClassA,ClassB,R,S)

    ZeroTenNAScale <- c(0:10,NA);

    R.freq <- setNames(nm=c('answer','value'),data.frame(table(factor(df$R,levels=ZeroTenNAScale,exclude=NULL))));
    R.freq[, 1] <- as.numeric(as.character( R.freq[, 1] ))
    R.freq <- cbind(question='R',R.freq)

    S.freq <- setNames(nm=c('answer','value'),data.frame(table(factor(df$S,levels=ZeroTenNAScale,exclude=NULL))));
    S.freq[, 1] <- as.numeric(as.character( S.freq[, 1] ))
    S.freq <- cbind(question='S',S.freq)

    R.mean = mean(df$R, na.rm = TRUE) 
    R.length = sum(!is.na(df$R)) 
    R.sd = sd(df$R, na.rm = TRUE) 
    R.sum = sum(df$R, na.rm = TRUE)

    S.mean = mean(df$S, na.rm = TRUE) 
    S.length = sum(!is.na(df$S)) 
    S.sd = sd(df$S, na.rm = TRUE) 
    S.sum = sum(df$S, na.rm = TRUE)

    S.row <- cbind('S','sum',as.numeric(S.sum))
    S.row <- setNames(nm=c('question','answer','value'),data.frame(S.row))
    S.freq = rbind(S.freq, S.row )

    S.row <- cbind('S','length',as.numeric(S.length))
    S.row <- setNames(nm=c('question','answer','value'),data.frame(S.row))
    S.freq = rbind(S.freq, S.row )

    S.row <- cbind('S','mean',as.numeric(S.mean))
    S.row <- setNames(nm=c('question','answer','value'),data.frame(S.row))
    S.freq = rbind(S.freq, S.row )

    S.row <- cbind('S','sd',as.numeric(S.sd))
    S.row <- setNames(nm=c('question','answer','value'),data.frame(S.row))
    S.freq = rbind(S.freq, S.row )

    R.row <- cbind('R','sum',as.numeric(R.sum))
    R.row <- setNames(nm=c('question','answer','value'),data.frame(R.row))
    R.freq = rbind(R.freq, R.row )

    R.row <- cbind('R','length',as.numeric(R.length))
    R.row <- setNames(nm=c('question','answer','value'),data.frame(R.row))
    R.freq = rbind(R.freq, R.row )

    R.row <- cbind('R','mean',as.numeric(R.mean))
    R.row <- setNames(nm=c('question','answer','value'),data.frame(R.row))
    R.freq = rbind(R.freq, R.row )

    R.row <- cbind('R','sd',as.numeric(R.sd))
    R.row <- setNames(nm=c('question','answer','value'),data.frame(R.row))
    R.freq = rbind(R.freq, R.row )

    result <- rbind(R.freq,S.freq)
    result <- cbind(filter='None',result)
    result  

I get

   filter question answer            value
1    None        R      0                0
2    None        R      1                1
3    None        R      2                2
4    None        R      3                1
5    None        R      4                1
6    None        R      5                1
7    None        R      6                1
8    None        R      7                1
9    None        R      8                0
10   None        R      9                1
11   None        R     10                0
12   None        R   <NA>                1
13   None        R    sum               39
14   None        R length                9
15   None        R   mean 4.33333333333333
16   None        R     sd 2.64575131106459
17   None        S      0                0
18   None        S      1                0
19   None        S      2                0
20   None        S      3                1
21   None        S      4                0
22   None        S      5                1
23   None        S      6                1
24   None        S      7                3
25   None        S      8                1
26   None        S      9                1
27   None        S     10                0
28   None        S   <NA>                2
29   None        S    sum               52
30   None        S length                8
31   None        S   mean              6.5
32   None        S     sd  1.8516401995451

This is pretty much what I'm looking for. The next step, as I see it, is to wrap this in some functions to simplify the code before I start adding similar result sets for ClassA=1, ClassA=n+1 ... ClassA=NA, then ClassB=1, ClassB=2 ... ClassB=NA. Is there a much simpler way of doing this?

This is much simpler and makes my other task of training our team much simpler. Thanks to Ernest A and Imo.

The next question in relation to my understanding of R is "Using vectors in R to change the output of a function".

Recommended answer

Yes, it definitely can be simplified. Typically you would use a summary function such as

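smry <- function(x, levels) {
    xx <- na.omit(x)  # non-missing values, used for sum, mean and sd
    # frequency of each level, always including an <NA> count
    c(table(factor(x, levels=levels), useNA='always', exclude=NULL),
      # note: length counts every value, including NA
      sum=sum(xx), length=length(x), mean=mean(xx), sd=sqrt(var(xx)))
}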

then apply it to the different subsets of the data

> lapply(df[c('R', 'S')], smry, 0:10)
$R
        0         1         2         3         4         5         6         7 
 0.000000  1.000000  2.000000  1.000000  1.000000  1.000000  1.000000  1.000000 
        8         9        10      <NA>       sum    length      mean        sd 
 0.000000  1.000000  0.000000  1.000000 39.000000 10.000000  4.333333  2.645751 

$S
       0        1        2        3        4        5        6        7 
 0.00000  0.00000  0.00000  1.00000  0.00000  1.00000  1.00000  3.00000 
       8        9       10     <NA>      sum   length     mean       sd 
 1.00000  1.00000  0.00000  2.00000 52.00000 10.00000  6.50000  1.85164 
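
Note that length comes out as 10 here because smry counts every value, while the code in the question counted only the non-missing answers (9 for R, 8 for S). If that is what you need, a small variant might look like the sketch below. The name smry_nonmissing is made up for this illustration, and the data frame is copied from the question.

```r
# Sample data from the question
df <- data.frame(
  Id     = 1:10,
  ClassA = c(1, NA, 3, 1, 1, 2, 1, 4, 5, 3),
  ClassB = c(2, 1, 1, 3, 3, 2, 1, 1, 3, 3),
  R      = c(1, 2, 3, NA, 9, 2, 4, 5, 6, 7),
  S      = c(3, 7, NA, 9, 5, 8, 7, NA, 7, 6)
)

# Hypothetical variant of smry(): 'length' counts only non-missing answers,
# like sum(!is.na(x)) in the question
smry_nonmissing <- function(x, levels) {
  xx <- x[!is.na(x)]  # drop missing answers before summarising
  c(table(factor(x, levels = levels), useNA = "always"),
    sum = sum(xx), length = length(xx), mean = mean(xx), sd = sd(xx))
}

res <- lapply(df[c("R", "S")], smry_nonmissing, 0:10)
res$R[["length"]]  # count of non-missing answers to R
res$S[["length"]]  # count of non-missing answers to S
```

Everything else (the frequency counts, sum, mean and sd) is computed the same way as in smry above.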

If you absolutely have to put everything in a data frame

> as.data.frame(as.table(simplify2array(lapply(df[c('R', 'S')], smry, 0:10))))
     Var1 Var2      Freq
1       0    R  0.000000
2       1    R  1.000000
3       2    R  2.000000
4       3    R  1.000000
5       4    R  1.000000
6       5    R  1.000000
7       6    R  1.000000
8       7    R  1.000000
9       8    R  0.000000
10      9    R  1.000000
11     10    R  0.000000
12   <NA>    R  1.000000
13    sum    R 39.000000
14 length    R 10.000000
15   mean    R  4.333333
16     sd    R  2.645751
17      0    S  0.000000
18      1    S  0.000000
19      2    S  0.000000
20      3    S  1.000000
21      4    S  0.000000
22      5    S  1.000000
23      6    S  1.000000
24      7    S  3.000000
25      8    S  1.000000
26      9    S  1.000000
27     10    S  0.000000
28   <NA>    S  2.000000
29    sum    S 52.000000
30 length    S 10.000000
31   mean    S  6.500000
32     sd    S  1.851640

and then simply change the column names / add columns as you need.
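
As a rough sketch of that last step, something like the following could rename the columns to match the layout in the question, add the filter column, and repeat the summary for each ClassB subset. The helper summarise_subset and the "ClassB=1" style labels are invented for this example, and smry is restated from above (without the redundant exclude argument) so the block runs on its own. ClassA is left out here because some of its groups have fewer than two non-missing answers, which leaves the standard deviation undefined and would need extra handling.

```r
# Sample data from the question
df <- data.frame(
  Id     = 1:10,
  ClassA = c(1, NA, 3, 1, 1, 2, 1, 4, 5, 3),
  ClassB = c(2, 1, 1, 3, 3, 2, 1, 1, 3, 3),
  R      = c(1, 2, 3, NA, 9, 2, 4, 5, 6, 7),
  S      = c(3, 7, NA, 9, 5, 8, 7, NA, 7, 6)
)

# Summary function adapted from the answer
smry <- function(x, levels) {
  xx <- na.omit(x)
  c(table(factor(x, levels = levels), useNA = "always"),
    sum = sum(xx), length = length(x), mean = mean(xx), sd = sqrt(var(xx)))
}

# Hypothetical helper: summarise the chosen questions for one subset of the
# data and label every row with the filter that produced it
summarise_subset <- function(data, label, questions = c("R", "S"), levels = 0:10) {
  wide <- simplify2array(lapply(data[questions], smry, levels))
  long <- as.data.frame(as.table(wide), stringsAsFactors = FALSE)
  names(long) <- c("answer", "question", "value")
  data.frame(filter = label, long[c("question", "answer", "value")],
             stringsAsFactors = FALSE)
}

# One block for the unfiltered data, then one block per ClassB value
by_classB <- split(df, df$ClassB)
parts <- c(
  list(summarise_subset(df, "None")),
  Map(function(d, k) summarise_subset(d, paste0("ClassB=", k)),
      by_classB, names(by_classB))
)
result <- do.call(rbind, unname(parts))
rownames(result) <- NULL
head(result)
```

The result is a single long data frame with the columns filter, question, answer and value, so it could then be written out with write.csv(result, "summary.csv", row.names = FALSE) (the file name is only an example) and opened in Excel.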
