How to calculate a pooled standard deviation in R?


Problem description

I want to calculate the pooled (actually weighted) standard deviation for all the unique sites in my data frame.

The values for these sites are values for single species forest stands and I want to pool the mean and the sd so that I can compare broadleaved stands with conifer stands.
This is the data frame (df) with values for the broadleaved stands:

keybl           n   mean    sd
Vest02DenmDesp  3   58.16   6.16
Vest02DenmDesp  5   54.45   7.85
Vest02DenmDesp  3   51.34   1.71
Vest02DenmDesp  3   59.57   5.11
Vest02DenmDesp  5   62.89   10.26
Vest02DenmDesp  3   77.33   2.14
Mato10GermDesp  4   41.89   12.6
Mato10GermDesp  4   11.92   1.8
Wawa07ChinDesp  18  0.097   0.004
Chen12ChinDesp  3   41.93   1.12
Hans11SwedDesp  2   1406.2  679.46
Hans11SwedDesp  2   1156.2  464.07
Hans11SwedDesp  2   4945.3  364.58

Keybl is the code for the site. The formula for the pooled SD is:

s = sqrt(((n1-1)*s1^2 + (n2-1)*s2^2) / (n1+n2-2))

(Sorry I can't post pictures and did not find a link that would directly go to the formula)

Here 2 is the number of groups, so it will change depending on the site. I know this formula is normally used for a t-test, where one wants to compare two groups. In this case I'm not planning to compare these groups. My professor suggested that I use this formula to get a weighted sd. I didn't find an R function that applies this formula in the way I need, so I tried to build my own. I am, however, new to R and not very good at writing functions and loops, so I hope for your help.

This is what I got so far:

sd=function (data) {
nc1=data[z,"nc"]
sc1=data[z, "sc"]
nc2=data[z+1, "nc"]
sc2=data[z+1, "sc"]
sd1=(nc1-1)*sc1^2 + (nc2-1)*sc2^2
sd2=sd1/(nc1+nc2-length(nc1))
sqrt(sd2)
}

splitdf=split(df, with(df, df$keybl), drop = TRUE)

for (c in 1:length(splitdf)) {
for (i in 1:length(splitdf[[i]])) {
    a = (splitdf[[i]])
    b =sd(a)
    }
}

1) The function itself is not correct as it gives slightly lower values than it should and I don't understand why. Could it be that it does not stop when z+1 has reached the last row? If so, how can that be corrected?

2) The loop is totally wrong but it is what I could come up with after several hours of no success.

Can anybody help me?

Thanks,

Antra

Solution

The pooled SD under the assumption of independence (so the covariance terms can be assumed to be zero) will be: sqrt( sum_over_groups[ (n-1)*var ] / (sum(n) - N_groups) )
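Written out for k groups, the pooled SD is usually given in the form below. The symbols k, n_i and s_i are introduced here for illustration: k is the number of rows for a site, n_i and s_i are the sample size and standard deviation of row i. With k = 2 it reduces to the two-group formula quoted in the question.

```latex
s_p = \sqrt{\frac{\sum_{i=1}^{k} (n_i - 1)\, s_i^{2}}{\sum_{i=1}^{k} n_i - k}}
```

Note that the denominator equals \sum_i (n_i - 1), so the number of groups is subtracted only once.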

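     # sum(dd$n - 1) already equals sum(n) - N_groups, so nrow(dd) must not be
     # subtracted a second time (doing so gives a zero denominator, hence Inf,
     # for a site with three groups of n = 2)
     lapply( split(dat, dat$keybl), 
          function(dd) sqrt( sum( dd$sd^2 * (dd$n-1) )/sum(dd$n-1) ) )
#-------------------------
$Chen12ChinDesp
[1] 1.12

$Hans11SwedDesp
[1] 519.5977

$Mato10GermDesp
[1] 9

$Vest02DenmDesp
[1] 7.118125

$Wawa07ChinDesp
[1] 0.004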
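
Since the question also asks to pool the mean, the per-site calculation can be sketched as a small self-contained script. This is only a sketch under stated assumptions: the data frame is retyped from the table in the question, `pool_site()` is a hypothetical helper name (not a base R function), the pooled mean is taken as the n-weighted mean, and the pooled SD uses the denominator sum(n) minus the number of groups.

```r
# Example data retyped from the question (column names as in the post).
dat <- data.frame(
  keybl = c(rep("Vest02DenmDesp", 6), rep("Mato10GermDesp", 2),
            "Wawa07ChinDesp", "Chen12ChinDesp", rep("Hans11SwedDesp", 3)),
  n    = c(3, 5, 3, 3, 5, 3, 4, 4, 18, 3, 2, 2, 2),
  mean = c(58.16, 54.45, 51.34, 59.57, 62.89, 77.33, 41.89, 11.92,
           0.097, 41.93, 1406.2, 1156.2, 4945.3),
  sd   = c(6.16, 7.85, 1.71, 5.11, 10.26, 2.14, 12.6, 1.8,
           0.004, 1.12, 679.46, 464.07, 364.58),
  stringsAsFactors = FALSE
)

# Hypothetical helper: summarise all rows (groups) of one site.
pool_site <- function(dd) {
  k <- nrow(dd)  # number of groups reported for this site
  data.frame(
    keybl       = dd$keybl[1],
    n_total     = sum(dd$n),
    # n-weighted mean of the group means
    pooled_mean = sum(dd$n * dd$mean) / sum(dd$n),
    # within-group pooled SD: sqrt( sum((n_i - 1) * s_i^2) / (sum(n_i) - k) )
    pooled_sd   = sqrt(sum((dd$n - 1) * dd$sd^2) / (sum(dd$n) - k))
  )
}

# Apply the helper to each site and stack the one-row results.
pooled <- do.call(rbind, lapply(split(dat, dat$keybl), pool_site))
print(pooled)
```

The pooled SD computed this way describes only the spread within groups; differences between the group means at a site (large for Hans11SwedDesp, for example) do not enter it.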
