Compute average and standard deviation with awk

Question

I have a file 'file.dat' containing 24 (rows) x 16 (columns) of data.

I have already tested the following awk script, which computes the average of each column.

touch aver-std.dat
awk '{   for (i=1; i<=NF; i++) { sum[i]+= $i } }
END { for (i=1; i<=NF; i++ )  
{ printf "%f \n", sum[i]/NR} }' file.dat >> aver-std.dat

The output file 'aver-std.dat' has one column containing these averages.

In the same way as the average, I would like to compute the standard deviation of each column of the data file 'file.dat' and write it to a second column of the output file. That is, I would like an output file with the average in the first column and the standard deviation in the second column.

I have been trying different variants, like this one:

touch aver-std.dat
awk '{   for (i=1; i<=NF; i++) { sum[i]+= $i }}
END { for (i=1; i<=NF; i++ )  
{std[i] += ($i - sum[i])^2 ; printf "%f %f \n", sum[i]/NR, sqrt(std[i]/(NR-1))}}' file.dat >> aver-std.dat

It writes values in the second column, but they are not the correct standard deviations. The computation of the deviation is wrong somehow. I would very much appreciate any help. Regards

Recommended answer

The (population) standard deviation is:

stdev = sqrt((1/N)*(sum of (value - mean)^2))
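For comparison, the definition above can be applied directly if the file is read twice: once to get the column means, and once to accumulate the squared deviations from them. The attempt in the question evaluates `($i - sum[i])^2` only once per column, inside `END`, so nothing is accumulated over the rows, and it subtracts the sum rather than the mean. The following is a minimal sketch, not the answer's method; the function name `twopass_stats` is hypothetical, and it assumes a non-empty input file in which every row has the same number of columns.

```shell
# Two-pass sketch: the same file is passed to awk twice.
# While NR == FNR, awk is still reading the first copy.
twopass_stats() {
    awk '
        NR == FNR {                  # pass 1: accumulate column sums
            for (i = 1; i <= NF; i++) sum[i] += $i
            n = FNR                  # number of rows read so far
            next
        }
        {                            # pass 2: accumulate squared deviations
            for (i = 1; i <= NF; i++) {
                d = $i - sum[i] / n  # deviation from the column mean
                dev[i] += d * d
            }
        }
        END {
            # one output line per input column: mean, population std. dev.
            for (i = 1; i <= NF; i++)
                printf "%f %f\n", sum[i] / n, sqrt(dev[i] / n)
        }' "$1" "$1"
}
```

It would be called as `twopass_stats file.dat >> aver-std.dat`. Reading the file twice is the price for needing the mean before the deviations can be summed.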

But there is another form of the formula which does not require you to know the mean beforehand. It is:

stdev = sqrt((1/N)*((sum of squares) - (((sum)^2)/N)))

(A quick Google search for the "sum of squares" formula for standard deviation will give you the derivation if you are interested.)
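The derivation is short enough to sketch here. The symbols are assumptions introduced for illustration: $x_k$ are the $N$ values of one column, $\mu = \frac{1}{N}\sum_k x_k$ is their mean, and $\sigma$ is the standard deviation as defined above.

```latex
\begin{aligned}
\sigma^2 &= \frac{1}{N}\sum_{k=1}^{N}(x_k-\mu)^2 \\
         &= \frac{1}{N}\left(\sum_{k=1}^{N}x_k^2 \;-\; 2\mu\sum_{k=1}^{N}x_k \;+\; N\mu^2\right) \\
         &= \frac{1}{N}\left(\sum_{k=1}^{N}x_k^2 \;-\; \frac{\left(\sum_{k=1}^{N}x_k\right)^2}{N}\right)
\end{aligned}
```

The last step substitutes $\mu = \frac{1}{N}\sum_k x_k$, so that $-2\mu\sum_k x_k + N\mu^2 = -\frac{1}{N}\left(\sum_k x_k\right)^2$. Taking the square root gives the second form of the formula.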

To use this formula, you need to keep track of both the sum and the sum of squares of the values. So your awk script will change to:

    awk '{for(i=1;i<=NF;i++) {sum[i] += $i; sumsq[i] += ($i)^2}}
          END {for (i=1;i<=NF;i++) {
          printf "%f %f \n", sum[i]/NR, sqrt((sumsq[i]-sum[i]^2/NR)/NR)}
         }' file.dat >> aver-std.dat
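The sum-of-squares formula can be checked by hand on a small input. The sketch below is illustrative only: the file `sample.dat`, the function `colstats`, and the variable `ddof` are hypothetical names, not part of the original answer. Setting `ddof` to 0 divides by N, as in the formula above, and setting it to 1 divides by N-1, as the attempt in the question does. The guard against a negative variance is an added assumption, since subtracting two nearly equal floating-point sums can produce a tiny negative value.

```shell
# Hypothetical 8x2 sample data, small enough to verify by hand.
cat > sample.dat <<'EOF'
2 1
4 2
4 3
4 4
5 5
5 6
7 7
9 8
EOF

# colstats DDOF FILE: print mean and standard deviation of each column.
# DDOF=0 divides by N (population form); DDOF=1 divides by N-1 (sample form).
colstats() {
    awk -v ddof="$1" '
        { for (i = 1; i <= NF; i++) { sum[i] += $i; sumsq[i] += $i * $i } }
        END {
            for (i = 1; i <= NF; i++) {
                var = (sumsq[i] - sum[i]^2 / NR) / (NR - ddof)
                if (var < 0) var = 0   # guard against tiny negative rounding errors
                printf "%f %f\n", sum[i] / NR, sqrt(var)
            }
        }' "$2"
}

colstats 0 sample.dat
colstats 1 sample.dat
```

For the first column the sum is 40 and the sum of squares is 232, so the population variance is (232 - 1600/8)/8 = 4 and the first call prints `5.000000 2.000000` for that column. The second column gives `4.500000 2.291288`. With `ddof` set to 1 the deviations become 2.138090 and 2.449490.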
