How do I calculate the standard deviation in my shell script?

Question

I have a shell script:

dir=$1
cd "$dir" || exit 1
grep -P -o '(?<=<rating>).*' * |
awk -F: '{A[$1]+=$2; L[$1]++; next}
END {for (i in A) print i, A[i]/L[i]}' | sort -nr -k2 |
awk '{ sub(/\.dat/, " "); print }'

which averages all of the numbers that follow the <rating> field in each file of my folder, but now I need to calculate the standard deviation of the numbers rather than the average: sum the squared differences between each rating in the file and the mean, divide by the sample size minus 1, and take the square root. I do not need to do this for every file in the folder, only for 2 specific files, hotel_188937.dat and hotel_203921.dat. Here is an example of the contents of one of these files:

<Overall Rating>
<Avg. Price>$155
<URL>

<Author>Jeter5
<Content>I hope we're not disappointed! We enjoyed New Orleans...
<Date>Dec 19, 2008
<No. Reader>-1
<No. Helpful>-1
<rating>4
<Value>-1
<Rooms>3
<Location>5
<Cleanliness>3
<Check in / front desk>5
<Service>5
<Business service>5

<Author>...
repeat fields again...

The sample size of the first file is 127 with a mean of 4.78, compared with a sample size of 324 and a mean of 4.78 for the second file. Is there any way I can alter my script to calculate the standard deviation for these two specific files, rather than calculating the average for every file in my directory? Thanks for your time.

Recommended answer

You can do it all in one awk script:

$ awk -F'>' '
    # on "<rating>N" lines: key by file name without .dat, then accumulate
    $1=="<rating" {k=FILENAME; sub(/\.dat$/,"",k);
                   s[k]+=$2; ss[k]+=$2^2; c[k]++}
    END {for(i in s) {m=s[i]/c[i];          # mean
                      print i, m, sqrt(ss[i]/c[i]-m^2)}}' r1.dat r2.dat

r1 2.5 1.11803
r2 3 1.41421

s is the sum, ss the sum of squares, c the count, and m the mean. Note that this computes the population standard deviation, not the sample standard deviation. For the latter you need to rescale with (count-1): the sample variance is (ss - c*m^2)/(c-1), and the sample standard deviation is its square root.
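The (count-1) adjustment could be sketched roughly as follows. This is only an illustration under a few assumptions: rating_sd is a made-up helper name, the files contain lines of the form <rating>N as in the question, and the guard for a single rating (printing 0) is my own choice rather than anything the original answer specifies.

```shell
#!/bin/sh
# Sketch: per-file mean and SAMPLE standard deviation of <rating> lines.
# rating_sd is a hypothetical helper name, not part of the original answer.
rating_sd() {
    awk -F'>' '
        $1 == "<rating" {
            k = FILENAME; sub(/\.dat$/, "", k)   # key: file name without .dat
            s[k] += $2; ss[k] += $2^2; c[k]++    # sum, sum of squares, count
        }
        END {
            for (i in s) {
                m = s[i] / c[i]                  # mean
                # sample variance: (sum of squares - n*mean^2) / (n - 1);
                # assumption: report 0 when there is only one rating
                v = (c[i] > 1) ? (ss[i] - c[i]*m^2) / (c[i] - 1) : 0
                if (v < 0) v = 0                 # guard against rounding error
                print i, m, sqrt(v)
            }
        }' "$@"
}

# Run on the files given as arguments, if any.
if [ "$#" -gt 0 ]; then
    rating_sd "$@"
fi
```

Saved as a script (the file name is up to you), it would be run from the data directory with the two files as arguments: hotel_188937.dat hotel_203921.dat. Each output line holds the file name without .dat, the mean, and the sample standard deviation.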
