awk search and calculate standard deviation different results

Views: 79  Published: 2021/5/9 20:52:02  Tags: bash math awk standard-deviation


Problem description

I am trying to take the output of sar and calculate the standard deviation of a column. This works when the file contains only that single column. However, when I calculate the same column from the full file, filtering out the 'bad' lines such as the title lines and the Average line, I get a different value.

Here are the files I am working with:

/tmp/saru.tmp

# cat /tmp/saru.tmp
Linux 2.6.32-279.el6.x86_64 (progserver)        09/06/2012      _x86_64_        (4 CPU)

11:09:01 PM     CPU     %user     %nice   %system   %iowait    %steal     %idle
11:10:01 PM     all      0.01      0.00      0.05      0.01      0.00     99.93
11:11:01 PM     all      0.01      0.00      0.06      0.00      0.00     99.92
11:12:01 PM     all      0.01      0.00      0.05      0.01      0.00     99.93
11:13:01 PM     all      0.01      0.00      0.05      0.00      0.00     99.93
11:14:01 PM     all      0.01      0.00      0.04      0.00      0.00     99.95
11:15:01 PM     all      0.01      0.00      0.06      0.00      0.00     99.92
11:16:01 PM     all      0.01      0.00      2.64      0.01      0.01     97.33
11:17:01 PM     all      0.02      0.00     21.96      0.00      0.08     77.94
11:18:01 PM     all      0.02      0.00     21.99      0.00      0.08     77.91
11:19:01 PM     all      0.02      0.00     22.10      0.00      0.09     77.78
11:20:01 PM     all      0.02      0.00     22.06      0.00      0.09     77.83
11:21:01 PM     all      0.02      0.00     22.10      0.03      0.11     77.75
11:22:01 PM     all      0.01      0.00     21.94      0.00      0.09     77.95
11:23:01 PM     all      0.02      0.00     22.15      0.00      0.10     77.73
11:24:01 PM     all      0.02      0.00     22.02      0.00      0.09     77.87
11:25:01 PM     all      0.02      0.00     22.03      0.00      0.13     77.82
11:26:01 PM     all      0.02      0.00     21.96      0.01      0.14     77.86
11:27:01 PM     all      0.02      0.00     22.00      0.00      0.09     77.89
11:28:01 PM     all      0.02      0.00     21.91      0.00      0.09     77.98
11:29:01 PM     all      0.03      0.00     22.02      0.02      0.08     77.85
11:30:01 PM     all      0.14      0.00     22.23      0.01      0.13     77.48
11:31:01 PM     all      0.02      0.00     22.26      0.00      0.16     77.56
11:32:01 PM     all      0.03      0.00     22.04      0.01      0.10     77.83
Average:        all      0.02      0.00     15.29      0.01      0.07     84.61

/tmp/sarustriped.tmp

# cat /tmp/sarustriped.tmp                              
0.05
0.06
0.05
0.05
0.04
0.06
2.64
21.96
21.99
22.10
22.06
22.10
21.94
22.15
22.02
22.03
21.96
22.00
21.91
22.02
22.23
22.26
22.04

The calculation based on /tmp/saru.tmp:

# awk  '$1~/^[01]/ && $6~/^[0-9]/ {sum+=$6; array[NR]=$6} END {for(x=1;x<=NR;x++){sumsq+=((array[x]-(sum/NR))**2);}print sqrt(sumsq/NR)}' /tmp/saru.tmp
10.7126

The calculation based on /tmp/sarustriped.tmp (the correct one):

# awk '{sum+=$1; array[NR]=$1} END {for(x=1;x<=NR;x++){sumsq+=((array[x]-(sum/NR))**2);}print sqrt(sumsq/NR)}' /tmp/sarustriped.tmp
9.96397

Could someone tell me why these results differ, and whether there is a way to get the correct result with a single awk command? I am doing this for performance, so I would prefer not to use a separate command such as grep or a second awk.

Thanks!
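A likely explanation, based on standard awk behavior rather than anything stated in the post: NR counts every record awk has read, not only the records that match the pattern. /tmp/saru.tmp has 27 lines, of which 23 pass the filter, so the first command divides by 27 and also loops over four array slots that were never filled (awk treats them as 0). The sketch below reproduces the effect on made-up input; the function names `sample`, `stddev_nr`, and `stddev_counted` are hypothetical.

```shell
# Made-up input: a header, eight data rows, and a summary row (10 lines total).
sample() {
  printf '%s\n' 'time value' \
    't1 2' 't2 4' 't3 4' 't4 4' 't5 5' 't6 5' 't7 7' 't8 9' \
    'Average: 5'
}

# Same structure as the first command in the question:
# stores values at index NR and divides by NR (all lines read).
stddev_nr() {
  awk '$1 ~ /^t/ && $2 ~ /^[0-9]/ { sum += $2; a[NR] = $2 }
       END {
         for (x = 1; x <= NR; x++) { d = a[x] - sum / NR; sumsq += d * d }
         print sqrt(sumsq / NR)
       }'
}

# Counts only the matching rows and uses that count everywhere.
stddev_counted() {
  awk '$1 ~ /^t/ && $2 ~ /^[0-9]/ { n++; sum += $2; a[n] = $2 }
       END {
         for (x = 1; x <= n; x++) { d = a[x] - sum / n; sumsq += d * d }
         print sqrt(sumsq / n)
       }'
}

sample | stddev_nr        # skewed by the two non-data lines
sample | stddev_counted   # prints 2, the population standard deviation
```

In the single-column file every line is a data line, so NR happens to equal the number of values and the same formula gives the right answer there.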

So I tried this:

awk  '
  $1~/^[01]/ && $6~/^[0-9]/ {
    numrec += 1
    sum    += $6
    array[numrec] = $6
  } 
  END {
    for(x=1; x<=numrec; x++)
      sumsq += ((array[x]-(sum/numrec))^2)
    print sqrt(sumsq/numrec)
  }
' saru.tmp

This works correctly for the sar -u output I was working with, and I do not see why it would not work with other 'lists'. In short, when I try it on column 5 of sar -r output, it gives a wrong answer again: the output is 1.68891, but the actual deviation is .107374. This is the same command that worked with sar -u. I can provide the files if needed. I was not sure how to post a new 'full' comment, so I edited the old one. Thanks!
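The sar -r data is not shown, so the cause can only be guessed at. Two things may be worth checking, both offered as assumptions. First, is field 5 really the intended column once the time and AM/PM fields are counted? Second, does the `$1~/^[01]/` filter skip rows? In 24-hour output, timestamps can start with 2, and there is no AM/PM field, so every column shifts left by one. A minimal sketch for listing exactly which rows and values a filter selects follows; `show_selected` is a hypothetical helper, and its filter is slightly broader than the one in the post (any leading digit in field 1).

```shell
# Hypothetical helper: list the rows and values that would enter the calculation.
# Usage: show_selected COLUMN < file
show_selected() {
  awk -v col="$1" '
    # Timestamped rows whose target field is a plain non-negative number.
    $1 ~ /^[0-9]/ && $col ~ /^[0-9.]+$/ {
      n++
      printf "line %d: field %d = %s\n", NR, col, $col
    }
    END { printf "%d rows selected\n", n }
  '
}

# Example: inspect field 5 of live sar -r output.
# sar -r | show_selected 5
```

Comparing the listed values with the column that was computed by hand should show whether the wrong field or the wrong set of rows is being used.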
