Getting average per line

Question

I have a large data set in this format

HF TLLL A T 0.999 NA 0.666 NA 0.566 NA NA 0.87
HF TLLM A T 0.500 0.500 0.666 0.566 NA NA 0.87

I want to calculate an average for each line, using the fields from column 5 through the end of the line and ignoring the string NA. Then I want to append the average to the end of each line.

The output should look like this:

HF TLLL A T 0.999 NA 0.666 NA 0.566 NA NA 0.87 0.775
HF TLLM A T 0.500 0.500 0.666 0.566 NA NA 0.87 0.620

I have been getting the sum like this, but I can't figure out how to keep track of the number of values that were summed, which I need in order to calculate the average.

awk '{x=0;for(i=5;i<=NF;i++)x=x+$i;print $0, x}'

Recommended answer

$ cat file
HF TLLL A T 0.999 NA 0.666 NA 0.566 NA NA 0.87
HF TLLM A T 0.500 0.500 0.666 0.566 NA NA 0.87
HF TLLM A T NA NA NA NA NA NA NA

$ awk '{sum=cnt=0; for (i=5;i<=NF;i++) if ($i != "NA") { sum+=$i; cnt++ } print $0, (cnt ? sum/cnt : "NA") }' file
HF TLLL A T 0.999 NA 0.666 NA 0.566 NA NA 0.87 0.77525
HF TLLM A T 0.500 0.500 0.666 0.566 NA NA 0.87 0.6204
HF TLLM A T NA NA NA NA NA NA NA NA

The ternary expression (cnt ? sum/cnt : "NA") avoids a divide-by-zero error on input row 3, where every data field is "NA"; for such a row the average is printed as NA.
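The answer prints the average at awk's default output precision (0.77525), while the desired output in the question shows three decimals (0.775). A minimal sketch, assuming you want exactly that three-decimal format, is to wrap the division in sprintf("%.3f", ...). The shell function name avg_per_line is hypothetical and used only for illustration; it reads the rows on standard input.

```shell
#!/bin/sh
# avg_per_line: hypothetical helper for this sketch.
# Reads whitespace-separated rows on stdin and appends the mean of
# fields 5..NF, skipping the string "NA", rounded to 3 decimal places.
avg_per_line() {
    awk '{
        sum = cnt = 0
        for (i = 5; i <= NF; i++)        # data fields start at column 5
            if ($i != "NA") {            # ignore the NA placeholders
                sum += $i
                cnt++
            }
        # guard against rows where every data field is NA
        print $0, (cnt ? sprintf("%.3f", sum / cnt) : "NA")
    }'
}

# Example run on the sample rows from the question and answer
avg_per_line <<'EOF'
HF TLLL A T 0.999 NA 0.666 NA 0.566 NA NA 0.87
HF TLLM A T 0.500 0.500 0.666 0.566 NA NA 0.87
HF TLLM A T NA NA NA NA NA NA NA
EOF
```

The rounding is done with sprintf inside the ternary rather than with a single printf format, because one branch yields a number and the other yields the string NA; formatting only the numeric branch keeps both cases on the same print statement. Use it as avg_per_line < file.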
