Average of multiple files having different row sizes

Published: 2020/9/15 3:22:08 | Tags: linux shell awk average

Question

I have a few files with different numbers of rows, but the number of columns is the same in every file. For example:

ifile1.txt

1       1001    ?       ?
2       1002    ?       ?
3       1003    ?       ?
4       1004    ?       ?
5       1005    ?       0
6       1006    ?       1
7       1007    ?       3
8       1008    5       4
9       1009    3       11
10      1010    2       9

ifile2.txt

1       2001    ?       ?
2       2002    ?       ?
3       2003    ?       ?
4       2004    ?       ?
5       2005    ?       0
6       2006    6       12
7       2007    6       5
8       2008    9       10
9       2009    3       12
10      2010    5       7
11      2011    2       ?
12      2012    9       ?

ifile3.txt

1       3001    ?       ?
2       3002    ?       6
3       3003    ?       ?
4       3004    ?       ?
5       3005    ?       0
6       3006    1       25
7       3007    2       3
8       3008    ?       ?

In each file, the 1st column is the index number and the 2nd column is the ID. I would like to calculate the average for each index number, from the 3rd column onward, using only the values that are present (a ? marks a missing value).

Desired output:

1       ?       ?      ----  [Here ? is computed from ?, ?, ?] So answer is ?
2       ?       6.0    ----  [Here 6 is computed from ?, ?, 6] So answer is 6/1=6.0
3       ?       ?
4       ?       ?
5       ?       0.0
6       3.5     12.7 
7       4.0     3.7
8       7.0     7.0    ----- [Here 7 is computed from 5, 9, ? ] So answer is 14/2=7.0   
9       3.0     11.5
10      3.5     8.0
11      2.0     ?
12      9.0     ?

Recommended answer

You could parse the files and store the sum and the count for each position in a kind of two-dimensional array. awk has no true two-dimensional arrays, but one can be emulated by building a suitable index string; see also: https://www.gnu.org/software/gawk/manual/html_node/Multidimensional.html
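To make the index-string idea concrete, here is a minimal sketch rather than part of the original answer. It accumulates the three column-4 values found at index 6 in the sample files (1, 12 and 25) under the composite key "6,4". The function name composite_key_demo is hypothetical; the array names s and n simply mirror the script in this answer.

```shell
# Hypothetical demo: emulate a 2-D array in awk with a "row,col" string key.
composite_key_demo() {
    awk 'BEGIN {
        row = 6; col = 4
        # The three non-missing values at index 6, column 4 of the sample files.
        split("1 12 25", vals, " ")
        for (k = 1; k <= 3; k++) {
            s[row "," col] += vals[k]   # running sum stored under key "6,4"
            n[row "," col] += 1         # how many values went into that sum
        }
        printf("%.1f\n", s[row "," col] / n[row "," col])
    }'
}

composite_key_demo   # prints 12.7
```

awk's built-in s[i, j] syntax works the same way: it joins the subscripts with the SUBSEP character to form a single string key.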

Here is a script tested with your sample input and output.

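# Runs for every input line. FNR restarts at 1 for each file, so it
# serves as the row index shared across files.
{
    if (NF > c) c = NF       # widest column count seen
    if (r < FNR) r = FNR     # row count of the longest file so far

    for (i = 3; i <= NF; i++) {
        if ($i != "?") {             # skip missing values
            s[FNR "," i] += $i       # running sum for this (row, column)
            n[FNR "," i] += 1        # number of values in that sum
        }
    }
}

END {
    for (i = 1; i <= r; i++) {
        printf("%s\t", i)
        for (j = 3; j <= c; j++) {
            if (n[i "," j]) {
                printf("%.1f\t", s[i "," j] / n[i "," j])
            } else {
                printf("?\t")        # no usable value in any file
            }
        }
        printf("\n")
    }
}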

Test

> awk -f test.awk ifile1.txt ifile2.txt ifile3.txt
1       ?       ?
2       ?       6.0  
3       ?       ?
4       ?       ?
5       ?       0.0
6       3.5     12.7
7       4.0     3.7
8       7.0     7.0
9       3.0     11.5
10      3.5     8.0
11      2.0     ?
12      9.0     ?
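
The script above uses the line number (FNR) as the row index. That matches the sample files, because their first column counts up from 1 with no gaps. If that were not guaranteed, one possible variant is to key on the first column instead. This is only a sketch under that assumption, not the answerer's method. The function name avg_by_index and the arrays seen and order are hypothetical, and rows are printed in the order their index is first encountered.

```shell
# Hypothetical variant: average columns 3..NF keyed on the index in column 1.
avg_by_index() {
    awk '
    {
        if (NF > c) c = NF                    # widest column count seen
        if (!($1 in seen)) {                  # remember first-seen order of indexes
            seen[$1] = 1
            order[++r] = $1
        }
        for (i = 3; i <= NF; i++) {
            if ($i != "?") {                  # skip missing values
                s[$1 "," i] += $i
                n[$1 "," i]++
            }
        }
    }
    END {
        for (k = 1; k <= r; k++) {
            id = order[k]
            printf("%s", id)
            for (j = 3; j <= c; j++) {
                if (n[id "," j]) printf("\t%.1f", s[id "," j] / n[id "," j])
                else             printf("\t?")
            }
            printf("\n")
        }
    }' "$@"
}
```

It would be called as avg_by_index ifile1.txt ifile2.txt ifile3.txt. Unlike the original script, it does not emit a trailing tab on each line.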
