Calculate mean of each column ignoring missing data with awk

Views: 216 | Published: 2016/7/28 15:07:21 | Tags: linux bash awk mean missing-data


Problem description

I have a large tab-separated data table with thousands of rows and dozens of columns, and missing data are marked as "na". For example:

```
na  0.93    na  0   na  0.51
1   1   na  1   na  1
1   1   na  0.97    na  1
0.92    1   na  1   0.01    0.34
```

I would like to calculate the mean of each column while making sure that the missing data are ignored in the calculation. For example, the mean of column 1 should be 0.97. I believe I could use awk, but I am not sure how to construct the command to do this for all columns and account for missing data.

All I know how to do is calculate the mean of a single column, but this treats the missing data as 0 rather than leaving it out of the calculation:

```
awk '{sum+=$1} END {print sum/NR}' filename
```

Solution

This is obscure, but works for your example:

```
awk '{for(i=1; i<=NF; i++){sum[i] += $i; if($i != "na"){count[i]+=1}}} END {for(i=1; i<=NF; i++){if(count[i]!=0){v = sum[i]/count[i]}else{v = 0}; if(i<NF){printf "%f\t",v}else{print v}}}' input.txt
```

EDIT: Here is how it works:

```
awk '{for(i=1; i<=NF; i++){     #for each column
        sum[i] += $i;           #add the value to the "sum" array ("na" counts as 0 here)
        if($i != "na"){         #if value is not "na"
           count[i]+=1}         #increment the column "count" (endif)
        }                       #endfor
     }                          #end of the per-line block
    END {                       #at the end
     for(i=1; i<=NF; i++){      #for each column
        if(count[i]!=0){        #if the column count is not 0
            v = sum[i]/count[i] #then calculate the column mean (here represented with "v")
        }else{                  #else (if column count is 0)
            v = 0               #then let mean be 0 (note: you can set this to be "na")
        };                      #endif col count is not 0
        if(i<NF){               #if the column is before the last column
            printf "%f\t",v     #print mean + TAB
        }else{                  #else (if it is the last column)
            print v}            #print mean + NEWLINE (endif)
        };                      #endfor
     }' input.txt               #end of the END block (note: input.txt is the input file)
```
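The single-column attempt in the question divides by NR, which also counts the rows whose cell is "na". A minimal sketch of one way to fix that for a single column is to keep a separate count of the usable rows. The function name `col1_mean` is made up for this illustration, and the input is assumed to arrive on standard input with "na" as the only missing-data marker.

```shell
#!/bin/sh
# Hypothetical helper: mean of column 1, counting only rows whose cell is not "na".
col1_mean() {
    awk '$1 != "na" { sum += $1; n++ }      # skip missing cells entirely
         END { if (n > 0) printf "%.4f\n", sum / n; else print "na" }'
}

# Column 1 of the sample data from the question
printf 'na\n1\n1\n0.92\n' | col1_mean
```

Dividing by `n` instead of `NR` is the whole change: the sum was never affected by "na" cells, only the denominator was.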

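As a variation on the answer, the sketch below applies the same per-column sum and count idea with three changes that are assumptions of this example, not part of the original answer. It splits strictly on tabs, it remembers the widest row seen instead of relying on NF in the END block, and it prints "na" for a column that has no usable values. The function name `col_means` and the four-decimal output format are also choices made here for illustration.

```shell
#!/bin/sh
# Hypothetical helper: per-column means of a tab-separated file, skipping "na" cells.
col_means() {
    awk -F'\t' '
    {
        if (NF > ncol) ncol = NF            # remember the widest row seen
        for (i = 1; i <= NF; i++) {
            if ($i != "na" && $i != "") {   # skip missing or empty cells
                sum[i] += $i
                count[i]++
            }
        }
    }
    END {
        for (i = 1; i <= ncol; i++) {
            # a column with no usable values is reported as "na"
            out = (count[i] > 0) ? sprintf("%.4f", sum[i] / count[i]) : "na"
            printf "%s%s", out, (i < ncol ? "\t" : "\n")
        }
    }' "$1"
}

# Sample data from the question, written to a temporary file
tmp=$(mktemp)
printf 'na\t0.93\tna\t0\tna\t0.51\n1\t1\tna\t1\tna\t1\n1\t1\tna\t0.97\tna\t1\n0.92\t1\tna\t1\t0.01\t0.34\n' > "$tmp"
col_means "$tmp"
rm -f "$tmp"
```

Using one output format for every column keeps the last column consistent with the others, whereas mixing `printf "%f"` with a plain `print` formats the final value differently.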