AWK: reference columns from a file based on the column headers


Question

I have the following code in cmd.awk:

 BEGIN {FS=","}
 {
    if(FNR==1) print $0",Header";
    else if (FNR>1)
            {
                    if($79==0 && $80==0 && $81==0) print $0",0";
                    else if ($80==0 && $81!=0) print $0","($79-$81)/$81;
                    else if ($81==0 && $80!=0) print $0","($79-$80)/$80;
                    else if ($81==0 && $80==0 && $79!=0) print $0",10";
                    else if ($81!=0 && $80!=0) print $0","(($79-$80)/$80)+(($80-$81)/$81);
            }
}

When I execute the following command:

awk -f cmd.awk input.txt

it performs the required operation (as specified in the AWK script) and produces the required result.

However, this script accesses every column of the input file by its column index, i.e. $79, $80, $81, and so on.

My requirement is to use this script as a function that takes $79, $80, $81 and Header (as given in the script) as parameters, performs the operations, stores the result in a newly appended column named Header, and writes the new contents to an output file. However, I may only specify the parameters as column headers, not as column indexes, so the function call has to look something like this:

cmd(column_header1, column_header2, column_header3,new_header)

and the definition of cmd() has to perform the operation from the awk script above.

Is there any way to do this? Please bear in mind that I'm very new to awk. Thanks in advance.

My input file contains 150 columns and over 50 million rows. A sample of the file is given below:

RN,DATE,ID,PRE_M1,PRE_M2,GALV,GALG,PRE_M5.........................TOTAL
0624873840,2016/04/28,201610,1618,0,0,0,Active,.................12234
0747269250,2016/02/02,201610,227,93,0,0,Daat,....................99988

The input file contains both numeric and character columns. The columns accessed in the AWK script above are all numeric.

A sample of the required output file is given below:

RN,DATE,ID,PRE_M1,PRE_M2,GALV,GALG,PRE_M5.........................TOTAL,Header
0624873840,2016/04/28,201610,1618,0,0,0,Active,.................12234,10
0747269250,2016/02/02,201610,227,93,0,0,Daat,....................99988,0

Note that a new column named "Header" is appended to the file, and that it contains the result of the AWK script for each row of the input file.

Recommended answer

I think you can simplify this a lot. No input file was provided, so I'm flying blind...

Assuming the columns of interest are consecutive and all of their fields are numeric, just provide the index of the first column:

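$ awk -F, -v s=79 'BEGIN {OFS=FS}
                   NR==1 {$(NF+1)="Header"}
                   NR >1 {v1=$s; v2=$(s+1); v3=$(s+2)
                          if(!v2 && !v3) r = v1?10:0                       # both divisors are zero
                          else if(!v2)   r = (v1-v3)/v3                    # only v2 is zero
                          else if(!v3)   r = (v1-v2)/v2                    # only v3 is zero
                          else           r = (v1-v2)/v2 + (v2-v3)/v3       # neither is zero
                          $(NF+1) = r}1' file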
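If the start index should not be hard-coded, it can be looked up from the header row first. The following is only a sketch: start_col is a hypothetical helper name, and it assumes a comma-separated file whose first line holds unique column names.

```shell
# Hypothetical helper (not part of the original answer):
#   start_col HEADER_NAME FILE
# prints the 1-based index of the column called HEADER_NAME,
# or returns a non-zero status if no such column exists.
start_col() {
    awk -F, -v name="$1" '
        NR == 1 {
            for (i = 1; i <= NF; i++)
                if ($i == name) { print i; exit 0 }
            exit 1          # header row scanned, name not found
        }' "$2"
}
```

The result could then be passed on with something like -v s="$(start_col PRE_M1 input.txt)", where PRE_M1 is just a column name taken from the sample file in the question.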

With the column names passed as parameters, it can be written as

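$ cols="c1,c2,c3"; header="Header"
$ awk -F, -v cols="$cols" -v hdr="$header" '
           BEGIN {OFS=FS}
           NR==1 {n=split(cols,cn)
                  for(i=1;i<=NF;i++)
                    for(j=1;j<=n;j++)
                      if($i==cn[j]) c[j]=i      # c[j] = index of the j-th requested column
                  $(NF+1)=hdr}
           NR >1 {v1=$(c[1]); v2=$(c[2]); v3=$(c[3])
                  if(!v2 && !v3) r = v1?10:0                       # both divisors are zero
                  else if(!v2)   r = (v1-v3)/v3                    # only v2 is zero
                  else if(!v3)   r = (v1-v2)/v2                    # only v3 is zero
                  else           r = (v1-v2)/v2 + (v2-v3)/v3       # neither is zero
                  $(NF+1) = r}1' file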

id,c1,c2,c3,Header
1,0,0,0,0
2,0,0,1,-1
3,0,1,0,-1
4,0,1,1,-1
5,1,0,0,10
6,1,0,1,0
7,1,1,0,0
8,1,1,1,0

for the given input file

id,c1,c2,c3
1,0,0,0
2,0,0,1
3,0,1,0
4,0,1,1
5,1,0,0
6,1,0,1
7,1,1,0
8,1,1,1
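To get closer to the cmd(column_header1, column_header2, column_header3, new_header) call asked for in the question, the same logic can be wrapped in a shell function. This is a minimal sketch, not a definitive implementation. It assumes that the input file is passed as a fifth argument, that header names are unique and contain no commas, and that the three columns are numeric. The branch logic follows cmd.awk from the question.

```shell
# Hypothetical wrapper: cmd HEADER1 HEADER2 HEADER3 NEW_HEADER INPUT_FILE
# Writes the input with one extra column to standard output.
cmd() {
    awk -F, -v cols="$1,$2,$3" -v hdr="$4" '
        BEGIN { OFS = FS }
        NR == 1 {
            n = split(cols, cn)
            for (i = 1; i <= NF; i++)       # remember where each header sits
                pos[$i] = i
            for (j = 1; j <= n; j++) {
                if (!(cn[j] in pos)) {
                    print "cmd: no such column: " cn[j] > "/dev/stderr"
                    exit 1
                }
                c[j] = pos[cn[j]]           # keep the order of the arguments
            }
            print $0, hdr
            next
        }
        {
            # Adding 0 forces a numeric value.
            v1 = $(c[1]) + 0; v2 = $(c[2]) + 0; v3 = $(c[3]) + 0
            if (v2 == 0 && v3 == 0) r = (v1 == 0) ? 0 : 10
            else if (v2 == 0)       r = (v1 - v3) / v3
            else if (v3 == 0)       r = (v1 - v2) / v2
            else                    r = (v1 - v2) / v2 + (v2 - v3) / v3
            print $0, r
        }
    ' "$5"
}
```

A call such as cmd c1 c2 c3 Header file > output.txt would then write the extended table to output.txt. Because the rows are printed as they are read, the file is processed in a single pass, which matters with tens of millions of rows.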

Explanation

n=split(cols,cn) splits the string "cols" into the array "cn", using the current FS as the delimiter. The number of elements is returned and assigned to "n".
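A quick way to see what split() returns is to run it on its own in a BEGIN block. The column names below are taken from the sample header in the question and are only illustrative. FS is a comma here because of -F,.

```shell
# split() with two arguments uses the current FS (set to "," by -F,).
out=$(awk -F, -v cols="PRE_M1,PRE_M2,GALV" 'BEGIN {
    n = split(cols, cn)
    for (j = 1; j <= n; j++)
        print j, cn[j]
    print "n =", n
}')
printf '%s\n' "$out"
# prints:
# 1 PRE_M1
# 2 PRE_M2
# 3 GALV
# n = 3
```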

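The trailing 1 is a pattern that is always true. With no action attached, awk applies the default action, so 1 is shorthand for {print}.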
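A bare 1 at the end of an awk program is a pattern that is always true, and a pattern without an action prints the current record. A small illustration with a made-up two-line input, where a field is appended and the modified record is printed by the 1:

```shell
# The first rule appends a field; the bare 1 then prints the modified record.
out=$(printf 'a,b\n1,2\n' | awk -F, 'BEGIN {OFS=FS} {$(NF+1)="x"} 1')
printf '%s\n' "$out"
# prints:
# a,b,x
# 1,2,x
```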
