awk or shell script to change the format of a tab-delimited file

Problem description

I need to change the format of my tab-delimited data from the input format to the output format shown below. Please help me write a script.

Input file:

BRANCH_CODE DEPT_CODE   ITEM_CODE   UNIT_CODE   01/04/2017  02/04/2017  03/04/2017  04/04/2017  05/04/2017  06/04/2017  07/04/2017  08/04/2017  09/04/2017  10/04/2017    
KI-01   DP-0001 10001   KG  31.5    45  72  84  67.5    39  57  22.5    22  56    
KI-01   DP-0001 10002   KG  22  0   62  18  49  13  75  17  0   72

Output format:

DOC_DATE    BRANCH_CODE DEPT_CODE   ITEM_CODE   UNIT_CODE   QTY     
01/04/2017  KI-01   DP-0001 10001   KG  31.5
01/04/2017  KI-01   DP-0001 10002   KG  22
02/04/2017  KI-01   DP-0001 10001   KG  45
02/04/2017  KI-01   DP-0001 10002   KG  0
03/04/2017  KI-01   DP-0001 10001   KG  72
03/04/2017  KI-01   DP-0001 10002   KG  62

and so on.

I was writing code like this in a .sh file:

#!/bin/bash
awk 'NR!=1{print $0}' input.tsv > temp_data_wo_header.tsv;
lc=$(wc -l < temp_data_wo_header.tsv);
for ((i=6; i<=15; i++))
do
    echo "Constructing date file "$i" and ...";
    (for (( c=1; c<=$lc; c++));
        do 
            awk 'NR==1{print $'$i'}' input.tsv;
        done
    ) > temp_date.tsv;
    echo "Adding date to data file...";
    paste <(awk '{print $1}' temp_date.tsv ) <(awk 'BEGIN { FS = "\t" } ; {print $1,$2,$3,$5,$'$i'}' temp_data_wo_header.tsv ) > "temp_day_"$i"_data.tsv";
    echo "Finished adding...";
done;

Is there a better way to do this?

Recommended answer

Here's one way in GNU awk. It needs gawk because it uses true two-dimensional arrays (arrays of arrays), and it holds the whole file in memory before printing:

$ awk '
BEGIN {
    FS=OFS="\t" }                          # set the delimiters
{
    sub(/\r/,"",$NF)                       # in case of \r\n line endings
    a[NR][1]                               # define array element
    n=split($0,a[NR],FS)                   # split record to a[NR]
    a[NR][4]=$1 OFS $2 OFS $3 OFS $4       # gather constants to one element
    if(NR==1)
        a[NR][4]="DOC_DATE" OFS a[NR][4] OFS "QTY"
}
END {                                      # everything is in memory
    print a[1][4];                         # header print
    for(j=5;j<=n;j++)                      # loop all data fields
        for(i=2;i<=NR;i++)                 # loop all records
            print a[1][j],a[i][4],a[i][j]  # output
}' file
DOC_DATE        BRANCH_CODE     DEPT_CODE       ITEM_CODE       UNIT_CODE       QTY
01/04/2017      KI-01   DP-0001 10001   KG      31.5
01/04/2017      KI-01   DP-0001 10002   KG      22
02/04/2017      KI-01   DP-0001 10001   KG      45
02/04/2017      KI-01   DP-0001 10002   KG      0
03/04/2017      KI-01   DP-0001 10001   KG      72
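
If gawk is not available, the same idea can probably be expressed with the classic awk `array[i, j]` subscript form, which POSIX awk supports. The following is only a sketch under some assumptions: the first four columns are the constant ones, every later column is a date, and the file fits in memory. The function name `unpivot` and the sample file are made up for illustration and are not part of the original answer.

```shell
#!/bin/sh
# Hypothetical helper: turn the date columns of a TSV file into rows.
unpivot() {
    awk '
    BEGIN { FS = OFS = "\t" }               # tab in, tab out
    {
        sub(/\r$/, "")                      # drop a trailing CR (CRLF files)
        key[NR] = $1 OFS $2 OFS $3 OFS $4   # constant columns as one string
        for (j = 5; j <= NF; j++)           # remember every date/quantity cell
            cell[NR, j] = $j
        if (NR == 1) ncol = NF              # column count comes from the header
    }
    END {
        print "DOC_DATE", key[1], "QTY"     # new header line
        for (j = 5; j <= ncol; j++) {       # one block of rows per date
            if (cell[1, j] == "") continue  # skip empty trailing header fields
            for (i = 2; i <= NR; i++)       # keep the original record order
                print cell[1, j], key[i], cell[i, j]
        }
    }' "$@"
}

# Made-up sample: the first three date columns of the input in the question.
tmp=$(mktemp -d)
printf 'BRANCH_CODE\tDEPT_CODE\tITEM_CODE\tUNIT_CODE\t01/04/2017\t02/04/2017\t03/04/2017\r\n' > "$tmp/input.tsv"
printf 'KI-01\tDP-0001\t10001\tKG\t31.5\t45\t72\r\n' >> "$tmp/input.tsv"
printf 'KI-01\tDP-0001\t10002\tKG\t22\t0\t62\r\n' >> "$tmp/input.tsv"

unpivot "$tmp/input.tsv" > "$tmp/output.tsv"
cat "$tmp/output.tsv"
```

The `cell[NR, j]` form joins the two subscripts into a single string key behind the scenes, so it behaves like a 2D array for this purpose without requiring gawk's arrays of arrays.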
