Format and then convert txt to csv using shell script and awk


Problem description

I have a text file:

ifile.txt
x       y       z       t              value
1       1       5       01hr01Jan2018   3
1       1       5       02hr01Jan2018   3.1
1       1       5       03hr01Jan2018   3.2
1       3.4     3       01hr01Jan2018   4.1
1       3.4     3       02hr01Jan2018   6.1
1       3.4     3       03hr01Jan2018   1.1
1       4.2     6       01hr01Jan2018   6.33
1       4.2     6       02hr01Jan2018   8.33
1       4.2     6       03hr01Jan2018   5.33
3.4     1       2       01hr01Jan2018   3.5
3.4     1       2       02hr01Jan2018   5.65
3.4     1       2       03hr01Jan2018   3.66
3.4     3.4     4       01hr01Jan2018   6.32
3.4     3.4     4       02hr01Jan2018   9.32
3.4     3.4     4       03hr01Jan2018   12.32
3.4     4.2     8.1     01hr01Jan2018   7.43
3.4     4.2     8.1     02hr01Jan2018   7.93
3.4     4.2     8.1     03hr01Jan2018   5.43
4.2     1       3.4     01hr01Jan2018   6.12
4.2     1       3.4     02hr01Jan2018   7.15
4.2     1       3.4     03hr01Jan2018   9.12
4.2     3.4     5.5     01hr01Jan2018   2.2
4.2     3.4     5.5     02hr01Jan2018   3.42
4.2     3.4     5.5     03hr01Jan2018   3.21
4.2     4.2     6.2     01hr01Jan2018   1.3
4.2     4.2     6.2     02hr01Jan2018   3.4
4.2     4.2     6.2     03hr01Jan2018   1

Explanation: each coordinate (x, y) has one z-value and three time values. The columns are separated by sequences of spaces, not tabs.

I would like to pivot the t column into a header row, so that each (x, y, z) gets a single line, and then convert the result to a CSV file. My expected output is:

ofile.txt
x,y,z,01hr01Jan2018,02hr01Jan2018,03hr01Jan2018
1,1,5,3,3.1,3.2
1,3.4,3,4.1,6.1,1.1
1,4.2,6,6.33,8.33,5.33
3.4,1,2,3.5,5.65,3.66
3.4,3.4,4,6.32,9.32,12.32
3.4,4.2,8.1,7.43,7.93,5.43
4.2,1,3.4,6.12,7.15,9.12
4.2,3.4,5.5,2.2,3.42,3.21
4.2,4.2,6.2,1.3,3.4,1

I am trying it the following way, but I am still not getting the desired output. My script prints some extra commas (,) at the end.

My algorithm and script are:

    #Step1:- Split into two files: one with x,y,z (0001.txt) and
    #        another with t,value (0002.txt).

    awk '{n=3; for (i=1;i<=n;i++) printf "%s ", $i; print "";}' ifile.txt > 0001.txt
    awk '{n=5; for (i=4;i<=n;i++) printf "%s ", $i; print "";}' ifile.txt > 0002.txt

    #Step2:- In 0001.txt: Delete the repeated rows.

    awk '!seen[$1,$2,$3]++' 0001.txt > 00011.txt

    #Step3:- In 0002.txt: Delete the first row. For each 3 rows in t-column,
    #        write the value-column as rows. Add the t-row at top
    #        this is very manual. I am wondering for some command

    grep -E "^[0-9].*" 0002.txt > 0003.txt
    awk -v n=3 '{ row = row $2 " "; if (NR % n == 0) { print row; row = "" } }' 0003.txt > 0004.txt
    (echo "01hr01Jan2018,02hr01Jan2018,03hr01Jan2018";cat 0004.txt) > 00022.txt  

    #Step4:- Paste output of two and convert to csv.
    paste 00011.txt 00022.txt > 0005.txt
    cat 0005.txt | tr -s '[:blank:]' ',' > ofile.txt

Recommended answer

You can use the following awk:

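awk -v OFS=, '
# Key each record by its x,y,z fields.
{k=$1 OFS $2 OFS $3}
# Remember each distinct 4th-column value in order of first appearance.
!($4 in hdr){hn[++h]=$4; hdr[$4]}
# Key seen before: append this value to its row.
k in row{row[k]=row[k] OFS $5; next}
# New key: record its order and start its row.
{rn[++n]=k; row[k]=$5}
END {
   # Header: "x,y,z" then the time labels; hn[1] is the literal "t", so skip it.
   printf "%s", rn[1]
   for(i=2; i<=h; i++)
      printf "%s", OFS hn[i]
   print ""
   # Data rows; rn[1] is the header key, so start at 2.
   for (i=2; i<=n; i++)
      print rn[i], row[rn[i]]
}' ifile.txt

Output:

x,y,z,01hr01Jan2018,02hr01Jan2018,03hr01Jan2018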
1,1,5,3,3.1,3.2
1,3.4,3,4.1,6.1,1.1
1,4.2,6,6.33,8.33,5.33
3.4,1,2,3.5,5.65,3.66
3.4,3.4,4,6.32,9.32,12.32
3.4,4.2,8.1,7.43,7.93,5.43
4.2,1,3.4,6.12,7.15,9.12
4.2,3.4,5.5,2.2,3.42,3.21
4.2,4.2,6.2,1.3,3.4,1
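
The command above appends values in the order they are read, so it relies on every (x, y, z) group having a value for every time label. If some combinations might be missing, one possible variation is to store each value under a (key, time) pair and emit an empty field where nothing was read. The following is only a sketch under that assumption, not part of the original answer; the file names `sample.txt` and `sample.csv` and the three-row sample are made up for illustration.

```shell
# Hypothetical sample: the (1, 3.4, 3) group has no 01hr01Jan2018 value.
cat > sample.txt <<'EOF'
x       y       z       t              value
1       1       5       01hr01Jan2018   3
1       1       5       02hr01Jan2018   3.1
1       3.4     3       02hr01Jan2018   6.1
EOF

awk -v OFS=, '
# Header line: keep only the x,y,z names and skip the rest of the rules.
NR == 1 { head = $1 OFS $2 OFS $3; next }
{
   k = $1 OFS $2 OFS $3
   if (!(k in seen))  { seen[k];  rn[++n] = k }   # row keys in first-seen order
   if (!($4 in cols)) { cols[$4]; hn[++h] = $4 }  # time labels in first-seen order
   val[k, $4] = $5                                # one cell per (key, time) pair
}
END {
   # Header row: x,y,z followed by every time label.
   line = head
   for (j = 1; j <= h; j++) line = line OFS hn[j]
   print line
   # One row per key; a missing (key, time) pair becomes an empty field.
   for (i = 1; i <= n; i++) {
      line = rn[i]
      for (j = 1; j <= h; j++)
         line = line OFS (((rn[i], hn[j]) in val) ? val[rn[i], hn[j]] : "")
      print line
   }
}' sample.txt > sample.csv

cat sample.csv
```

For the sample above this prints:

x,y,z,01hr01Jan2018,02hr01Jan2018
1,1,5,3,3.1
1,3.4,3,,6.1

To apply it to the data in the question, replace `sample.txt` with `ifile.txt` and redirect to `ofile.txt`. Because every line is built as a string and printed once, no trailing separator is written.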
