AWK合并文件(后续) [英] AWK to Consolidate Files (Follow-Up)

查看:58
本文介绍了AWK合并文件(后续)的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

此帖子的后续操作: (根据注释"中的要求进行了更新)

我从 actual (伪装的)示例数据开始,然后是在该数据上运行脚本的结果.

I started over with actual (disguised) sample data, and the results of running the script on that data.

标题&目录中前两个文件的前两行.它们是相同的数据,这不是错误. (它可能在此数据集中发生.)

The headers & first two lines of the first two files in the directory. They are the same data, which is not a mistake. (It can happen in this dataset.)

Provider,,,,,,,,,,,,,,
02/01/2018 - 02/28/2018,,,,,,,,,,,,,,
Field1,Field2,Field3,Field4,Field5,Field6,Field7,Field8,Field9,Field10,Field11,Field12,Field13,Field14,Field15
B002H5QQJA,803814064988,803814064988,P2IIPDM5MDTW,P2IIPDM5MDTW,T,Prod_P,,,,1,foo1,bar1,YDtAK,BrandX
B002H5QQTU,803814064988,803814064988,K59C4XR93JOV,K59C4XR93JOV,T,Prod_P,,,,1,foo1,bar1,kmAnC,BrandX
B002H5QR44,803814064988,803814064988,FUBOROFTLW9U,FUBOROFTLW9U,T,Prod_P,,,,1,foo1,bar1,JdLye,BrandX
B002H5QRBC,803814064988,803814064988,KMHRXLF2FRKH,KMHRXLF2FRKH,T,Prod_P,,,,1,foo1,bar1,Biqvo,BrandX
B002H5QSC0,803814064988,803814064988,PCLB5UPGGP9T,PCLB5UPGGP9T,T,Prod_P,,,,1,foo2,bar2,Iwvhe,BrandX
B002H5QU3M,505545471538,505545471538,3K4GDYDEOH1M,3K4GDYDEOH1M,T,Prod_P,,,,1,foo3,bar3,NWsOC,BrandY
B002H5QUAK,417248985349,417248985349,7R40MN9AD9I8,7R40MN9AD9I8,T,Prod_I,1,0,1,0,foo4,bar4,YVQeH,BrandY
B002H5QUBY,417248985349,417248985349,C04GQONG1Z5B,C04GQONG1Z5B,T,Prod_I,1,0,1,0,foo4,bar4,PERMW,BrandY
B002H5QUCI,505545471538,505545471538,4E1ZGIJR1GPR,4E1ZGIJR1GPR,T,Prod_P,,,,1,foo3,bar3,UycEB,BrandY
B002H5QUVO,804699101426,804699101426,51RXKMWGJJ30,51RXKMWGJJ30,T,Prod_P,,,,1,foo5,bar5,Qwyuy,BrandY
B002H5QUZ0,804699101426,804699101426,7L0QBQM8S80L,7L0QBQM8S80L,T,Prod_P,,,,1,foo5,bar5,nqgId,BrandY
B002H5QXF2,803814064988,803814064988,PH0Q5QI34B0R,PH0Q5QI34B0R,T,Prod_P,,,,1,foo6,bar6,hPFiY,BrandX
B002H5QXWK,803814064988,803814064988,PSCLFNIDVZS0,PSCLFNIDVZS0,T,Prod_P,,,,1,foo6,bar6,BCdzF,BrandX

文件2:

Provider,,,,,,,,,,,,,,,
01/01/2018 - 01/31/2018,,,,,,,,,,,,,,,
Field1,Field2,Field3,Field4,Field5,Field6,Field7,Field8,Field9,Field10,Field11,,Field12,Field13,Field14,Field15
B002H5N3AA,245462777033,245462777033,CFFWR2KSWLR8,CFFWR2KSWLR8,T,Prod_P,,,,1,bar1,foo2,bar1,RkG7D,BrandY
B002H5N3IM,245462777033,245462777033,CYFTO0FGAPSJ,CYFTO0FGAPSJ,T,Prod_P,,,,1,bar1,foo2,bar1,jqiGj,BrandY
B002H5N3R8,245462777033,245462777033,8ZNJHVCVO0A1,8ZNJHVCVO0A1,T,Prod_P,,,,1,bar1,foo2,bar1,Ylrcy,BrandY
B002H5N6X4,766193337142,766193337142,37YX24TRDPNW,37YX24TRDPNW,T,Prod_P,,,,1,bar2,foo3,bar2,WHxLZ,BrandX
B002H5N756,766193337142,766193337142,H56J19KCLFZP,H56J19KCLFZP,T,Prod_P,,,,1,bar2,foo3,bar2,VVw34,BrandX
B002H5N8QO,73612604823,73612604823,HZC9P776G2EP,HZC9P776G2EP,T,Prod_P,,,,1,bar3,foo4,bar3,X48HD,BrandZ
B002H5NA3U,932053704970,932053704970,XFIB2V8RQXN4,XFIB2V8RQXN4,T,Prod_P,,,,1,bar4,foo5,bar4,ghftn,BrandY
B002H5NJ6S,245675038659,245675038659,MUCSMOR5HB7V,MUCSMOR5HB7V,T,Prod_I,11,2,1,0,bar5,foo6,bar5,TVY19,BrandX
B002H5NJ6S,245675038659,245675038659,MUCSMOR5HB7V,MUCSMOR5HB7V,T,Prod_P,,,,2,bar5,foo6,bar5,M2j1i,BrandX
B002H5NJXQ,73612604823,73612604823,RJER36PXDF0T,RJER36PXDF0T,T,Prod_P,,,,1,bar6,foo7,bar6,1UnN3,BrandY
B002H5OU5C,491559514618,491559514618,X9K6BVZEHDDZ,X9K6BVZEHDDZ,T,Prod_P,,,,1,bar7,foo8,bar7,eybpO,BrandX
B002H5OU66,491559514618,491559514618,6510BKD3XD9R,6510BKD3XD9R,T,Prod_P,,,,1,bar7,foo8,bar7,yS9xk,BrandX
B002H5OU6Q,491559514618,491559514618,EFWDVP7FPCFA,EFWDVP7FPCFA,T,Prod_P,,,,1,bar7,foo8,bar7,0IXqS,BrandX

所需的输出:

Filename,Report_Period,Field1,Field2,Field3,Field4,Field5,Field6,Field7,Field8,Field9,Field10,Field11,Field12,Field13,Field14,Field15
FILENAME,02/01/2018 - 02/28/2018,B002H5QQJA,8.03814E+11,8.04E+11,P2IIPDM5MDTW,P2IIPDM5MDTW,T,Prod_P,,,,1,foo1,bar1,YDtAK,BrandX
FILENAME,02/01/2018 - 02/28/2018,B002H5QQTU,8.03814E+11,8.04E+11,K59C4XR93JOV,K59C4XR93JOV,T,Prod_P,,,,1,foo1,bar1,kmAnC,BrandX
FILENAME,02/01/2018 - 02/28/2018,B002H5QR44,8.03814E+11,8.04E+11,FUBOROFTLW9U,FUBOROFTLW9U,T,Prod_P,,,,1,foo1,bar1,JdLye,BrandX
FILENAME,02/01/2018 - 02/28/2018,B002H5QRBC,8.03814E+11,8.04E+11,KMHRXLF2FRKH,KMHRXLF2FRKH,T,Prod_P,,,,1,foo1,bar1,Biqvo,BrandX
FILENAME,02/01/2018 - 02/28/2018,B002H5QSC0,8.03814E+11,8.04E+11,PCLB5UPGGP9T,PCLB5UPGGP9T,T,Prod_P,,,,1,foo2,bar2,Iwvhe,BrandX
FILENAME,02/01/2018 - 02/28/2018,B002H5QU3M,5.05545E+11,5.06E+11,3K4GDYDEOH1M,3K4GDYDEOH1M,T,Prod_P,,,,1,foo3,bar3,NWsOC,BrandY
FILENAME,02/01/2018 - 02/28/2018,B002H5QUAK,4.17249E+11,4.17E+11,7R40MN9AD9I8,7R40MN9AD9I8,T,Prod_I,1,0,1,0,foo4,bar4,YVQeH,BrandY
FILENAME,02/01/2018 - 02/28/2018,B002H5QUBY,4.17249E+11,4.17E+11,C04GQONG1Z5B,C04GQONG1Z5B,T,Prod_I,1,0,1,0,foo4,bar4,PERMW,BrandY
FILENAME,02/01/2018 - 02/28/2018,B002H5QUCI,5.05545E+11,5.06E+11,4E1ZGIJR1GPR,4E1ZGIJR1GPR,T,Prod_P,,,,1,foo3,bar3,UycEB,BrandY
FILENAME,02/01/2018 - 02/28/2018,B002H5QUVO,8.04699E+11,8.05E+11,51RXKMWGJJ30,51RXKMWGJJ30,T,Prod_P,,,,1,foo5,bar5,Qwyuy,BrandY
FILENAME,02/01/2018 - 02/28/2018,B002H5QUZ0,8.04699E+11,8.05E+11,7L0QBQM8S80L,7L0QBQM8S80L,T,Prod_P,,,,1,foo5,bar5,nqgId,BrandY
FILENAME,02/01/2018 - 02/28/2018,B002H5QXF2,8.03814E+11,8.04E+11,PH0Q5QI34B0R,PH0Q5QI34B0R,T,Prod_P,,,,1,foo6,bar6,hPFiY,BrandX
FILENAME,02/01/2018 - 02/28/2018,B002H5QXWK,8.03814E+11,8.04E+11,PSCLFNIDVZS0,PSCLFNIDVZS0,T,Prod_P,,,,1,foo6,bar6,BCdzF,BrandX
FILENAME,02/01/2018 - 02/28/2018,B002H5N3AA,2.45463E+11,2.45E+11,CFFWR2KSWLR8,CFFWR2KSWLR8,T,Prod_P,,,,1,foo2,bar1,RkG7D,BrandY
FILENAME,02/01/2018 - 02/28/2018,B002H5N3IM,2.45463E+11,2.45E+11,CYFTO0FGAPSJ,CYFTO0FGAPSJ,T,Prod_P,,,,1,foo2,bar1,jqiGj,BrandY
FILENAME,02/01/2018 - 02/28/2018,B002H5N3R8,2.45463E+11,2.45E+11,8ZNJHVCVO0A1,8ZNJHVCVO0A1,T,Prod_P,,,,1,foo2,bar1,Ylrcy,BrandY
FILENAME,02/01/2018 - 02/28/2018,B002H5N6X4,7.66193E+11,7.66E+11,37YX24TRDPNW,37YX24TRDPNW,T,Prod_P,,,,1,foo3,bar2,WHxLZ,BrandX
FILENAME,02/01/2018 - 02/28/2018,B002H5N756,7.66193E+11,7.66E+11,H56J19KCLFZP,H56J19KCLFZP,T,Prod_P,,,,1,foo3,bar2,VVw34,BrandX
FILENAME,02/01/2018 - 02/28/2018,B002H5N8QO,73612604823,73612604823,HZC9P776G2EP,HZC9P776G2EP,T,Prod_P,,,,1,foo4,bar3,X48HD,BrandZ
FILENAME,02/01/2018 - 02/28/2018,B002H5NA3U,9.32054E+11,9.32E+11,XFIB2V8RQXN4,XFIB2V8RQXN4,T,Prod_P,,,,1,foo5,bar4,ghftn,BrandY
FILENAME,02/01/2018 - 02/28/2018,B002H5NJ6S,2.45675E+11,2.46E+11,MUCSMOR5HB7V,MUCSMOR5HB7V,T,Prod_I,11,2,1,0,foo6,bar5,TVY19,BrandX
FILENAME,02/01/2018 - 02/28/2018,B002H5NJ6S,2.45675E+11,2.46E+11,MUCSMOR5HB7V,MUCSMOR5HB7V,T,Prod_P,,,,2,foo6,bar5,M2j1i,BrandX
FILENAME,02/01/2018 - 02/28/2018,B002H5NJXQ,73612604823,73612604823,RJER36PXDF0T,RJER36PXDF0T,T,Prod_P,,,,1,foo7,bar6,1UnN3,BrandY
FILENAME,02/01/2018 - 02/28/2018,B002H5OU5C,4.9156E+11,4.92E+11,X9K6BVZEHDDZ,X9K6BVZEHDDZ,T,Prod_P,,,,1,foo8,bar7,eybpO,BrandX
FILENAME,02/01/2018 - 02/28/2018,B002H5OU66,4.9156E+11,4.92E+11,6510BKD3XD9R,6510BKD3XD9R,T,Prod_P,,,,1,foo8,bar7,yS9xk,BrandX
FILENAME,02/01/2018 - 02/28/2018,B002H5OU6Q,4.9156E+11,4.92E+11,EFWDVP7FPCFA,EFWDVP7FPCFA,T,Prod_P,,,,1,foo8,bar7,0IXqS,BrandX

我的脚本(改编自我O.P.中接受的答案):

几乎可以使用.但是每个文件都包含1-3行:

MY SCRIPT (Adapted from the accepted answer in my O.P.):

It almost works. But it includes lines 1-3 for every file:

gawk '
function basename(file) {
    sub(".*/", "", file)
    return file
  }
BEGIN { FS=OFS="," }
NR < 3 {
    if ( NR == 2 ) {
        hdr = "Report_Period" OFS
        val = val $1 OFS
    }
    next
}
FNR>3 {
    print "Filename", hdr $0
    next
}
{ print basename(FILENAME), val $0 }
' OFS="," /path/to/input/files/*.csv > ~/path/to/output/file/SampleOutput.csv

实际输出

这是结果文件的全部内容.问题似乎是标题在重复:

ACTUAL OUTPUT

This is the entire contents of the result file. The problem seems to be that the headers are repeating:

Sample1.csv,02/01/2018 - 02/28/2018,Field1,Field2,Field3,Field4,Field5,Field6,Field7,Field8,Field9,Field10,Field11,Field12,Field13,Field14,Field15
Filename,Report_Period,B002H5QQJA,803814064988,803814064988,P2IIPDM5MDTW,P2IIPDM5MDTW,T,Prod_P,,,,1,foo1,bar1,YDtAK,BrandX
Filename,Report_Period,B002H5QQTU,803814064988,803814064988,K59C4XR93JOV,K59C4XR93JOV,T,Prod_P,,,,1,foo1,bar1,kmAnC,BrandX
Filename,Report_Period,B002H5QR44,803814064988,803814064988,FUBOROFTLW9U,FUBOROFTLW9U,T,Prod_P,,,,1,foo1,bar1,JdLye,BrandX
Filename,Report_Period,B002H5QRBC,803814064988,803814064988,KMHRXLF2FRKH,KMHRXLF2FRKH,T,Prod_P,,,,1,foo1,bar1,Biqvo,BrandX
Filename,Report_Period,B002H5QSC0,803814064988,803814064988,PCLB5UPGGP9T,PCLB5UPGGP9T,T,Prod_P,,,,1,foo2,bar2,Iwvhe,BrandX
Filename,Report_Period,B002H5QU3M,505545471538,505545471538,3K4GDYDEOH1M,3K4GDYDEOH1M,T,Prod_P,,,,1,foo3,bar3,NWsOC,BrandY
Filename,Report_Period,B002H5QUAK,417248985349,417248985349,7R40MN9AD9I8,7R40MN9AD9I8,T,Prod_I,1,0,1,0,foo4,bar4,YVQeH,BrandY
Filename,Report_Period,B002H5QUBY,417248985349,417248985349,C04GQONG1Z5B,C04GQONG1Z5B,T,Prod_I,1,0,1,0,foo4,bar4,PERMW,BrandY
Filename,Report_Period,B002H5QUCI,505545471538,505545471538,4E1ZGIJR1GPR,4E1ZGIJR1GPR,T,Prod_P,,,,1,foo3,bar3,UycEB,BrandY
Filename,Report_Period,B002H5QUVO,804699101426,804699101426,51RXKMWGJJ30,51RXKMWGJJ30,T,Prod_P,,,,1,foo5,bar5,Qwyuy,BrandY
Filename,Report_Period,B002H5QUZ0,804699101426,804699101426,7L0QBQM8S80L,7L0QBQM8S80L,T,Prod_P,,,,1,foo5,bar5,nqgId,BrandY
Filename,Report_Period,B002H5QXF2,803814064988,803814064988,PH0Q5QI34B0R,PH0Q5QI34B0R,T,Prod_P,,,,1,foo6,bar6,hPFiY,BrandX
Filename,Report_Period,B002H5QXWK,803814064988,803814064988,PSCLFNIDVZS0,PSCLFNIDVZS0,T,Prod_P,,,,1,foo6,bar6,BCdzF,BrandX
Sample2.csv,02/01/2018 - 02/28/2018,Provider,,,,,,,,,,,,,,
Sample2.csv,02/01/2018 - 02/28/2018,01/01/2018 - 01/31/2018,,,,,,,,,,,,,,
Sample2.csv,02/01/2018 - 02/28/2018,Field1,Field2,Field3,Field4,Field5,Field6,Field7,Field8,Field9,Field10,Field11,Field12,Field13,Field14,Field15
Filename,Report_Period,B002H5N3AA,2.45463E+11,2.45463E+11,CFFWR2KSWLR8,CFFWR2KSWLR8,T,Prod_P,,,,1,foo2,bar1,RkG7D,BrandY
Filename,Report_Period,B002H5N3IM,2.45463E+11,2.45463E+11,CYFTO0FGAPSJ,CYFTO0FGAPSJ,T,Prod_P,,,,1,foo2,bar1,jqiGj,BrandY
Filename,Report_Period,B002H5N3R8,2.45463E+11,2.45463E+11,8ZNJHVCVO0A1,8ZNJHVCVO0A1,T,Prod_P,,,,1,foo2,bar1,Ylrcy,BrandY
Filename,Report_Period,B002H5N6X4,7.66193E+11,7.66193E+11,37YX24TRDPNW,37YX24TRDPNW,T,Prod_P,,,,1,foo3,bar2,WHxLZ,BrandX
Filename,Report_Period,B002H5N756,7.66193E+11,7.66193E+11,H56J19KCLFZP,H56J19KCLFZP,T,Prod_P,,,,1,foo3,bar2,VVw34,BrandX
Filename,Report_Period,B002H5N8QO,73612604823,73612604823,HZC9P776G2EP,HZC9P776G2EP,T,Prod_P,,,,1,foo4,bar3,X48HD,BrandZ
Filename,Report_Period,B002H5NA3U,9.32054E+11,9.32054E+11,XFIB2V8RQXN4,XFIB2V8RQXN4,T,Prod_P,,,,1,foo5,bar4,ghftn,BrandY
Filename,Report_Period,B002H5NJ6S,2.45675E+11,2.45675E+11,MUCSMOR5HB7V,MUCSMOR5HB7V,T,Prod_I,11,2,1,0,foo6,bar5,TVY19,BrandX
Filename,Report_Period,B002H5NJ6S,2.45675E+11,2.45675E+11,MUCSMOR5HB7V,MUCSMOR5HB7V,T,Prod_P,,,,2,foo6,bar5,M2j1i,BrandX
Filename,Report_Period,B002H5NJXQ,73612604823,73612604823,RJER36PXDF0T,RJER36PXDF0T,T,Prod_P,,,,1,foo7,bar6,1UnN3,BrandY
Filename,Report_Period,B002H5OU5C,4.9156E+11,4.9156E+11,X9K6BVZEHDDZ,X9K6BVZEHDDZ,T,Prod_P,,,,1,foo8,bar7,eybpO,BrandX
Filename,Report_Period,B002H5OU66,4.9156E+11,4.9156E+11,6510BKD3XD9R,6510BKD3XD9R,T,Prod_P,,,,1,foo8,bar7,yS9xk,BrandX
Filename,Report_Period,B002H5OU6Q,4.9156E+11,4.9156E+11,EFWDVP7FPCFA,EFWDVP7FPCFA,T,Prod_P,,,,1,foo8,bar7,0IXqS,BrandX

再次感谢!

推荐答案

尝试两个.忘记包含标题.

Try two. Forgot to include the header.

(pi 580) $ cat /tmp/x.sh
#!/bin/sh

gawk '
  BEGIN {FS=OFS=","}
  FNR == 1 {file=FILENAME; sub(".*/", "", file); next}
  FNR == 2 {period=$1; next}
  NR  == 3 {print "file","period",$0; next}
  FNR == 3 {next}
  {print file,period,$0}
' $*

(pi 581) $ /tmp/x.sh /tmp/f?.*
file,period,Field1,Field2,Field3,Field4,Field5,Field6,Field7,Field8,Field9,Field10,Field11,Field12,Field13,Field14,Field15
f1.txt,02/01/2018 - 02/28/2018,B002H5QQJA,803814064988,803814064988,P2IIPDM5MDTW,P2IIPDM5MDTW,T,Prod_P,,,,1,foo1,bar1,YDtAK,BrandX
f1.txt,02/01/2018 - 02/28/2018,B002H5QQTU,803814064988,803814064988,K59C4XR93JOV,K59C4XR93JOV,T,Prod_P,,,,1,foo1,bar1,kmAnC,BrandX
f2.txt,01/01/2018 - 01/31/2018,B002H5N3AA,245462777033,245462777033,CFFWR2KSWLR8,CFFWR2KSWLR8,T,Prod_P,,,,1,bar1,foo2,bar1,RkG7D,BrandY
f2.txt,01/01/2018 - 01/31/2018,B002H5N3IM,245462777033,245462777033,CYFTO0FGAPSJ,CYFTO0FGAPSJ,T,Prod_P,,,,1,bar1,foo2,bar1,jqiGj,BrandY

这篇关于AWK合并文件(后续)的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆