AWK to Consolidate Files

Problem description

I'm hacking some AWK. I'm a beginner with it. I have done my homework on the following problem, and just can't get it to work.

Start Date  12/3/17
End Date    12/30/17
Report Type Report1
Currency    ZAR
Country Identifier  MType   Quantity    Net Net Net Code    Title   Contrib I_Type  M_Type  Vendor Identifier   Offline Indicator   LSN
ZA  44057330    FMP 1   0.050666    0.050666    USYYYYYYYYYY    ABC Tom 1   1   USYYYYYYYYYY    0   SUT
ZA  1267456726  SIMT    1   0.03    0.03    USXXXXXXXXXX    DEF Frances 1   1   USXXXXXXXXXX    0   XYZ
Row Count   657
Storefront Name MType   Quantity    Net Net
ZA  FMP 601 30.45
ZA  IAP 13  0.68
ZA  IMP 1035    69.36
ZA  SIMP    54  1.4
ZA  FMT 70  0.53
ZA  IMT 92  1.68
ZA  SIMT    6   0.18

DESIRED OUTPUT:

(I left the special characters un-escaped here.)

"Filename"  "Start Date"    "End Date"  "Currency"  "Country"   "Identifier"    "MType" "Quantity"  "Net"   "NetNet"    "Code"  "Title" "Contrib"   "I_Type"    "M_Type"    "Vendor Identifier" "Offline Indicator" "LSN"
"rawfile.txt"   "12/3/17"   "12/30/17"  "ZAR"   "ZA"    "44057330"  "FMP"   "1" "0.050666"  "0.050666"  "USYYYYYYYYYY"  "ABC"   "Tom"   "1" "1" "USYYYYYYYYYY"  "0" "SUT"
"rawfile.txt"   "12/3/17"   "12/30/17"  "ZAR"   "ZA"    "1267456726"    "SIMT"  "1" "0.03"  "0.03"  "USXXXXXXXXXX"  "DEF"   "Frances"   "1" "1" "USXXXXXXXXXX"  "0" "XYZ"

Basically I just need to get most of the header from line 5, but three fields I need are in lines 1-4. Also, I don't need the data including and after the line that starts with "Row Count".

gawk '
function basename(file) {
    sub(".*/", "", file)
    return file
  }
  /^Row Count/ {nextfile}
  FNR == 1 { StartDate=$2; }
  FNR == 2 { EndDate=$2; }
  FNR == 4 { curr=$2; }
  NR == 5 {$0 = "StartDate" OFS "EndDate" OFS "Filename" OFS "curr" OFS $0; print} 
  FNR > 5 {$0 =  StartDate OFS EndDate OFS basename(FILENAME) OFS curr OFS $0; print}
' OFS='\t' path/to/sourcefiles/*.txt > path/to/outfile.txt

Thanks!

These are the lines before the field headers in every file. Content begins on line 4:

Provider ,,,,,,,,,,,,
01/01/2018 - 01/31/2018,,,,,,,,,,,,

"MY" SCRIPT

It almost works. But it includes lines 1-3 for every file:

gawk '
function basename(file) {
    sub(".*/", "", file)
    return file
}
BEGIN { FS=OFS="," }
NR < 3 {
    if ( NR == 2 ) {
        hdr = "Report_Period" OFS
        val = val $1 OFS
    }
    next
}
FNR>3 {
    print "Filename", hdr $0
    next
}
{ print basename(FILENAME), val $0 }
' OFS="," /path/to/input/files > ~/path/to/output/file/file.csv

END EDIT

Recommended answer

Your sample input format isn't clear but this might be what you're looking for or it might be doing more than necessary or something else entirely:

$ cat tst.awk
BEGIN { FS=OFS="\t" }
/^Row Count/ { nextfile }
FNR==1 {
    fname = FILENAME
    sub(/.*[/]/,"",fname)
}
{
    gsub(/[\\]t/,FS)
    gsub(/[\\][/]/,"/")
    gsub(/[^\t]+/,"\"&\"")
}
FNR < 5 {
    if ( FNR != 3 ) {
        hdr = hdr $1 OFS
        val = val $2 OFS
    }
    next
}
FNR==5 {
    print "\"Filename\"", hdr $0
    next
}
{ print "\""fname"\"", val $0 }

$ awk -f tst.awk file
"Filename"      "Start Date"    "End Date"      "Currency"      "Country"       "Identifier"    "MType" "Quantity"   "Net"    "Net Net"       "Code"  "Title" "Contrib"       "I_Type"        "M_Type"        "Vendor Identifier"     "Offline Indicator"   "LSN"
"file"  "12/3/17"       "12/30/17"      "ZAR"   "ZA"    "44057330"      "FMP"   "1"     "0.050666"      "0.050666"   "USYYYYYYYYYY"   "ABC"   "Tom"   "1"     "1"     "USYYYYYYYYYY"  "0"     "SUT"
"file"  "12/3/17"       "12/30/17"      "ZAR"   "ZA"    "1267456726"    "SIMT"  "1"     "0.03"  "0.03"  "USXXXXXXXXXX"  "DEF"   "Frances"       "1"     "1"     "USXXXXXXXXXX"  "0"     "XYZ"

The above uses GNU awk for nextfile, which you were already using.
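
A likely reason the second script in the question still prints lines 1-3 of every file is that it tests NR (the record count across all input) where FNR (the record count within the current file) is needed. The sketch below shows the per-file pattern on made-up data, not on the real reports: the file names a.csv and b.csv, the two-line preamble, and the "Period" column are all assumptions for illustration. It also uses a skip flag in place of nextfile, in case your awk lacks it.

```shell
#!/bin/sh
# Hypothetical demo data: each file has 2 preamble lines, a header line,
# one data row, a "Row Count" trailer, and a summary row to be ignored.
dir=$(mktemp -d)
printf 'Provider,x\nPeriod,Jan\nid,qty\n1,10\nRow Count,1\nignored,0\n' > "$dir/a.csv"
printf 'Provider,y\nPeriod,Feb\nid,qty\n2,20\nRow Count,1\nignored,0\n' > "$dir/b.csv"

out=$(awk '
BEGIN { FS = OFS = "," }
FNR == 1 {                     # FNR restarts at 1 for every input file
    skip = 0                   # reset per-file state
    period = ""
    fname = FILENAME
    sub(/.*\//, "", fname)     # strip the directory part
}
/^Row Count/ { skip = 1 }      # flag-based stand-in for nextfile
skip { next }                  # ignore the trailer and everything after it
FNR == 2 { period = $2; next } # value captured from the preamble
FNR < 3 { next }               # drop the remaining preamble line
FNR == 3 {                     # header line: print it for the first file only
    if (NR == FNR) print "Filename", "Period", $0
    next
}
{ print fname, period, $0 }    # data rows, prefixed with per-file values
' "$dir/a.csv" "$dir/b.csv")

printf '%s\n' "$out"
rm -rf "$dir"
```

Because every per-file test uses FNR, the preamble of b.csv is consumed the same way as that of a.csv. NR == FNR is true only while the first file is being read, which keeps the header from repeating.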
