awk Program File execution

Problem description

As my last question was getting too long, here is a condensed version with the code as it currently stands.

Summary: I need to take in a pipe-delimited input file, check to ensure all applicable record types are present, add any that are missing, and verify/correct the number of subfields within each record type.

Input records:

AA|1234|ABCD|EDGFT|TR56BE|~BB||E5TGE|~CC|253641|84597|~DD|78HND|ACBE|||43|~EE|HISBL|78943|~FF|12345|SKIP|~GG|||TYBGFR
AA|2345|CDEF|GFHIT|48UJKK|~CC||3FKTI

Record type and subfield count validation file known_flds entries:

AA~5~req
BB~2~opt
CC~3~opt
DD~6~opt
EE~4~opt
FF~2~skp
GG~4~opt
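The scripts below load this file using the FNR==NR idiom. While awk reads the first file named on the command line, FNR (the line number within the current file) equals NR (the overall line number). That lets those lines fill lookup arrays before any input record is seen.

The following is only an illustrative sketch. The helper name expected_counts is hypothetical, and it assumes known_flds is passed first and is non-empty.

```sh
#!/bin/sh
# expected_counts KNOWN_FLDS INPUT  (hypothetical helper, for illustration)
# For every "~"-separated segment of each input record, print the record
# type together with the subfield count and rule looked up in known_flds.
expected_counts() {
    awk 'BEGIN { FS = "~" }
    FNR == NR {                  # first file: NAME~COUNT~RULE
        cnt[$1]  = $2
        rule[$1] = $3
        next
    }
    {                            # later files: the pipe-delimited records
        for (k = 1; k <= NF; k++) {
            name = $k
            sub(/\|.*$/, "", name)   # record type = text before the first |
            print name, cnt[name], rule[name]
        }
    }' "$1" "$2"
}
```

Running `expected_counts known_flds input.txt` on the second sample record should print `AA 5 req` and `CC 3 opt`. If known_flds were empty, FNR==NR would also be true for every input line, so the order and contents of the file arguments matter.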

Current script, without the subfield correction:

#!/usr/bin/awk -f

BEGIN { FS=OFS="~" }

FNR==NR {
    dflts[$1] = create_empty_field($1,$2)
    if( $3 ~ /req|opt/ ) fld_order[++fld_cnt] = $1
    fld_rule[$1] = $3
    next
}

{
    flds = ""
    j = 1
    for(i=1; i<=fld_cnt; i++) {
        j = skip_flds( j )

        if($j !~ ("^" fld_order[i])) fld = dflts[fld_order[i]]
        else { fld = $j; j++ }
        flds = flds (flds=="" ? "" : OFS) fld
    }
    print flds
}

function create_empty_field(name, cnt,     fld, i) {
    fld = name
    for(i=1; i<=cnt; i++) { fld = fld "|" }
    return( fld )
}

function skip_flds(fnum,     name) {
    name = $fnum
    sub(/\|.*$/, "", name)
    while(fld_rule[name] == "skp") {
        fnum++
        name = $fnum
        sub(/\|.*$/, "", name)
    }
    return( fnum )
}
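For reference, create_empty_field builds the placeholder used for a missing record type: the record-type name followed by COUNT empty subfields, that is, COUNT pipes. This standalone sketch reuses that function to print the placeholder for every req/opt type. The wrapper name defaults_from is hypothetical.

```sh
#!/bin/sh
# defaults_from KNOWN_FLDS  (hypothetical wrapper)
# Prints the placeholder the script substitutes for each missing
# req/opt record type; skp types are left out.
defaults_from() {
    awk 'BEGIN { FS = "~" }
    function create_empty_field(name, cnt,    fld, i) {
        fld = name
        for (i = 1; i <= cnt; i++) fld = fld "|"
        return fld
    }
    $3 ~ /req|opt/ { print create_empty_field($1, $2) }' "$1"
}
```

With the sample known_flds, this should print AA|||||, BB||, CC|||, DD||||||, EE|||| and GG||||, one per line. FF is skipped because its rule is skp.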

My initial attempt at performing the validation and correction of the subfields:

#!/usr/bin/awk -f

BEGIN { FS=OFS="~" }

FNR==NR {
    dflts[$1] = create_empty_field($1,$2)
    if( $3 ~ /req|opt/ ) fld_order[++fld_cnt] = $1
    fld_rule[$1] = $3
    next
}

{
    flds = ""
    j = 1
    for(i=1; i<=fld_cnt; i++) {
        j = skip_flds( j )
        if($j !~ ("^" fld_order[i])) fld = dflts[fld_order[i]]
        else { fld = fix_sub($j,$2); j++ }
        flds = flds (flds=="" ? "" : OFS) fld
    }
    print flds
}

function create_empty_field(name, cnt,     fld, i) {
    fld = name
    for(i=1; i<=cnt; i++) { fld = fld "|" }
    return( fld )
}

function skip_flds(fnum,     name) {
    name = $fnum
    sub(/\|.*$/, "", name)
    while(fld_rule[name] == "skp") {
        fnum++
        name = $fnum
        sub(/\|.*$/, "", name)
    }
    return( fnum )
}

function fix_sub(rec, num,  upd, cnt) {
    cnt=split(rec,a,"|")-1
    upd=""
    if(cnt != num) 
      {for(i=1;i<=$num;i++) 
       upd = upd a[i] "|" }
    else { upd=$rec }
    return(upd)
}

The above resulted in errors when it reached the second record type. So now I know that I need to capture the second value (the subfield count) from the known_flds file in order to pass it through to the fix_sub function.
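A likely cause is what fix_sub($j,$2) receives. In the main (non-FNR==NR) block, $2 is the second "~"-separated segment of the current input record, not the COUNT column of known_flds. As far as I can tell, the same thing happens inside fix_sub: $num and $rec use those strings as field numbers, and a non-numeric string such as BB||E5TGE| converts to 0, so they refer to $0, the whole record.

The two hypothetical helpers below only demonstrate both effects on a sample record.

```sh
#!/bin/sh
# second_segment RECORD  (hypothetical helper)
# Prints $2 of RECORD with FS set to "~", i.e. what the main block sees as $2.
second_segment() {
    printf '%s\n' "$1" | awk 'BEGIN { FS = "~" } { print $2 }'
}

# field_by_string RECORD  (hypothetical helper)
# Uses the value of $2 itself as a field index, as fix_sub does with $num
# and $rec; the non-numeric string converts to 0, so this prints $0.
field_by_string() {
    printf '%s\n' "$1" | awk 'BEGIN { FS = "~" } { idx = $2; print $idx }'
}
```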

I will be adding:

        sub_fld[$1] = $2

in the FNR==NR section, but beyond that, my brain is simply fried and I cannot move forward.

I know the fix_sub part works as a standalone. Now I just need to get the value read from known_flds passed through.

The desired output is:

AA|1234|ABCD|EDGFT|TR56BE|~BB||~CC|253641|84597|~DD|78HND|ACBE|||43|~EE|HISBL|78943||~GG|||TYBGFR
AA|2345|CDEF|GFHIT|48UJKK|~BB||~CC||3FKTI|~DD||||||~EE||||~GG|||

Original question: UNIX Shell Script Solution for formatting a pipe-delimited, segmented file

Solution

Try this modified script:

#!/usr/bin/awk -f

BEGIN { FS=OFS="~" }

FNR==NR {
    dflts[$1] = create_empty_field($1,$2)
    if( $3 ~ /req|opt/ ) {
        fld_order[++fld_cnt] = $1
        subfld_cnt[$1] = $2
    }
    fld_rule[$1] = $3
    next
}

{
    flds = ""
    j = 1
    for(i=1; i<=fld_cnt; i++) {
        j = skip_flds( j )
        if($j !~ ("^" fld_order[i])) fld = dflts[fld_order[i]]
        else { fld = fix_sub(j); j++ }
        flds = flds (flds=="" ? "" : OFS) fld
    }
    print flds
}

function get_field_name(fnum,      name) {
    name = $fnum
    sub(/\|.*$/, "", name)
    return( name )
}

function create_empty_field(name, cnt,     fld, i) {
    fld = name
    for(i=1; i<=cnt; i++) { fld = fld "|" }
    return( fld )
}

function skip_flds(fnum,     name) {
    name = get_field_name(fnum)
    while(fld_rule[name] == "skp") {
        fnum++
        name = get_field_name(fnum)
    }
    return( fnum )
}

function fix_sub(fnum,       name, cnt, a, scnt, i, upd) {
    name = get_field_name(fnum)
    cnt = split($fnum, a, "|")-1
    scnt = subfld_cnt[ name ]
    if(cnt != scnt) {
        for(i=1;i<=scnt;i++)
            upd = upd a[i] "|"
        return( upd )
    }
    return( $fnum )
}

The key differences:

  • subfld_cnt[$1] = $2 has been added to the req|opt section of the FNR==NR block (which handles the known_flds file).
  • Added a get_field_name() function, which returns the first subfield (the record-type name) of the field specified by its fnum argument.
  • Called get_field_name() from skip_flds().
  • Modified fix_sub() to take only fnum (all other variables are local to the function) and to fix the number of subfield pipes if necessary. The call now takes just the j argument, as in fix_sub(j).
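To see what get_field_name() and the skp rule do in isolation, this sketch lists the record types of one input record in order, leaving out any type whose rule is skp. It hard-codes FF as the only skp type, taken from the sample known_flds rather than read from the file. The wrapper name kept_types is hypothetical.

```sh
#!/bin/sh
# kept_types RECORD  (hypothetical helper)
# Prints the record types of RECORD in order, skipping those marked skp.
kept_types() {
    printf '%s\n' "$1" | awk '
    BEGIN { FS = "~"; rule["FF"] = "skp" }   # assumption: FF is the only skp type
    function get_field_name(fnum,    name) {
        name = $fnum
        sub(/\|.*$/, "", name)               # keep the text before the first |
        return name
    }
    {
        out = ""
        for (k = 1; k <= NF; k++) {
            name = get_field_name(k)
            if (rule[name] == "skp") continue
            out = out (out == "" ? "" : " ") name
        }
        print out
    }'
}
```

For the first sample record, this should print `AA BB CC DD EE GG`.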

Breakdown of fix_sub() changes:

  • name = get_field_name(fnum) gets the field name for the lookup.
  • Split $fnum and take the element count, keeping your -1 adjustment, so cnt is the number of pipes in the field.
  • scnt = subfld_cnt[ name ] gets the desired subfield count from the array that was added to the processing of the known_flds file. This is the primary piece you were missing.
  • When cnt != scnt, the field is rebuilt with exactly scnt pipes, padding with empty subfields or dropping extra ones.
  • Kept your upd-building code, but removed upd = "", since awk function locals already start out empty.
  • Personal preference: return the value directly in each branch instead of using an else.

I get the following:

AA|1234|ABCD|EDGFT|TR56BE|~BB||~CC|253641|84597|~DD|78HND|ACBE|||43|~EE|HISBL|78943
||~GG|||TYBGFR|
AA|2345|CDEF|GFHIT|48UJKK|~BB||~CC||3FKTI|~DD||||||~EE||||~GG||||

which doesn't exactly match your desired output. The difference is the final | in the GG field; I think your desired output is missing it. Otherwise, the final pipe of the final field just needs to be dropped after all other processing.
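If the trailing pipe should go, one option is to strip a single trailing | from each finished line. You can do this with sub(/\|$/, "", flds) just before print flds in the script, or as a separate post-processing step.

Here is a sketch of the separate step. The function name drop_last_pipe is hypothetical.

```sh
#!/bin/sh
# drop_last_pipe  (hypothetical filter)
# Removes one trailing "|" from each line read on standard input.
drop_last_pipe() {
    awk '{ sub(/\|$/, "") } 1'
}
```

It would sit at the end of the pipeline, e.g. `awk -f your_script known_flds input.txt | drop_last_pipe`, where your_script stands for wherever the script above is saved. It strips the pipe from whichever field happens to be last, so a last field that legitimately ends in | would be shortened too.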
