awk Program File execution

Views: 104  Published: 2016/7/28 16:39:56  Tags: unix awk


Question

As my last question was getting too long, here is a condensed version with the current code level.

Summary: I need to take in a pipe-delimited input file, check to ensure all applicable record types are present, add any that are missing, and verify/correct the number of subfields within each record type.

Input records:

AA|1234|ABCD|EDGFT|TR56BE|~BB||E5TGE|~CC|253641|84597|~DD|78HND|ACBE|||43|~EE|HISBL|78943|~FF|12345|SKIP|~GG|||TYBGFR
AA|2345|CDEF|GFHIT|48UJKK|~CC||3FKTI

Record type and subfield count validation file known_flds entries:

AA~5~req
BB~2~opt
CC~3~opt
DD~6~opt
EE~4~opt
FF~2~skp
GG~4~opt

Current script, without the subfield correction:

#!/usr/bin/awk -f

BEGIN { FS=OFS="~" }

FNR==NR {
    dflts[$1] = create_empty_field($1,$2)
    if( $3 ~ /req|opt/ ) fld_order[++fld_cnt] = $1
    fld_rule[$1] = $3
    next
}

{
    flds = ""
    j = 1
    for(i=1; i<=fld_cnt; i++) {
        j = skip_flds( j )

        if($j !~ ("^" fld_order[i])) fld = dflts[fld_order[i]]
        else { fld = $j; j++ }
        flds = flds (flds=="" ? "" : OFS) fld
    }
    print flds
}

function create_empty_field(name, cnt,     fld, i) {
    fld = name
    for(i=1; i<=cnt; i++) { fld = fld "|" }
    return( fld )
}

function skip_flds(fnum,     name) {
    name = $fnum
    sub(/\|.*$/, "", name)
    while(fld_rule[name] == "skp") {
        fnum++
        name = $fnum
        sub(/\|.*$/, "", name)
    }
    return( fnum )
}

My initial attempt at performing the validation and correction of the subfields:

#!/usr/bin/awk -f

BEGIN { FS=OFS="~" }

FNR==NR {
    dflts[$1] = create_empty_field($1,$2)
    if( $3 ~ /req|opt/ ) fld_order[++fld_cnt] = $1
    fld_rule[$1] = $3
    next
}

{
    flds = ""
    j = 1
    for(i=1; i<=fld_cnt; i++) {
        j = skip_flds( j )
        if($j !~ ("^" fld_order[i])) fld = dflts[fld_order[i]]
        else { fld = fix_sub($j,$2); j++ }
        flds = flds (flds=="" ? "" : OFS) fld
    }
    print flds
}

function create_empty_field(name, cnt,     fld, i) {
    fld = name
    for(i=1; i<=cnt; i++) { fld = fld "|" }
    return( fld )
}

function skip_flds(fnum,     name) {
    name = $fnum
    sub(/\|.*$/, "", name)
    while(fld_rule[name] == "skp") {
        fnum++
        name = $fnum
        sub(/\|.*$/, "", name)
    }
    return( fnum )
}

function fix_sub(rec, num,  upd, cnt) {
    cnt=split(rec,a,"|")-1
    upd=""
    if(cnt != num) 
      {for(i=1;i<=$num;i++) 
       upd = upd a[i] "|" }
    else { upd=$rec }
    return(upd)
}

The above resulted in errors when it reached the second record type. So now I know that I need to capture the 2nd value from the known_flds file in order to pass that through to the fix_sub function.

I will be adding:

        sub_fld[$1] = $2

in the FNR==NR section, but beyond that, my brain is simply fried and I cannot move forward.

I know as a standalone, the fix_sub area works. Now I just need to get the value read from known_flds to pass through.

The desired output is:

AA|1234|ABCD|EDGFT|TR56BE|~BB||~CC|253641|84597|~DD|78HND|ACBE|||43|~EE|HISBL|78943||~GG|||TYBGFR
AA|2345|CDEF|GFHIT|48UJKK|~BB||~CC||3FKTI|~DD||||||~EE||||~GG|||

Original question: UNIX Shell Script Solution for formatting a pipe-delimited, segmented file

Solution

Try this modified script:

#!/usr/bin/awk -f

BEGIN { FS=OFS="~" }

FNR==NR {
    dflts[$1] = create_empty_field($1,$2)
    if( $3 ~ /req|opt/ ) {
        fld_order[++fld_cnt] = $1
        subfld_cnt[$1] = $2
    }
    fld_rule[$1] = $3
    next
}

{
    flds = ""
    j = 1
    for(i=1; i<=fld_cnt; i++) {
        j = skip_flds( j )
        if($j !~ ("^" fld_order[i])) fld = dflts[fld_order[i]]
        else { fld = fix_sub(j); j++ }
        flds = flds (flds=="" ? "" : OFS) fld
    }
    print flds
}

function get_field_name(fnum,      name) {
    name = $fnum
    sub(/\|.*$/, "", name)
    return( name )
}

function create_empty_field(name, cnt,     fld, i) {
    fld = name
    for(i=1; i<=cnt; i++) { fld = fld "|" }
    return( fld )
}

function skip_flds(fnum,     name) {
    # Advance past any fields whose rule in known_flds is "skp".
    name = get_field_name(fnum)
    while(fld_rule[name] == "skp") {
        fnum++
        name = get_field_name(fnum)
    }
    return( fnum )
}

function fix_sub(fnum,       name, cnt, a, scnt, i, upd) {
    name = get_field_name(fnum)
    cnt = split($fnum, a, "|")-1
    scnt = subfld_cnt[ name ]
    if(cnt != scnt) {
        for(i=1;i<=scnt;i++)
            upd = upd a[i] "|"
        return( upd )
    }
    return( $fnum )
}

The key differences:

subfld_cnt[$1] = $2 has been added to the req|opt section in the FNR==NR block (which handles the known_flds file).
Added a get_field_name() function, which returns the first subfield of the field specified by its fnum argument.
Called get_field_name() from function skip_flds().
Modified fix_sub() to take only the fnum (all the other variables are local to the function) and fix the number of subfield pipes if necessary. The call to it now takes only a j argument, as in fix_sub(j).
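
The FNR==NR block relies on awk being given two files on the command line: NR counts records across all files, while FNR restarts at 1 for each file, so the two are equal only while the first file is being read. A minimal sketch of that idiom follows, assuming a trimmed three-line known_flds and hypothetical temporary file names; it only prints what was looked up and is not the full script.

```sh
# Hypothetical demo: the function name, temp directory, and sample data are illustrative.
demo_two_file_read() {
    tmp=$(mktemp -d) || return 1
    printf 'AA~5~req\nBB~2~opt\nFF~2~skp\n' > "$tmp/known_flds"
    printf 'AA|1|~FF|9|~BB||\n' > "$tmp/input"
    awk '
        BEGIN { FS = "~" }
        # True only while the first file on the command line is being read.
        FNR == NR { subfld_cnt[$1] = $2; fld_rule[$1] = $3; next }
        # Records from the second file arrive here with the lookup tables filled.
        {
            for (i = 1; i <= NF; i++) {
                name = $i
                sub(/\|.*$/, "", name)      # keep only the record-type name
                print name, fld_rule[name], subfld_cnt[name]
            }
        }
    ' "$tmp/known_flds" "$tmp/input"
    rm -rf "$tmp"
}
demo_two_file_read
```

The order of the file arguments matters: known_flds has to come first, otherwise the data records would be loaded as rules. One known caveat of the idiom is that an empty first file makes FNR==NR true for the second file as well.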

Breakdown of fix_sub() changes:

name = get_field_name(fnum) gets the field name for the lookup.
Split $fnum and get the count from split() (leaving in your -1 adjustment).
scnt = subfld_cnt[ name ] gets the desired subfield count from the array that was added to the processing of the known_flds file. This is the primary piece you were missing.
When cnt != scnt, fix the subfields.
Left in your upd setting code, but removed the upd = "" since that is already the case for local variables.
Personal preference: return directly with either value instead of using the else.
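
To see the padding and truncation in isolation, here is a small standalone sketch of the same logic. It is a variation for illustration, not the function from the script above: it takes the field text and the wanted count as arguments (the parameter name want is a hypothetical stand-in for subfld_cnt[name]) so it can run without any input files.

```sh
# Hypothetical standalone demo of the pad/truncate logic.
fix_sub_demo() {
    awk '
        # Return field with exactly "want" subfields, each followed by a pipe.
        # a, cnt, i and upd are locals (extra parameters), so they start empty.
        function fix_sub(field, want,    a, cnt, i, upd) {
            cnt = split(field, a, "|") - 1
            if (cnt == want) return field
            for (i = 1; i <= want; i++) upd = upd a[i] "|"
            return upd
        }
        BEGIN {
            print fix_sub("BB||E5TGE|", 2)        # too many subfields: truncated
            print fix_sub("EE|HISBL|78943|", 4)   # too few: padded
            print fix_sub("GG|||TYBGFR", 4)       # no trailing pipe: one is added
            print fix_sub("CC|253641|84597|", 3)  # already correct: unchanged
        }
    '
}
fix_sub_demo
```

With the sample fields from the question this prints BB||, EE|HISBL|78943||, GG|||TYBGFR| and CC|253641|84597|, one per line, which lines up with the BB, EE, GG and CC segments of the output shown below.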

I get the following:

AA|1234|ABCD|EDGFT|TR56BE|~BB||~CC|253641|84597|~DD|78HND|ACBE|||43|~EE|HISBL|78943||~GG|||TYBGFR|
AA|2345|CDEF|GFHIT|48UJKK|~BB||~CC||3FKTI|~DD||||||~EE||||~GG||||

which doesn't exactly match your desired output. The difference is in the final | in the GG field. I think your desired output is missing it. Otherwise, the final pipe of the final field just needs to be dropped after all other processing.
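
If the trailing pipe really should go, one way to drop it is a single sub() on the assembled record just before it is printed. The sketch below assumes that approach and feeds two shortened sample lines through a one-line awk program; in the script above the equivalent would be calling sub(/\|$/, "", flds) immediately before print flds.

```sh
# Hypothetical demo: strips one trailing pipe from each finished record.
strip_final_pipe() {
    printf '%s\n' 'AA|1|~GG|||TYBGFR|' 'AA|2|~GG||||' |
    awk '{ flds = $0; sub(/\|$/, "", flds); print flds }'
}
strip_final_pipe
```

Only the last pipe of the whole record is removed, so the pipes that terminate the earlier fields stay in place.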
