Print block of text to file from awk script [banner like]


Problem description

I have an awk script that does some processing and sends its output to a file. How would I write a banner-like message to that file first, from the BEGIN block of my awk program, something like a bash heredoc?

I know I could use multiple print commands, but is there a way to use a single print command while preserving multi-line text, newlines and all?

So the output should look something like this:

#########################################
#      generated by some author         #
#        ENVIRON["VAR"]
#########################################

An additional formatting problem is that ENVIRON["VAR"] should be expanded there, in the middle of the string.

Solution

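The simple way is to build the banner as a multi-line, double-quoted shell string (which expands $VAR much as an unquoted heredoc would) and pass it to awk as a variable: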

VAR="whatever"
awk -v var="\
#########################################
#      generated by some author         #
#        $VAR
#########################################" '
BEGIN{ print var }
'
#########################################
#      generated by some author         #
#        whatever
#########################################
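
If the banner has to be produced inside awk itself, one possible approach is a single printf call in the BEGIN block, with \n separating the banner lines and ENVIRON["VAR"] passed as an argument. The following is only a sketch under assumptions: the input lines, the toupper() processing step, and the temporary file created with mktemp are all made up for illustration and are not part of the original answer.

```shell
#!/bin/sh
# Sketch only: VAR, the sample input and the temporary file are illustrative.
export VAR="whatever"
outfile=$(mktemp)

printf 'a\nb\n' | awk -v out="$outfile" '
BEGIN {
    rule = "#########################################"
    # One printf call; each \n in the format string starts a new banner line.
    printf "%s\n#      generated by some author         #\n#        %s\n%s\n", rule, ENVIRON["VAR"], rule > out
}
# Regular processing then writes to the same file, which awk keeps open.
{ print toupper($0) > out }
'

cat "$outfile"
```

Because awk keeps the file open after the first "> out", the later print statements append below the banner rather than truncating it.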

Alternatively, this may be more than you wanted, but below is the tool I use to get something a bit better than plain here-documents in awk. I find it invaluable when adding template text to multiple files.

It's a shell script which takes an awk script with slightly extended syntax (to facilitate here documents) as input, invokes gawk to transform that extended syntax to normal awk print statements, and then calls gawk again to execute the resulting script.

I call it "epawk", for "extended print" awk, and what follows is the tool plus several examples of how to use it. When you invoke it instead of invoking awk directly, you can write scripts that include blocks of pre-formatted text for printing, much as you would with a here-doc (the leading white space before each # is a tab character):

$ export VAR="whatever"
$ epawk 'BEGIN {
    print <<-!
        #########################################
        #      generated by some author         #
        #        "ENVIRON["VAR"]"
        #########################################
    !
}'
#########################################
#      generated by some author         #
#        whatever
#########################################

It works by creating an awk script from your awk script and then executing it. If you'd just like to see the generated script, give epawk the -X argument and it will print that script instead of executing it, e.g.:

$ epawk -X 'BEGIN {
    print <<-!
        #########################################
        #      generated by some author         #
        #        "ENVIRON["VAR"]"
        #########################################
    !
}'
BEGIN {
print "#########################################"
print "#      generated by some author         #"
print "#        "ENVIRON["VAR"]""
print "#########################################"
}
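
The rewrite step shown above can be illustrated with a much smaller program. The sketch below is not epawk: expand_sketch is a hypothetical helper that handles only the plain "print <<TERMINATOR" form (no -, + or = modifiers and nothing after the terminator), and it uses plain awk rather than gawk. It turns each line of the block into a print statement and then runs the generated script.

```shell
#!/bin/bash
# Hypothetical, stripped-down version of the rewrite step (not epawk itself).
expand_sketch() {
    awk '
        # A line such as "print <<!" opens a block; remember its terminator.
        !inBlock && match($0, /^[ \t]*print[ \t]*<</) {
            term = substr($0, RSTART + RLENGTH)
            gsub(/^[ \t]+|[ \t]+$/, "", term)
            inBlock = 1
            next
        }
        inBlock {
            line = $0
            gsub(/^[ \t]+|[ \t]+$/, "", line)
            if (line == term) { inBlock = 0; next }
            # Wrap the block line in quotes so it becomes a print statement.
            print "print \"" $0 "\""
            next
        }
        { print }
    '
}

generated=$(expand_sketch <<'EOF'
BEGIN {
print <<!
#   generated by some author
#   "ENVIRON["VAR"]"
!
}
EOF
)
printf '%s\n' "$generated"

# Execute the generated script, as epawk does on its second pass.
export VAR="whatever"
result=$(awk "$generated")
printf '%s\n' "$result"
```

The quoting trick is visible in the generated text: "ENVIRON["VAR"]" inside the block closes the surrounding string, lets awk evaluate the expression, and reopens the string, so the value is concatenated into the printed line.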

THE SCRIPT:

#!/bin/bash
# The above must be the first line of this script as bash or zsh is
# required for the shell array reference syntax used in this script.

##########################################################
# Extended Print AWK
#
# Allows printing of pre-formatted blocks of multi-line text in awk scripts.
#
# Before invoking the tool, do the following IN ORDER:
#
# 1) Start each block of pre-formatted text in your script with
#       print << TERMINATOR
#    on its own line and end it with
#   TERMINATOR
#    on its own line. TERMINATOR can be any sequence of non-blank characters
#    you like. Spaces are allowed around the symbols but are not required.
#    If << is followed by -, e.g.:
#       print <<- TERMINATOR
#    then all leading tabs are removed from the block of pre-formatted
#    text (just like shell here documents), if it's followed by + instead, e.g.:
#       print <<+ TERMINATOR
#    then however many leading tabs are common across all non-blank lines
#    in the current pre-formatted block are removed.
#    If << is followed by =, e.g.
#       print <<= TERMINATOR
#    then whatever leading white space (tabs or blanks) occurs before the
#    "print" command will be removed from all non-blank lines in
#    the current pre-formatted block.
#    By default no leading spaces are removed. Anything you place after
#    the TERMINATOR will be reproduced as-is after every line in the
#    post-processed script, so this for example:
#   print << HERE |"cat>&2"
#       foo
#   HERE
#    would cause "foo" to be printed to stderr.
#
# 2) Within each block of pre-formatted text only:
#   a) Put a backslash character before every backslash (\ -> \\).
#   b) Put a backslash character before every double quote (" -> \").
#   c) Enclose awk variables in double quotes without leading
#      backslashes (awkVar -> "awkVar").
#   d) Enclose awk record and field references ($0, $1, $2, etc.)
#      in double quotes without leading backslashes ($1 -> "$1").
#
# 3) If the script is specified on the command line instead of via
#    "-f script" then replace all single quote characters (') in or out
#    of the pre-formatted blocks with their ANSI octal escape sequence (\047)
#    or the sequence '\'' (tick backslash tick tick). This is normal and is
#    required because command-line awk scripts cannot contain single quote
#    characters as those delimit the script. Do not use hex \x27, see
#    http://awk.freeshell.org/PrintASingleQuote.
#
# Then just use it like you would gawk with the small caveat that only
# "-W <option>", not "--<option>", is supported for long options so you
# can use "-W re-interval" but not "--re-interval" for example.
#
# To just see the post-processed script and not execute it, call this
# script with the "-X" option.
#
# See the bottom of this file for usage examples.
##########################################################

expand_prints() {

    gawk '

        !inBlock {
        if ( match($0,/^[[:blank:]]*print[[:blank:]]*<</) ) {

        # save any blanks before the print in case 
        # skipType "=" is used.
        leadBlanks = $0
        sub(/[^[:blank:]].*$/,"",leadBlanks)

        $0 = substr($0,RSTART+RLENGTH)

            if      ( sub(/^[-]/,"") )  { skipType = "-" }
            else if ( sub(/^[+]/,"") )  { skipType = "+" }
            else if ( sub(/^[=]/,"") )  { skipType = "=" }
            else                { skipType = ""  }

            gsub(/(^[[:blank:]]+|[[:blank:]]+$)/,"")

            if (/[[:blank:]]/) {
                terminator = $0
                    sub(/[[:blank:]].*/,"",terminator)

            postprint = $0
                sub(/[^[:blank:]]+[[:blank:]]+/,"",postprint)
            }
            else {
                terminator = $0
            postprint = ""
            }

            startBlock()

            next
        }
        }

        inBlock {

        stripped=$0
        gsub(/(^[[:blank:]]+|[[:blank:]]+$)/,"",stripped)

        if ( stripped"" == terminator"" ) {
            endBlock()
        }
        else {
            updBlock()
        }

        next
        }

        { print }

    function startBlock() { inBlock=1; numLines=0  }

    function updBlock()   { block[++numLines] = $0 }

    function endBlock(  i,numSkip,indent) {

        if (skipType == "") {
            # do not skip any leading tabs
            indent = ""
        }
        else if (skipType == "-") {
            # skip all leading tabs
            indent = "[\t]+"
        }
        else if (skipType == "+") {

            # skip however many leading tabs are common across
            # all non-blank lines in the current pre-formatted block

            for (i=1;i<=numLines;i++) {
                if (block[i] ~ /[^[:blank:]]/) {
                    match(block[i],/^[\t]+/)
                    if ( (numSkip == "") || (numSkip > RLENGTH) ) {
                        numSkip = RLENGTH
                    }
                }
            }

            for (i=1;i<=numSkip;i++) {
                indent = indent "\t"
            }
        }
        else if (skipType == "=") {
            # skip whatever pattern of blanks existed
            # before the "print" statement
            indent = leadBlanks
        }

        for (i=1;i<=numLines;i++) {
            # anchor the pattern so that only leading white space is removed
            if (indent != "") {
                sub("^" indent,"",block[i])
            }
            print "print \"" block[i] "\"\t" postprint
        }

        inBlock=0
    }

    ' "$@"

}

unset awkArgs
unset scriptFiles
expandOnly=0
while getopts "v:F:W:f:X" arg
do
        case $arg in
    f ) scriptFiles+=( "$OPTARG" ) ;;
        [vFW] ) awkArgs+=( "-$arg" "$OPTARG" ) ;;
    X ) expandOnly=1 ;;
        * )     exit 1 ;;
        esac
done
shift $(( OPTIND - 1 ))

if [ -z "${scriptFiles[*]}" -a "$#" -gt "0" ]
then
    # The script cannot contain literal 's because in cases like this:
    #   'BEGIN{ ...abc'def... }'
    # the args parsed here (and later again by gawk) would be:
    #   $1 = BEGIN{ ...abc
    #   $2 = def... }
    # Replace 's with \047 or '\'' if you need them:
    #   'BEGIN{ ...abc\047def... }'
    #   'BEGIN{ ...abc'\''def... }'
    scriptText="$1"
    shift
fi

# Remaining symbols in "$@" must be data file names and/or variable
# assignments that do not use the "-v name=value" syntax.

if [ -n "${scriptFiles[*]}" ]
then
    if (( expandOnly == 1 ))
    then
    expand_prints "${scriptFiles[@]}"
    else
    gawk "${awkArgs[@]}" "$(expand_prints "${scriptFiles[@]}")" "$@"
    fi

elif [ -n "$scriptText" ]
then
    if (( expandOnly == 1 ))
    then
    printf '%s\n' "$scriptText" | expand_prints
    else
    gawk "${awkArgs[@]}" "$(printf '%s\n' "$scriptText" | expand_prints)" "$@"
    fi
else
    printf '%s: ERROR: no awk script specified.\n' "${0##*/}" >&2
    exit 1
fi

USAGE EXAMPLES:

$ cat data.txt
abc def"ghi

$ cat script.awk
{
    awkVar="bar" 

    print "----------------"

    print << HERE
    backslash: \\

        quoted text: \"text\"

    single quote as ANSI sequence: \047

    literal single quote (ONLY works when script is in a file): '

    awk variable: "awkVar"

    awk field: "$2"
    HERE

    print "----------------"

    print <<-!
        backslash: \\

            quoted text: \"text\"

        single quote as ANSI sequence: \047

        literal single quote (ONLY works when script is in a file): '

        awk variable: "awkVar"

        awk field: "$2"
    !

    print "----------------"

    print <<+           whatever
        backslash: \\

    quoted text: \"text\"

        single quote as ANSI sequence: \047

        literal single quote (ONLY works when script is in a file): '

        awk variable: "awkVar"

        awk field: "$2"
    whatever

    print "----------------"
}

$ epawk -f script.awk data.txt
----------------
    backslash: \

        quoted text: "text"

    single quote as ANSI sequence: '

    literal single quote (ONLY works when script is in a file): '

    awk variable: bar

    awk field: def"ghi
----------------
backslash: \

    quoted text: "text"

single quote as ANSI sequence: '

literal single quote (ONLY works when script is in a file): '

awk variable: bar

awk field: def"ghi
----------------
    backslash: \

quoted text: "text"

    single quote as ANSI sequence: '

    literal single quote (ONLY works when script is in a file): '

    awk variable: bar

    awk field: def"ghi
----------------

$ epawk -F\" '{
print <<!
    ANSI-tick-surrounded quote-separated field 2 (will work): \047"$2"\047
!
}' data.txt
    ANSI-tick-surrounded quote-separated field 2 (will work): 'ghi'

$ epawk -F\" '{
print <<!
    Shell-escaped-tick-surrounded quote-separated field 2 (will work): '\''"$2"'\''
!
}' data.txt
    Shell-escaped-tick-surrounded quote-separated field 2 (will work): 'ghi'

$ epawk -F\" '{
print <<!
    Literal-tick-surrounded quote-separated field 2 (will not work): '"$2"'
!
}' data.txt
    Literal-tick-surrounded quote-separated field 2 (will not work): 
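
The two working forms in the examples above are not specific to epawk; they apply to any awk script given on the command line. A minimal sketch with plain awk (the sample text "its" is made up for illustration):

```shell
#!/bin/sh
# \047 is the octal escape for a single quote, interpreted by awk itself.
ansi=$(awk 'BEGIN { print "it\047s" }')

# '\'' ends the shell string, adds an escaped quote, then reopens the string,
# so awk receives a literal single quote inside its program text.
shell=$(awk 'BEGIN { print "it'\''s" }')

printf '%s\n%s\n' "$ansi" "$shell"
```

Both commands print the same text. The first is resolved by awk and the second by the shell, which is why only the first still works unchanged when the script is moved into a file.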

$ epawk -X 'BEGIN{
print <<!
    foo
    bar
!
}'
BEGIN{
print "    foo"
print "    bar"
}

$ cat file
a
b
c

$ epawk '{
    print <<+! |"cat>o2"
        numLines="NR"
                numFields="NF", $0="$0", $1="$1"
    !
}' file

$ cat o2
numLines=1
        numFields=1, $0=a, $1=a
numLines=2
        numFields=1, $0=b, $1=b
numLines=3
        numFields=1, $0=c, $1=c

$ epawk 'BEGIN{

    cmd = "sort"
    print <<+! |& cmd
        d
        b
        a
        c
    !
    close(cmd, "to")

    while ( (cmd |& getline line) > 0 ) {
        print "got:", line
    }
    close(cmd)

}' file
got: a
got: b
got: c
got: d
