Why is the field separator taken into account differently if set before or after the expression?


Problem description


The code print split("foo:bar", a) returns the number of pieces split() produces when it cuts the string on the field separator (with only two arguments, split() uses FS). Since the default field separator is a single space, which splits on runs of blanks, and "foo:bar" contains none, the result is 1:

$ awk 'BEGIN{print split("foo:bar",a)}'
1

However, if the field separator is ":" then the result is obviously 2 ("foo" and "bar"):

$ awk 'BEGIN{FS=":"; print split("foo:bar", a)}'
2
$ awk -F: 'BEGIN{print split("foo:bar", a)}'
2

However, this is not the case if FS is assigned after the awk program:

$ awk 'BEGIN{print split("foo:bar", a)}' FS=":"
1

If I print it not in the BEGIN block but while processing a file, FS is taken into account:

$ echo "bla" > file
$ awk '{print split("foo:bar",a)}' FS=":" file
2

So it looks like an FS set before the program is already taken into account in the BEGIN block, while one set after the program is not.

Why is this happening? I could not find details on this in GNU Awk User's Guide → 4.5.4 Setting FS from the Command Line. I am working on GNU Awk 5.

Solution

This behavior is not specific to GNU awk; it is specified by POSIX.

Calling convention:

The awk calling convention is the following:

awk [-F sepstring] [-v assignment]... program [argument...]
awk [-F sepstring] -f progfile [-f progfile]... [-v assignment]...
       [argument...]

This shows that any option (flags -F, -v, -f) passed to awk must occur before the program text and any arguments. For example:

# this works
$ awk -F: '1' /dev/null
# this fails
$ awk '1' -F: /dev/null
awk: fatal: cannot open file `-F:' for reading (No such file or directory)

Field separators and assignments as options:

The Standard states:

-F sepstring: Define the input field separator. This option shall be equivalent to: -v FS=sepstring

-v assignment: The application shall ensure that the assignment argument is in the same form as an assignment operand. The specified variable assignment shall occur prior to executing the awk program, including the actions associated with BEGIN patterns (if any). Multiple occurrences of this option can be specified.

source: POSIX awk standard

So if you assign a variable or set the field separator through these options, the BEGIN actions already see them:

$ awk -F: -v a=1 'BEGIN{print FS,a}'
: 1
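The standard says -F sepstring is equivalent to -v FS=sepstring. A minimal sketch, assuming a POSIX sh and a POSIX-conforming awk on PATH, checks that both forms are already in effect inside BEGIN (the function names are made up for illustration):

```shell
# Hypothetical helpers: -F: and -v FS=: both set FS before BEGIN runs,
# so split() with two arguments already splits on ":".
split_with_F() {
  awk -F: 'BEGIN { print split("foo:bar", parts) }'
}
split_with_v() {
  awk -v FS=: 'BEGIN { print split("foo:bar", parts) }'
}

split_with_F  # prints 2
split_with_v  # prints 2
```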

What are arguments?

The Standard states:

argument: Either of the following two types of argument can be intermixed:

  • file: A pathname of a file that contains the input to be read, which is matched against the set of patterns in the program. If no file operands are specified, or if a file operand is '-', the standard input shall be used.
  • assignment: An <snip: extremely long sentence to state varname=varvalue>, shall specify a variable assignment rather than a pathname. <snip: some extended details on the meaning of varname=varvalue> Each such variable assignment shall occur just prior to the processing of the following file, if any. Thus, an assignment before the first file argument shall be executed after the BEGIN actions (if any), while an assignment after the last file argument shall occur before the END actions (if any). If there are no file arguments, assignments shall be executed before processing the standard input.
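The last rule covers the case with no file operands at all. A small sketch, assuming a POSIX sh and awk (the helper name is hypothetical), pipes a record into awk with FS=: as an operand: BEGIN still sees the default single space, but the record read from standard input is split on ":".

```shell
# Hypothetical helper: FS=: is an operand assignment and there are no file
# operands, so it is applied after BEGIN but before stdin is read.
first_field_from_stdin() {
  printf 'foo:bar\n' |
    awk 'BEGIN { printf "BEGIN sees FS=[%s]\n", FS } { print $1 }' FS=:
}

first_field_from_stdin
# BEGIN sees FS=[ ]
# foo
```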

source: POSIX awk standard

This means that if you run:

$ awk program FS=val file

The BEGIN actions will not see the new value of FS, but every other part of the program (the main actions and END) will.
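If a BEGIN block needs the separator, it has to be set before BEGIN runs or passed to split() directly. A sketch of two such options next to the operand form, assuming a POSIX sh and awk (helper names are hypothetical); split()'s optional third argument is the field separator to use instead of FS:

```shell
# Operand assignment: applied only after BEGIN, so split() still uses " ".
via_operand() { awk 'BEGIN { print split("foo:bar", parts) }' FS=:; }
# Assign FS inside BEGIN before calling split().
via_begin()   { awk 'BEGIN { FS = ":"; print split("foo:bar", parts) }'; }
# Pass the separator as split()'s third argument, bypassing FS entirely.
via_arg()     { awk 'BEGIN { print split("foo:bar", parts, ":") }'; }

via_operand  # prints 1
via_begin    # prints 2
via_arg      # prints 2
```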

Example:

$ awk -v OFS="|" 'BEGIN{print "BEGIN",FS,a,""}END{print "END",FS,a,""}' FS=: a=1 /dev/null
BEGIN| ||
END|:|1|
$ awk -v OFS="|" 'BEGIN{print "BEGIN",FS,a,""}
                  {print "ACTION",FS,a,""}
                  END{print "END",FS,a,""}' FS=: a=1 <(echo 1) a=2
BEGIN| ||
ACTION|:|1|
END|:|2|
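Because each assignment operand takes effect just before the file that follows it, different files can even be read with different separators. A sketch, assuming mktemp -d is available (common on Linux and macOS, though not part of POSIX) and using two throwaway files and a variable tag whose names are made up:

```shell
# Hypothetical helper: each operand assignment takes effect just before the
# file that follows it, so the two files are split with different FS values.
show_per_file_values() {
  dir=$(mktemp -d) || return 1
  printf 'a:b\n' > "$dir/one"
  printf 'c;d\n' > "$dir/two"
  awk '{ print $1, tag }' FS=: tag=first "$dir/one" FS=';' tag=second "$dir/two"
  rm -rf "$dir"
}

show_per_file_values
# a first
# c second
```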

