Re-evaluating fields in a record after awk field separator change

Question

(This is my first post here, so please forgive me if I am asking the question the wrong way.)

I am learning awk on OS X Mavericks. I am going through this tutorial on awk.

I am trying to reproduce something similar to the awk_example4a.awk in that tutorial.

So I came up with this awk program/script/arguments (not sure what you call it??):

BEGIN { i=1 }
{
    print "Line " i;
    print "$1 is " $1,"\n$2 is " $2, "\n$3 is " $3;
    FS=":";
    $0=$0;
    print "With the new FS - line " i;
    print "$1 is " $1,"\n$2 is " $2, "\n$3 is " $3;
    FS=" ";
    i++;
}

And the input file looks like this:

A1 B1:B2 C2
A1:A2 B2:B3 C3

What I am trying to do is to process each line/record first with the default FS (whitespace), and then re-process the same with a new FS (":"), then restore the default FS before going to the next record.

According to the tutorial, $0=$0 is supposed to get awk to re-evaluate the fields using the new field separator, and thus supposedly giving me an output that looks like this:

Line 1
$1 is A1 
$2 is B1:B2 
$3 is C2
With the new FS - line 1
$1 is A1 B1
$2 is B2 C2
$3 is
Line 2
$1 is A1:A2 
$2 is B2:B3 
$3 is C3
With the new FS - line 2
$1 is A1
$2 is A2 B2
$3 is B3 C3

But instead, I get:

Line 1
$1 is A1 
$2 is B1:B2 
$3 is C2
With the new FS - line 1
$1 is A1 
$2 is B1:B2 
$3 is C2
Line 2
$1 is A1:A2 
$2 is B2:B3 
$3 is C3
With the new FS - line 2
$1 is A1:A2 
$2 is B2:B3 
$3 is C3

I.e., the fields have not been re-evaluated after the FS change.

So if $0=$0 doesn't work (and nor do things like $1=$1; $2=$2), how do I get awk to re-evaluate the same line using a different FS?

Thank you.

Recommended answer

tl;dr:

FreeBSD/OS X awk doesn't apply changes to FS (the field separator) until after the current record has finished processing - this behavior is actually POSIX-mandated (see below).

Workaround: Do not change FS and use function split() instead:

{
    print "Line " ++i
    print "$1 is " $1 "\n$2 is " $2 "\n$3 is " $3
    split($0, flds, ":")   # split current line by ':' into array `flds`
    print "With the new FS - line " i
    print "field1 is " flds[1] "\nfield2 is " flds[2] "\nfield3 is " flds[3]
}


  • Note how the BEGIN block was eliminated by relying on uninitialized variables defaulting to 0 in numeric contexts.
  • The , instances were removed from the print statements, because each would insert a space (the default value of the output field separator, OFS), which is not needed in this case.
  • Given that the statements are newline-separated, ; is not needed to terminate them.
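
The workaround above prints exactly three pieces per line. As a minimal sketch (the array name parts and the output wording are my own, not from the question), split() also returns the number of pieces it produced, so a loop can handle lines with any number of ":"-separated parts without touching FS:

```sh
# Feed the two sample lines from the question to awk; FS is never changed.
out=$(printf 'A1 B1:B2 C2\nA1:A2 B2:B3 C3\n' | awk '
{
    # split() returns the piece count; parts is a hypothetical array name
    n = split($0, parts, ":")
    for (k = 1; k <= n; k++)
        print "line " NR ", piece " k ": " parts[k]
}')
printf '%s\n' "$out"
```

Because this relies only on split(), it should behave the same way regardless of how a given awk treats FS changes in the middle of a record.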
Read on for the fun multi-platform compatibility details.

The POSIX spec. for awk states (emphasis mine):

      
      Before the first reference to a field in the record is evaluated, the record shall be 
      split into fields, according to the rules in Regular Expressions, 
      **using the value of FS that was current at the time the record was read**.
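
To illustrate the part of that rule that is not in dispute, here is a small sketch (the sample input is made up for illustration): an FS assignment made while processing one record is in effect by the time the next record is read.

```sh
out=$(printf 'a:b c\nd:e f\n' | awk '
{
    print NR ": $1 = " $1   # split with the FS in effect when this record was read
    FS = ":"                # picked up when the next record is read
}')
printf '%s\n' "$out"
```

The first record is split on whitespace, so $1 is a:b; the second is read after FS became ":", so $1 is d.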
      

With respect to assigning a new value to $0 or a specific field, the same source states:

      
      The symbol $0 shall refer to the entire record; setting any other field causes 
      the re-evaluation of $0. Assigning to $0 shall reset the values of all other
      fields and the NF built-in variable.
      

In other words: since the re-assignment passage doesn't state otherwise, the only statement about the scope of a given FS value in the POSIX spec. mandates that it be constant for a given input record. There is definitely ambiguity, and it would certainly help if the spec. resolved it. That said, the conservative and thus safer interpretation is to assume that FS stays constant while a given record is being processed.

As such, it is FreeBSD/OS X awk that is the model citizen, whereas GNU awk and also mawk offer more flexibility by NOT playing by the rules: they apply FS changes even to the current record when $0 or any specific field is re-assigned.

Note, however, that GNU awk (as of v4.1.1) doesn't change that behavior even with the --posix option, whose express intent is to result in POSIX-compliant behavior. If I'm reading the POSIX spec. correctly (do tell me whether I am), this should be considered a bug.
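
If you want to find out which behavior your own awk has, a quick probe along these lines should do (a sketch only; I have not checked it against every awk version, and the variable name nf is my own):

```sh
# Prints 2 if this awk re-splits the current record with the new FS on $0=$0,
# and 1 if it keeps the FS that was in effect when the record was read.
nf=$(echo 'a:b' | awk '{ FS = ":"; $0 = $0; print NF }')
echo "NF after re-assigning \$0: $nf"
```

Going by the discussion above, GNU awk and mawk would be expected to report 2 and FreeBSD/OS X awk 1. Try the same probe with gawk --posix to see whether that option changes anything on your version.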
