In awk, why are "" and "\n\n" treated the same for the RS parameter?


Question

Here are the contents of the file:

Person Name
123 High Street
(222) 466-1234

Another person
487 High Street
(523) 643-8754

And these two things give the same result:

$ awk 'BEGIN{FS="\n"; RS="\n\n"} {print $1, $3}' file_contents

$ awk 'BEGIN{FS="\n"; RS=""} {print $1, $3}' file_contents

The result given in both cases is:

Person Name (222) 466-1234
Another person (523) 643-8754

RS="\n\n" actually makes sense, but why is RS="" also treated the same way?

Recommended answer

They aren't treated the same.

  • RS="" invokes paragraph mode in all awks, so the input is split into records separated by contiguous sequences of empty lines, and a newline is added as a field separator if the existing FS is a single character (note: the POSIX standard is incorrect in this area, as it implies \n would be added to any FS, but that's not the case; see https://lists.gnu.org/archive/html/bug-gawk/2019-04/msg00029.html).
  • RS="\n\n" works in GNU awk to set the record separator to a single blank line and does not affect FS. In all other awks the second \n will be ignored (more than one character in an RS is undefined behavior per POSIX, so they could do anything, but that's by far the most common implementation).

Look what happens when you have 3 blank lines between your 2 blocks of text and use a FS other than \n (e.g. ,):

$ cat file
Person Name
123 High Street
(222) 466-1234



Another person
487 High Street
(523) 643-8754

.

$ gawk 'BEGIN{FS=","; RS=""} {print NR, NF, "<" $0 ">\n"}' file
1 3 <Person Name
123 High Street
(222) 466-1234>

2 3 <Another person
487 High Street
(523) 643-8754>

.

$ gawk --posix 'BEGIN{FS=","; RS=""} {print NR, NF, "<" $0 ">\n"}' file
1 3 <Person Name
123 High Street
(222) 466-1234>

2 3 <Another person
487 High Street
(523) 643-8754>

.

$ gawk 'BEGIN{FS=","; RS="\n\n"} {print NR, NF, "<" $0 ">\n"}' file
1 1 <Person Name
123 High Street
(222) 466-1234>

2 0 <>

3 1 <Another person
487 High Street
(523) 643-8754>

.

$ gawk --posix 'BEGIN{FS=","; RS="\n\n"} {print NR, NF, "<" $0 ">\n"}' file
1 1 <Person Name>

2 1 <123 High Street>

3 1 <(222) 466-1234>

4 0 <>

5 0 <>

6 0 <>

7 1 <Another person>

8 1 <487 High Street>

9 1 <(523) 643-8754>

10 0 <>

Note the different values for NR and NF and different $0 contents being printed.
