In awk, why are "" and "\n\n" treated the same for the RS parameter?
Question
Here are the contents of a file:
Person Name
123 High Street
(222) 466-1234

Another person
487 High Street
(523) 643-8754
And these two commands give the same result:
$ awk 'BEGIN{FS="\n"; RS="\n\n"} {print $1, $3}' file_contents
$ awk 'BEGIN{FS="\n"; RS=""} {print $1, $3}' file_contents
The result given in both cases is:
Person Name (222) 466-1234
Another person (523) 643-8754
RS="\n\n" actually makes sense, but why is RS="" also treated the same way?
Answer
They aren't treated the same.
- RS="" invokes paragraph mode in all awks, so the input is split into records separated by contiguous sequences of empty lines, and a newline is added to the FS if the existing FS is a single character (note: the POSIX standard is incorrect in this area, as it implies \n would get added to any FS, but that's not the case; see https://lists.gnu.org/archive/html/bug-gawk/2019-04/msg00029.html).
- RS="\n\n" works in GNU awk to set the record separator to a single blank line and does not affect FS. In all other awks the 2nd \n will be ignored (more than 1 char in an RS is undefined behavior per POSIX, so they COULD do anything, but that's by far the most common implementation).
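The paragraph-mode behavior described above can be checked with a small, self-contained sketch. The sample data and the variable names `tmp` and `result` are assumptions made for illustration; the script only relies on RS="" and FS="\n", which POSIX defines, so it should not depend on GNU awk.

```shell
#!/bin/sh
# Hypothetical sample file: two address blocks separated by 3 blank lines.
tmp=$(mktemp)
printf '%s\n' 'Person Name' '123 High Street' '(222) 466-1234' '' '' '' \
    'Another person' '487 High Street' '(523) 643-8754' > "$tmp"

# RS="" selects paragraph mode: any run of blank lines ends a record.
# FS="\n" makes every line of the paragraph its own field.
result=$(awk 'BEGIN{RS=""; FS="\n"} {print NR, NF, $1}' "$tmp")
rm -f "$tmp"

printf '%s\n' "$result"
```

Even with three blank lines between the blocks, this should report two records of three fields each, because the whole run of empty lines acts as one separator.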
Look what happens when you have 3 blank lines between your 2 blocks of text and use an FS other than \n (e.g. ,):
$ cat file
Person Name
123 High Street
(222) 466-1234



Another person
487 High Street
(523) 643-8754

.
$ gawk 'BEGIN{FS=","; RS=""} {print NR, NF, "<" $0 ">\n"}' file
1 3 <Person Name
123 High Street
(222) 466-1234>

2 3 <Another person
487 High Street
(523) 643-8754>

.
$ gawk --posix 'BEGIN{FS=","; RS=""} {print NR, NF, "<" $0 ">\n"}' file
1 3 <Person Name
123 High Street
(222) 466-1234>

2 3 <Another person
487 High Street
(523) 643-8754>

.
$ gawk 'BEGIN{FS=","; RS="\n\n"} {print NR, NF, "<" $0 ">\n"}' file
1 1 <Person Name
123 High Street
(222) 466-1234>

2 0 <>

3 1 <Another person
487 High Street
(523) 643-8754>

.
$ gawk --posix 'BEGIN{FS=","; RS="\n\n"} {print NR, NF, "<" $0 ">\n"}' file
1 1 <Person Name>

2 1 <123 High Street>

3 1 <(222) 466-1234>

4 0 <>

5 0 <>

6 0 <>

7 1 <Another person>

8 1 <487 High Street>

9 1 <(523) 643-8754>

10 0 <>
Note the different values for NR and NF and the different $0 contents being printed.