Is there a field that stores the exact field separator FS used when FS is a regular expression, equivalent to RT for RS?
Question
In the GNU Awk manual, section 4.1.2 "Record Splitting with gawk", we can read:
When RS is a single character, RT contains the same single character. However, when RS is a regular expression, RT contains the actual input text that matched the regular expression.
This variable RT is very useful in some cases.
Similarly, we can set a regular expression as the field separator. For example, here we allow it to be either ";" or "|":
$ gawk -F';' '{print NF}' <<< "hello;how|are you"
2 # there are 2 fields, since ";" appears once
$ gawk -F'[;|]' '{print NF}' <<< "hello;how|are you"
3 # there are 3 fields, since ";" appears once and "|" also once
However, if we want to pack the data again, we have no way of knowing which separator appeared between two fields. So if, in the previous example, I loop through the fields and print them together again using FS, the whole expression is printed in every case:
$ gawk -F'[;|]' '{for (i=1;i<=NF;i++) printf ("%s%s", $i, FS)}' <<< "hello;how|are you"
hello[;|]how[;|]are you[;|] # a literal "[;|]" shows in the place of FS
Is there a way to "repack" the fields using the specific field separator that was used to split each of them, similar to what RT allows for records?
(The examples in the question are deliberately simple; they are only meant to illustrate the point.)
Recommended answer
Is there a way to "repack" the fields using the specific field separator used to split each one of them
Use GNU awk's split(), which accepts an optional 4th parameter: an array that receives the delimiter text matched by the supplied regex:
s="hello;how|are you"
gawk 'split($0, flds, /[;|]/, seps) {for (i=1; i in seps; i++) printf "%s%s", flds[i], seps[i]; print flds[i]}' <<< "$s"
hello;how|are you
A more readable version:
s="hello;how|are you"
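gawk 'split($0, flds, /[;|]/, seps) {
    # print each field followed by the separator that came after it
    for (i=1; i in seps; i++)
        printf "%s%s", flds[i], seps[i]
    # the last field has no separator after it
    print flds[i]
}' <<< "$s"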
Note the 4th parameter of split, seps, which stores an array of the text matched by the regular expression given as the 3rd parameter, i.e. /[;|]/.
Of course, it is not as short and simple as using RS, ORS and RT, which can be written as:
gawk -v RS='[;|]' '{ORS = RT} 1' <<< "$s"