Is there a variable that stores the exact field separator matched when FS is a regular expression, equivalent to RT for RS?

Problem Description

In the GNU Awk manual, section 4.1.2 "Record Splitting with gawk", we can read:

RS 是单个字符时, RT 包含相同的单个字符.但是,当 RS 是正则表达式时, RT 包含与正则表达式匹配的实际输入文本.

When RS is a single character, RT contains the same single character. However, when RS is a regular expression, RT contains the actual input text that matched the regular expression.

This RT variable is very useful in some cases.

Similarly, we can set a regular expression as the field separator. For example, here we allow it to be either ";" or "|":

$ gawk -F';' '{print NF}' <<< "hello;how|are you"
2  # there are 2 fields, since ";" appears once
$ gawk -F'[;|]' '{print NF}' <<< "hello;how|are you"
3  # there are 3 fields, since ";" appears once and "|" also once

However, if we want to pack the data again, we have no way of knowing which separator appeared between two fields. So if, in the previous example, I loop through the fields and print them back together using FS, the whole regular expression is printed in every position:

$ gawk -F'[;|]' '{for (i=1;i<=NF;i++) printf ("%s%s", $i, FS)}' <<< "hello;how|are you"
hello[;|]how[;|]are you[;|]  # a literal "[;|]" shows in the place of FS

Is there a way to "repack" the fields using the specific field separator used to split each one of them, similarly to what RT would allow to do?

(The examples given in the question are rather simple; they are just meant to illustrate the point.)

Recommended Answer

Is there a way to "repack" the fields using the specific field separator used to split each one of them

Use GNU awk's split(), which accepts an optional 4th parameter: an array that receives the delimiters matched by the supplied regex:

s="hello;how|are you"
awk 'split($0, flds, /[;|]/, seps) {for (i=1; i in seps; i++) printf "%s%s", flds[i], seps[i]; print flds[i]}' <<< "$s"

hello;how|are you

A more readable version:

s="hello;how|are you"
awk 'split($0, flds, /[;|]/, seps) {
   for (i=1; i in seps; i++)
      printf "%s%s", flds[i], seps[i]
   print flds[i]
}' <<< "$s"

Note the 4th parameter of split(), seps: it stores an array of the text matched by the regular expression given as the 3rd parameter, i.e. /[;|]/.

Of course, this is not as short and simple as the RS/ORS/RT approach, which can be written as:

awk -v RS='[;|]' '{ORS = RT} 1' <<< "$s"
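
Both the 4th argument of split() and RT are gawk extensions. If only a POSIX awk is available (for example mawk or BusyBox awk), one possible workaround, not taken from the answer above, is to walk the record with match() and read each separator through RSTART and RLENGTH. In this sketch, `repack_upper` is a hypothetical helper that upper-cases every field while keeping whichever separator originally followed it, which shows the "modify, then repack" use case from the question.

```shell
# Portable sketch: uses only POSIX awk features (match, RSTART, RLENGTH, substr).
# repack_upper is a hypothetical helper: it upper-cases every field but keeps
# whichever separator (";" or "|") originally followed it.
repack_upper() {
  printf '%s\n' "$1" | awk '{
    rest = $0; out = ""
    # The regex must not match the empty string, or this loop never ends.
    while (match(rest, /[;|]/)) {
      field = substr(rest, 1, RSTART - 1)     # text before the separator
      sep   = substr(rest, RSTART, RLENGTH)   # the separator actually matched
      out   = out toupper(field) sep
      rest  = substr(rest, RSTART + RLENGTH)  # continue after the separator
    }
    print out toupper(rest)                   # the last field has no separator after it
  }'
}

repack_upper 'hello;how|are you'
# HELLO;HOW|ARE YOU
```

The loop is longer than the gawk version, but it relies only on features every POSIX awk provides.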
