awk: fatal: Invalid regular expression when setting multiple field separators


Question

I was trying to solve the question "Grep regex to select only 10 character" using awk. That question involves a string such as XXXXXX[YYYYY--ZZZZZ, and the OP wants to print the text between the unique [ and -- strings within it.

If it were just one -, I would say use [-[] as the field separator (FS). This sets FS to be either - or [:

$ echo "XXXXXXX[YYYYY-ZZZZ" | awk -F[-[] '{print $2}'
YYYYY

The tricky point is that [ also has a special meaning as the opening of a character class, so for it to be correctly interpreted as one of the possible field separators it cannot be written in the first position. That is why I write [-[]. So that takes care of matching either - or [.

However, in this case there is not one hyphen but two: I want to say either -- or [. I cannot write [--[], because inside a character class the hyphen also defines a range.

What I can do is use -F"one pattern|another pattern", like this:

$ echo "XXXXXXXaaYYYYYbbZZZZ" | awk -F"aa|bb" '{print $2}'
YYYYY

So if I try this with -- and [, I cannot get a proper result:

$ echo "XXXXXXX[YYYYY--ZZZZ" | awk -F"--|[" '{print $2}'
awk: fatal: Invalid regular expression: /--|[/

In fact, it does not even work with [ as just one of the alternatives:

$ echo "XXXXXXX[YYYYYbbZZZZ" | awk -F"bb|[" '{print $2}'
awk: fatal: Invalid regular expression: /bb|[/

$ echo "XXXXXXX[YYYYYbbZZZZ" | awk -F"bb|\[" '{print $2}'
awk: warning: escape sequence `\[' treated as plain `['
awk: fatal: Invalid regular expression: /bb|[/

$ echo "XXXXXXX[YYYYYbbZZZZ" | awk -F"(bb|\[)" '{print $2}'
awk: warning: escape sequence `\[' treated as plain `['
awk: fatal: Unmatched [ or [^: /(bb|[)/

As you can see, I tried both escaping the [ and enclosing the pattern in parentheses, and nothing worked.

So: what can I do to set the field separator to either -- or [? Is it possible at all?

Recommended answer

IMHO this is best explained if we start by looking at a regexp used by the split() function, since that shows explicitly what happens when a string is split into fields using a literal vs. a dynamic regexp. We can then relate that to field separators.

This uses a literal regexp (delimited by /s):

$ echo "XXXXXXX[YYYYY--ZZZZ" | awk '{split($0,f,/\[|--/); print f[2]}'
YYYYY

A literal regexp requires the [ to be escaped so that it is taken literally, since [ is a regexp metacharacter.

These use a dynamic regexp (one stored as a string):

$ echo "XXXXXXX[YYYYY--ZZZZ" | awk '{split($0,f,"\\[|--"); print f[2]}'
YYYYY

$ echo "XXXXXXX[YYYYY--ZZZZ" | awk 'BEGIN{re="\\[|--"} {split($0,f,re); print f[2]}'
YYYYY

$ echo "XXXXXXX[YYYYY--ZZZZ" | awk -v re='\\[|--' '{split($0,f,re); print f[2]}'
YYYYY

These require the [ to be escaped twice, since awk has to convert the string holding the regexp (the variable named re in the last 2 examples) to a regexp (which uses up one backslash) before it is used as the separator in the split() call (which uses up the second backslash).
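As a rough illustration of the first of those two layers, the sketch below prints what awk actually stores after it has processed the escapes in a string constant. The shell variable name stored is only for illustration and is not part of the original answer.

```shell
# awk turns the string constant "\\[|--" into the characters \[|--
# (one backslash is used up); that is the text later compiled as a regexp.
stored=$(awk 'BEGIN { re = "\\[|--"; print re }')
printf '%s\n' "$stored"
```

The single backslash that remains is what then escapes the [ when the string is used as a regexp.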

$ echo "XXXXXXX[YYYYY--ZZZZ" | awk -v re="\\\[|--" '{split($0,f,re); print f[2]}'
YYYYY

Putting the string in double quotes exposes the variable contents to the shell for its evaluation, so the [ has to be escaped 3 times: the shell parses the string first to try to expand shell variables etc. (which uses up one backslash), and then awk has to convert the string holding the regexp to a regexp (which uses up a second backslash) before it is used as the separator in the split() call (which uses up the third backslash).
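To see the shell layer in isolation, a small sketch along these lines can help. It only shows what the shell hands over to awk for each quoting style; the variable names single and double are hypothetical, chosen for this example.

```shell
# Single quotes: the shell passes the text through untouched.
single='\\[|--'
# Double quotes: the shell itself consumes one backslash of the leading pair.
double="\\\[|--"

# printf with %s prints the arguments verbatim, without interpreting backslashes.
printf '%s\n' "$single" "$double"
```

Both lines print the same text, which is why the double-quoted form needs one more backslash than the single-quoted one to reach awk in the same state.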

A field separator is just a regexp stored in a variable named FS (like re above) with some extra semantics, so all of the above applies to it too, hence:

$ echo "XXXXXXX[YYYYY--ZZZZ" | awk -F '\\[|--' '{print $2}'
YYYYY

$ echo "XXXXXXX[YYYYY--ZZZZ" | awk -F "\\\[|--" '{print $2}'
YYYYY
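The same reasoning should carry over to assigning FS inside the script, by analogy with the BEGIN{re=...} example earlier. This is a minimal sketch under that assumption; the shell variable name field is hypothetical.

```shell
# FS is assigned from an awk string constant, so the [ needs two backslashes:
# one is used up parsing the string, the other escapes [ in the regexp.
field=$(printf '%s\n' 'XXXXXXX[YYYYY--ZZZZ' |
    awk 'BEGIN { FS = "\\[|--" } { print $2 }')
printf '%s\n' "$field"
```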

Note that, instead of escaping it, we could have used a bracket expression to have the [ treated literally:

$ echo "XXXXXXX[YYYYY--ZZZZ" | awk '{split($0,f,/[[]|--/); print f[2]}'
YYYYY

and then we don't have to worry about escaping the escapes as we add layers of parsing:

$ echo "XXXXXXX[YYYYY--ZZZZ" | awk -F "[[]|--" '{print $2}'
YYYYY

$ echo "XXXXXXX[YYYYY--ZZZZ" | awk -F '[[]|--' '{print $2}'
YYYYY
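Since the bracket-expression form is insensitive to the quoting style, it is a convenient one to wrap for reuse. The function name between_markers below is hypothetical and only sketches one way to package the command above.

```shell
# between_markers: read lines on stdin and print the second field,
# splitting on either a literal "[" (written as [[]) or "--".
between_markers() {
    awk -F '[[]|--' '{ print $2 }'
}

printf '%s\n' 'XXXXXXX[YYYYY--ZZZZ' | between_markers
```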
