How do you escape a hyphen in a character range in a POSIX regex
Question
I have a csv file full of values such as this:
0.00145423,3.03795e-05
I wanted to check that all the lines were consistent so I tried to grep for any unexpected characters like so...
grep '[^0-9,e\-\.]' myfile
In my mind it goes like this: find a line containing any character ([]) that is not (^) a digit (0-9), a comma (,), the letter e, a hyphen (\-, which I tried to escape with \), or a period (\.). However, hyphens still match.
[EDIT] This does not happen in Python, only with bash/grep:
>>> re.search("[^0-9,e\-\.]", "0.00145423,3.03795e-05")
>>>
Unsatisfying solution:
If I move the escaped hyphen to the end, it works:
grep '[^0-9,e\.\-]' myfile
Putting the escaped hyphen next to the 0-9 range results in grep: Invalid range end.
Can someone explain what's going on? Is this some bash argument parsing issue or something specific to grep?
bash 4.3.33, grep 2.21
Recommended answer
The way to include a literal - in a character list is to put it in the first or last position of the bracket expression, exactly as shown in the answer at: Get final special character with a regular expression.
From POSIX 9.3.5 RE Bracket Expression: The <hyphen> character shall be treated as itself if it occurs first (after an initial '^', if any) or last in the list, or as an ending range point in a range expression.
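A minimal sketch of the first-or-last rule, using two made-up sample lines (the second contains a stray x) and hypothetical variable names. As far as I understand POSIX, a backslash has no special meaning inside a bracket expression, so the \-\ in the original pattern reads as a range from backslash to backslash rather than an escaped hyphen. That would also explain why the "working" pattern quietly accepts backslashes as valid characters.

```shell
# Two illustrative lines: the first is well-formed, the second has a stray 'x'.
sample=$(printf '%s\n' '0.00145423,3.03795e-05' '0.5x,1e-03')

# Hyphen last in the list: a literal. '.' needs no escaping inside brackets.
last=$(printf '%s\n' "$sample" | grep -n '[^0-9,e.-]')

# Hyphen first (right after the negating ^): also a literal.
first=$(printf '%s\n' "$sample" | grep -n '[^-0-9,e.]')

echo "$last"    # 2:0.5x,1e-03
echo "$first"   # 2:0.5x,1e-03

# Side effect of the backslashes in the question's workaround: with GNU grep
# they are literal members of the list, so a line containing a backslash is
# not reported. grep exits non-zero when nothing matches, hence '|| true'.
printf '%s\n' '1\2' | grep -c '[^0-9,e\.\-]' || true   # 0
printf '%s\n' '1\2' | grep -c '[^0-9,e.-]'   || true   # 1
```

Both spellings flag only the line with the stray character, so either position is a reasonable choice; putting the hyphen last keeps the digit range at the front, where it is easiest to read.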
Some tools might have additional ways of doing it with some kind of escaping, but you're always safe to just put it first or last. Note that - isn't the only character whose behavior depends on where it appears in a bracket expression. Consider ] and ^ as well.
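The same positional idea can be sketched for ] and ^. The input lines and variable names here are invented for illustration: ] is taken literally when it comes first in the list, and ^ is taken literally when it does not come first.

```shell
# ']' first in the list is a literal, not the closing bracket.
close=$(printf '%s\n' 'a]b' 'abc' | grep -n '[]]')

# '^' only negates when it is first; elsewhere it is a literal.
caret=$(printf '%s\n' 'a^b' 'abc' | grep -n '[x^]')

# All three together: ']' first, '^' in the middle, '-' last.
count=$(printf '%s\n' 'a-b' 'a^b' 'a]b' 'abc' | grep -c '[]^-]')

echo "$close"   # 1:a]b
echo "$caret"   # 1:a^b
echo "$count"   # 3
```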