How do you escape a hyphen as character range in a POSIX regex
Problem description
I have a csv file full of values such as this:
0.00145423,3.03795e-05
I wanted to check that all the lines were consistent so I tried to grep for any unexpected characters like so...
grep '[^0-9,e\-.]' myfile
In my mind it goes like this: find a line with any character ([]) that is not (^) a number (0-9), a comma (,), the letter e (e), a hyphen (\-, which I attempted to escape with \), or a period (.). However, hyphens still continue to match.
[EDIT] This does not happen in Python, only with bash/grep:
>>> re.search("[^0-9,e\-.]", "0.00145423,3.03795e-05")
>>>
Unsatisfying solution:
If I move the escaped hyphen to the end, it works:
grep '[^0-9,e.\-]' myfile
Putting the escaped hyphen next to the 0-9 range results in grep: Invalid range end.
Can someone explain what's going on? Is this some bash argument parsing issue or something specific to grep?
bash 4.3.33, grep 2.21
The way to include a literal - in a character list is to put it in the first or last position of the bracket expression, exactly as shown in the answer at: Get final special character with a regular expression.
From POSIX 9.3.5 RE Bracket Expression: The <hyphen> character shall be treated as itself if it occurs first (after an initial '^', if any) or last in the list, or as an ending range point in a range expression.
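As a rough illustration of that rule applied to the question's data, here is a minimal sketch. It assumes a POSIX-style grep; the second sample line (with a stray x) is made up for the demonstration, and LC_ALL=C is set only so that range handling does not depend on the locale.

```shell
# Hypothetical sample data: one well-formed line and one containing a stray 'x'.
sample=$(printf '%s\n' '0.00145423,3.03795e-05' '0.5x,1e-03')

# Hyphen last in the bracket expression: it is taken literally.
hyphen_last=$(printf '%s\n' "$sample" | LC_ALL=C grep -c '[^0-9,e.-]')

# Hyphen first (right after the negating ^): also taken literally.
hyphen_first=$(printf '%s\n' "$sample" | LC_ALL=C grep -c '[^-0-9,e.]')

# Each count is the number of lines containing an unexpected character.
echo "$hyphen_last $hyphen_first"   # prints: 1 1
```

Both patterns flag only the line with the stray x. No backslash is needed in either one.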
Some tools might have additional ways of doing it with some kind of escaping, but you're always safe to just put it first or last. Note that - isn't the only character that behaves differently depending on where it shows up in a bracket expression. Consider ] and ^ as well.
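The same positional idea can be sketched for ] and ^. This is only an illustration, again assuming a POSIX-style grep; the sample strings are invented for the demonstration.

```shell
# ']' is literal when it is the first character in the list.
close_first=$(printf '%s\n' 'a]b' 'abc' | LC_ALL=C grep -c '[]]')

# '^' is literal when it is NOT the first character in the list.
caret_later=$(printf '%s\n' 'a^b' 'abc' | LC_ALL=C grep -c '[x^]')

# All three together: ']' first, '^' in the middle, '-' last.
all_three=$(printf '%s\n' 'a]b' 'a^b' 'a-b' 'abc' | LC_ALL=C grep -c '[]^-]')

echo "$close_first $caret_later $all_three"   # prints: 1 1 3
```

In the last pattern each character sits where the bracket-expression rules treat it as itself, so nothing needs to be escaped.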