How do you escape a hyphen as character range in a POSIX regex

Problem description

I have a csv file full of values such as this:

0.00145423,3.03795e-05

I wanted to check that all the lines were consistent, so I tried to grep for any unexpected characters like so:

grep '[^0-9,e\-.]' myfile

In my mind it reads like this: find a line containing any character ([]) that is not (^) a digit (0-9), a comma (,), the letter e, a hyphen (-, which I tried to escape as \-), or a period (.). However, hyphens still continue to match.

[EDIT] This does not happen in Python, only with bash/grep:

>>> import re
>>> re.search(r"[^0-9,e\-.]", "0.00145423,3.03795e-05")
>>>

Unsatisfying solution:
If I move the escaped hyphen to the end, it works:

grep '[^0-9,e.\-]' myfile

Putting the escaped hyphen right after the 0-9 range results in the error "grep: Invalid range end".

Can someone explain what's going on? Is this some bash argument parsing issue or something specific to grep?

bash 4.3.33, grep 2.21

Solution

The way to include a literal - in a character list is to put it in the first or last position of the bracket expression, exactly as shown in the answer to "Get final special character with a regular expression".

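From POSIX 9.3.5, RE Bracket Expression: "The <hyphen> character shall be treated as itself if it occurs first (after an initial '^', if any) or last in the list, or as an ending range point in a range expression."

The same section also says that the backslash loses its special meaning inside a bracket expression. So \- inside the brackets is not an escape at all: the backslash is an ordinary character that becomes the start point of a range, and something like \-. is read as the range from \ to . rather than as a literal hyphen followed by a period. Because the pattern is single-quoted, bash passes it to grep unchanged; this is bracket-expression behavior, not a bash argument parsing issue.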
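As a rough illustration of the hyphen rule, here is a minimal Python sketch. parse_bracket is a hypothetical helper written for this example, not part of grep or any library. It assumes single-byte characters, ignores [:class:], [=x=] and [.x.] forms, and returns ranges as (start, end) pairs instead of expanding them, because the characters a range covers depend on the locale. The backslash is treated as an ordinary character, as POSIX specifies for bracket expressions.

```python
from typing import List, Tuple, Union

# A list element is a single literal character or a (start, end) range pair.
Item = Union[str, Tuple[str, str]]


def parse_bracket(expr: str) -> Tuple[bool, List[Item]]:
    """Tokenize one POSIX bracket expression such as "[^0-9,e.-]".

    Hypothetical helper: assumes single-byte characters and no [:class:],
    [=x=] or [.x.] forms. Ranges are not expanded, since the characters a
    range covers depend on the locale's collation order.
    """
    if not expr.startswith("["):
        raise ValueError("bracket expression must start with '['")
    i = 1
    negated = expr[i:i + 1] == "^"  # a leading '^' makes a non-matching list
    if negated:
        i += 1
    first = i  # index of the first list element, after any '^'
    items: List[Item] = []
    while True:
        if i >= len(expr):
            raise ValueError("missing closing ']'")
        c = expr[i]
        # ']' ends the list unless it is the first element.
        if c == "]" and i != first:
            break
        # "x-y" is a range unless the '-' is the last element of the list.
        # The backslash is ordinary here, so "\-." is also parsed as a range.
        if expr[i + 1:i + 2] == "-" and expr[i + 2:i + 3] not in ("", "]"):
            items.append((c, expr[i + 2]))
            i += 3
            continue
        # Outside a range, '-' may only appear first or last in the list.
        if c == "-" and i != first and expr[i + 1:i + 2] not in ("", "]"):
            raise ValueError(f"'-' at index {i} is not first, last, or a range end")
        items.append(c)
        i += 1
    if i != len(expr) - 1:
        raise ValueError("unexpected text after the closing ']'")
    return negated, items


# The question's pattern: the backslash starts a range ending at '.'.
print(parse_bracket(r"[^0-9,e\-.]"))
# → (True, [('0', '9'), ',', 'e', ('\\', '.')])

# Hyphen moved to the end: it is a literal member of the list.
print(parse_bracket("[^0-9,e.-]"))
# → (True, [('0', '9'), ',', 'e', '.', '-'])
```

With the pattern from the question, the backslash shows up as the start of a range rather than as an escape; with the hyphen moved to the end, the hyphen appears as a plain list member.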

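Some tools might offer additional ways of doing it with some kind of escaping, but you're always safe putting it first or last. Note that - isn't the only character whose behavior depends on where it shows up in a bracket expression. Consider ] and ^ as well: ] is literal only when it comes first in the list (after an initial ^, if any), and ^ negates the list only when it comes first; anywhere else it stands for itself.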
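Relating this to the [EDIT] in the question: Python's re module is not a POSIX engine, and inside [...] it accepts \- as an escaped hyphen, which is why the Python test reported nothing. The small standard-library sketch below reuses the sample line from the question. It shows that difference, and it shows that the position rules for -, ] and ^ described above also hold in Python's re. That makes these placements the safer habit across tools.

```python
import re

line = "0.00145423,3.03795e-05"

# Python's re accepts "\-" as an escaped hyphen inside [...];
# a POSIX bracket expression treats the backslash as an ordinary character.
python_only = re.compile(r"[^0-9,e\-.]")
# A hyphen placed last is literal in both POSIX bracket expressions and Python's re.
portable = re.compile(r"[^0-9,e.-]")

print(python_only.search(line))           # None: every character is allowed
print(portable.search(line))              # None
print(portable.search("1.5,2x").group())  # x: the first unexpected character

# ']' first is literal; '^' is literal anywhere except the first position.
print(bool(re.fullmatch(r"[]a]+", "a]a")))  # True
print(bool(re.fullmatch(r"[a^]+", "^a^")))  # True
print(bool(re.fullmatch(r"[^^]", "^")))     # False: leading '^' negates, second '^' is literal
```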
