How do you escape a hyphen as character range in a POSIX regex

Question

I have a csv file full of values such as this:

0.00145423,3.03795e-05

I wanted to check that all the lines were consistent so I tried to grep for any unexpected characters like so...

grep '[^0-9,e\-\.]' myfile

In my mind it goes like this: find a line with any character ([]) that is not (^) a digit (0-9), a comma (,), the letter e, a hyphen (\-, which I tried to escape with \), or a period (\.). However, hyphens continue to match.

[EDIT] This does not happen in Python, only with bash/grep:

>>> import re
>>> re.search(r"[^0-9,e\-\.]", "0.00145423,3.03795e-05")
>>> 

Unsatisfying solution: if I move the escaped hyphen to the end, it works:

grep '[^0-9,e\.\-]' myfile

Putting the escaped hyphen next to the 0-9 range results in grep: Invalid range end.

Can someone explain what's going on? Is this some bash argument parsing issue or something specific to grep?

bash 4.3.33, grep 2.21

Recommended answer

The way to include a literal - in a character list is to put it in the first or last position of the bracket expression, exactly as shown in the answer at: Get final special character with a regular expression.

From POSIX 9.3.5 RE Bracket Expression: "The <hyphen> character shall be treated as itself if it occurs first (after an initial '^', if any) or last in the list, or as an ending range point in a range expression."
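
As a rough sketch of that rule applied to the check from the question, the shell snippet below puts the hyphen last, and then first, in the bracket expression. The function names find_unexpected and find_unexpected_hyphen_first and the sample lines are made up for illustration, and LC_ALL=C is an assumption added here to keep range handling predictable across locales.

```shell
# Hypothetical helper: print lines containing any character other than
# a digit, comma, 'e', period, or hyphen.
# The hyphen is last in the bracket expression, so it is taken literally;
# the period needs no backslash inside brackets.
find_unexpected() {
    LC_ALL=C grep '[^0-9,e.-]' "$@"
}

# Same check with the hyphen first, directly after the negating '^'.
find_unexpected_hyphen_first() {
    LC_ALL=C grep '[^-0-9,e.]' "$@"
}

# Sample input: only the second line contains a stray character ('x').
printf '%s\n' '0.00145423,3.03795e-05' '0.5,1.2x-03' | find_unexpected
```

With no arguments the helpers read standard input, as above; passing a file name (for example find_unexpected myfile) should check that file instead.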

Some tools might offer additional ways of doing it with some kind of escaping, but you're always safe to just put it first or last. Note that - isn't the only character whose behavior depends on where it appears in a bracket expression; consider ] and ^ as well.
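
The positional rules for ] and ^ can be illustrated the same way. The sketch below also shows why the pattern from the question behaved as it did: POSIX states that a backslash loses its special meaning inside a bracket expression, so it escapes nothing there. The sample strings are invented, LC_ALL=C is an assumption to keep ranges predictable, and -o (print only the matched text) is a GNU/BSD grep option rather than a POSIX one.

```shell
# ']' is literal when it comes first (after an optional '^').
printf '%s\n' 'a]b' | LC_ALL=C grep -o '[]]'

# '^' is literal anywhere except the first position.
printf '%s\n' 'a^b' | LC_ALL=C grep -o '[b^]'

# A backslash inside brackets is an ordinary character in POSIX grep, so
# '\-\' below is a range from '\' to '\' and the hyphen is NOT in the set.
# The negated expression therefore still matches the hyphen.
printf '%s\n' '3.03795e-05' | LC_ALL=C grep -o '[^0-9,e\-\.]'
```

By the same reading, placing \- directly after 0-9, as in [^0-9\-,e\.], asks for a range from \ to the comma, which runs backwards in the C locale; that would account for the Invalid range end error. It also means the "escaped hyphen at the end" variant only works because the hyphen happens to be last, and it quietly adds the backslash to the set of accepted characters.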
