Regex matching in a Bash if statement

Question

What am I doing wrong here?

I'm trying to match any string that contains spaces, lowercase letters, uppercase letters, or numbers. Special characters would be nice too, but I think that requires escaping certain characters.

TEST="THIS is a TEST title with some numbers 12345 and special char *&^%$#"

if [[ "$TEST" =~ [^a-zA-Z0-9\ ] ]]; then BLAH; fi

This obviously only tests for uppercase, lowercase, numbers, and spaces. It doesn't work, though.

*UPDATE*

I guess I should have been more specific. Here is the actual line of code.

if [[ "$TITLE" =~ [^a-zA-Z0-9\ ] ]]; then RETURN="FAIL" && ERROR="ERROR: Title can only contain upper and lowercase letters, numbers, and spaces!"; fi

*UPDATE*

./anm.sh: line 265: syntax error in conditional expression
./anm.sh: line 265: syntax error near `&*#]'
./anm.sh: line 265: `  if [[ ! "$TITLE" =~ [a-zA-Z0-9 $%^\&*#] ]]; then RETURN="FAIL" && ERROR="ERROR: Title can only contain upper and lowercase letters, numbers, and spaces!"; return; fi'


Answer

There are a couple of important things to know about bash's [[ ]] construction. The first:


Word splitting and pathname expansion are not performed on the words between the [[ and ]]; tilde expansion, parameter and variable expansion, arithmetic expansion, command substitution, process substitution, and quote removal are performed.

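The second thing:

An additional binary operator, =~, is available, ... the string to the right of the operator is considered an extended regular expression and matched accordingly ... Any part of the pattern may be quoted to force it to be matched as a string.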

Consequently, $v on either side of the =~ will be expanded to the value of that variable, but the result will not be word-split or pathname-expanded. In other words, it's perfectly safe to leave variable expansions unquoted on the left-hand side, but you need to know that variable expansions will also happen on the right-hand side.
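As a small sketch of that point (the values of v and w below are made up for illustration), the left-hand side can hold spaces and a glob character without quoting, and a variable on the right-hand side is expanded before the regex is compiled:

```bash
#!/usr/bin/env bash
# Hypothetical sample values, chosen to contain spaces and a glob character.
v='hello   world *'
w='wor'

# Left side: unquoted $v is expanded but not word-split or globbed.
if [[ $v =~ world ]]; then lhs_result=match; else lhs_result=nomatch; fi

# Right side: ${w} is expanded first, so the regex that gets compiled is "world".
if [[ $v =~ ${w}ld ]]; then rhs_result=match; else rhs_result=nomatch; fi

echo "lhs: $lhs_result, rhs: $rhs_result"   # prints "lhs: match, rhs: match"
```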

So if you write [[ $x =~ [$0-9a-zA-Z] ]], the $0 inside the regex on the right will be expanded before the regex is interpreted, which will probably cause the regex to fail to compile (unless the expansion of $0 ends with a digit or a punctuation symbol whose ASCII value is less than that of a digit). If you quote the right-hand side, like so: [[ $x =~ "[$0-9a-zA-Z]" ]], then the right-hand side will be treated as an ordinary string, not a regex (and $0 will still be expanded). What you really want in this case is [[ $x =~ [\$0-9a-zA-Z] ]].
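The difference between escaping and quoting can be seen in a short sketch. The function names are hypothetical, and the quoted example uses "[a-z]" rather than the $0 pattern so that the result does not depend on the script name:

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeeds if $1 contains a '$', a digit, or a letter.
has_dollar_or_alnum() {
  # \$ stops bash from expanding $0, so the regex engine sees [$0-9a-zA-Z].
  [[ $1 =~ [\$0-9a-zA-Z] ]]
}

# Hypothetical helper: a quoted right-hand side is matched as a literal
# string, so this looks for the five characters [a-z] themselves.
is_literal_class() {
  [[ $1 =~ "[a-z]" ]]
}

has_dollar_or_alnum '$'    && echo 'escaped: "$" matches'
has_dollar_or_alnum '---'  || echo 'escaped: "---" does not match'
is_literal_class 'abc'     || echo 'quoted: "abc" does not match'
is_literal_class 'x[a-z]y' && echo 'quoted: "x[a-z]y" matches'
```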

Similarly, the expression between the [[ and ]] is split into words before the regex is interpreted, so spaces in the regex need to be escaped or quoted. If you wanted to match letters, digits, or spaces, you could use: [[ $x =~ [0-9a-zA-Z\ ] ]]. Other characters similarly need to be escaped, such as #, which would start a comment if not quoted. Of course, you can put the pattern into a variable:

pat="[0-9a-zA-Z ]"
if [[ $x =~ $pat ]]; then ...
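A minimal sketch comparing the two spellings, assuming you want the whole string to consist of letters, digits, and spaces. The ^ and +$ anchors and the function names are additions for illustration, not part of the patterns above:

```bash
#!/usr/bin/env bash
# Hypothetical helper: pattern written inline.
matches_inline() {
  # The space is backslash-escaped so that [[ ]] sees a single word.
  [[ $1 =~ ^[0-9a-zA-Z\ ]+$ ]]
}

# Hypothetical helper: same pattern held in a variable.
matches_var() {
  # Inside single quotes nothing needs escaping for the shell.
  local pat='^[0-9a-zA-Z ]+$'
  [[ $1 =~ $pat ]]
}

for s in 'Title 42' 'Title #42'; do
  matches_inline "$s" && i=yes || i=no
  matches_var "$s" && v=yes || v=no
  echo "$s -> inline: $i, variable: $v"
done
```

Both forms should agree on every input; the variable form simply moves the shell-level escaping out of the [[ ]] expression.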

For regexes that contain lots of characters which would need to be escaped or quoted to pass through bash's lexer, many people prefer this style. But beware: in this case, you cannot quote the variable expansion:

# This doesn't work:
if [[ $x =~ "$pat" ]]; then ...
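To see why the quoted form fails, here is a small sketch (the pattern and the classify helper are hypothetical). Unquoted, $pat is compiled as a regex; quoted, it is searched for as a literal string:

```bash
#!/usr/bin/env bash
pat='^[0-9]+$'   # hypothetical pattern: one or more digits

classify() {
  # Prints how $1 fares against $pat unquoted (regex) and quoted (literal string).
  local x=$1 as_regex=no as_literal=no
  [[ $x =~ $pat ]] && as_regex=yes
  [[ $x =~ "$pat" ]] && as_literal=yes
  echo "regex=$as_regex literal=$as_literal"
}

classify '12345'              # prints "regex=yes literal=no"
classify 'see ^[0-9]+$ here'  # prints "regex=no literal=yes"
```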

Finally, I think what you are trying to do is verify that the variable contains only valid characters. The easiest way to do this check is to make sure that it does not contain an invalid character. In other words, an expression like this:

valid='0-9a-zA-Z $%&#' # add almost whatever else you want to allow to the list
if [[ ! $x =~ [^$valid] ]]; then ...

! negates the test, turning it into a "does not match" operator, and a [^...] regex character class means "any character other than ...".
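Wrapped in a function, the check might look like the sketch below. The name is_valid_title and the sample titles are assumptions made for illustration; only the valid list comes from the answer:

```bash
#!/usr/bin/env bash
valid='0-9a-zA-Z $%&#'   # the allowed set from the answer above

# Hypothetical helper: succeeds when $1 contains no character outside $valid.
is_valid_title() {
  [[ ! $1 =~ [^$valid] ]]
}

for title in 'My Title 42' 'Price: $5' 'Rock & Roll #1'; do
  if is_valid_title "$title"; then echo "OK:   $title"; else echo "FAIL: $title"; fi
done
```

Note that an empty string also passes this test, since it contains no invalid character; add a separate check such as [[ -n $1 ]] if empty titles should be rejected.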

The combination of parameter expansion and regex operators can make bash regular expression syntax "almost readable", but there are still some gotchas. (Aren't there always?) One is that you could not put ] into $valid, even if $valid were quoted, except at the very beginning. (That's a POSIX regex rule: if you want to include ] in a character class, it needs to go at the beginning. - can go at the beginning or the end, so if you need both ] and -, you need to start with ] and end with -, leading to the regex "I know what I'm doing" emoticon: [][-].)
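A short sketch of that placement rule, with hypothetical function names and a made-up allowed set that puts ] first and - last:

```bash
#!/usr/bin/env bash
# ']' first and '-' last, as the POSIX rule requires; the rest of the set is hypothetical.
valid='][0-9a-zA-Z -'

# Hypothetical helper: succeeds when $1 contains no character outside $valid.
is_valid() {
  [[ ! $1 =~ [^$valid] ]]
}

# Hypothetical helper: succeeds when $1 contains ']', '[' or '-'.
has_bracket_or_dash() {
  local re='[][-]'
  [[ $1 =~ $re ]]
}

for s in 'a]b' 'a[b' 'a-b' 'abc'; do
  if has_bracket_or_dash "$s"; then echo "$s: match"; else echo "$s: no match"; fi
done
is_valid 'list[0] - ok' && echo 'list[0] - ok: valid'
is_valid 'a{b}' || echo 'a{b}: invalid'
```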
