Regex matching in a Bash if statement

Question

What am I doing wrong here?

Trying to match any string that contains spaces, lowercase, uppercase, or numbers. Special characters would be nice too, but I think that requires escaping certain characters.

TEST="THIS is a TEST title with some numbers 12345 and special char *&^%$#"

if [[ "$TEST" =~ [^a-zA-Z0-9 ] ]]; then BLAH; fi

This obviously only tests for upper, lower, numbers, and spaces. Doesn't work though.

* UPDATE *

I guess I should have been more specific. Here is the actual real line of code.

if [[ "$TITLE" =~ [^a-zA-Z0-9 ] ]]; then RETURN="FAIL" && ERROR="ERROR: Title can only contain upper and lowercase letters, numbers, and spaces!"; fi

* UPDATE *

./anm.sh: line 265: syntax error in conditional expression
./anm.sh: line 265: syntax error near `&*#]'
./anm.sh: line 265: `  if [[ ! "$TITLE" =~ [a-zA-Z0-9 $%^&*#] ]]; then RETURN="FAIL" && ERROR="ERROR: Title can only contain upper and lowercase letters, numbers, and spaces!"; return; fi'

Recommended answer

There are a couple of important things to know about bash's [[ ]] construction. The first:

Word splitting and pathname expansion are not performed on the words between the [[ and ]]; tilde expansion, parameter and variable expansion, arithmetic expansion, command substitution, process substitution, and quote removal are performed.

The second thing:

An additional binary operator, ‘=~’, is available,... the string to the right of the operator is considered an extended regular expression and matched accordingly... Any part of the pattern may be quoted to force it to be matched as a string.

Consequently, $v on either side of the =~ will be expanded to the value of that variable, but the result will not be word-split or pathname-expanded. In other words, it's perfectly safe to leave variable expansions unquoted on the left-hand side, but you need to know that variable expansions will happen on the right-hand side.
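
As a quick illustration of that point, here is a minimal sketch. The variable name v and its sample value are made up for this example, and it assumes bash 3.2 or later. The value contains spaces and a glob character, yet the unquoted expansion on the left is still treated as one string.

```bash
#!/usr/bin/env bash
# Hypothetical sample value containing spaces and a glob character.
v='two words *'

# Inside [[ ]] there is no word splitting or pathname expansion,
# so the unquoted $v is still seen as a single string.
if [[ $v =~ ^two ]]; then
  left_unquoted=match
else
  left_unquoted=nomatch
fi

echo "left_unquoted=$left_unquoted"
```

With [ ] or test instead of [[ ]], the same unquoted expansion would be split into several words, which is why the quoting habit from [ ] does not carry over here.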

So if you write: [[ $x =~ [$0-9a-zA-Z] ]], the $0 inside the regex on the right will be expanded before the regex is interpreted, which will probably cause the regex to fail to compile (unless the expansion of $0 ends with a digit or a punctuation symbol whose ASCII value is less than that of a digit). If you quote the right-hand side like so: [[ $x =~ "[$0-9a-zA-Z]" ]], then the right-hand side will be treated as an ordinary string, not a regex (and $0 will still be expanded). What you really want in this case is [[ $x =~ [\$0-9a-zA-Z] ]], with the $ escaped by a backslash.
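
The difference between escaping and quoting can be seen in a small sketch like the one below. The sample values of x and y are invented for this example, it assumes bash 3.2 or later, and it uses single quotes for the quoted case so that $0 is not expanded and the result does not depend on the script name.

```bash
#!/usr/bin/env bash
# Sample inputs (hypothetical values chosen for this sketch).
x='$'
y='see [$0-9a-zA-Z] here'

# Escaped: \$ keeps bash from expanding $0, so the bracket expression
# contains a literal dollar sign and matches x.
if [[ $x =~ [\$0-9a-zA-Z] ]]; then escaped=match; else escaped=nomatch; fi

# Quoted: the right-hand side is compared as plain text, not as a regex,
# so a lone "$" does not match it ...
if [[ $x =~ '[$0-9a-zA-Z]' ]]; then quoted_x=match; else quoted_x=nomatch; fi

# ... but a string containing the literal text "[$0-9a-zA-Z]" does.
if [[ $y =~ '[$0-9a-zA-Z]' ]]; then quoted_y=match; else quoted_y=nomatch; fi

echo "escaped=$escaped quoted_x=$quoted_x quoted_y=$quoted_y"
```

The escaped form should report a match for x, while the quoted form should only match y, which contains the bracket text itself.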

Similarly, the expression between the [[ and ]] is split into words before the regex is interpreted. So spaces in the regex need to be escaped or quoted. If you wanted to match letters, digits or spaces you could use: [[ $x =~ [0-9a-zA-Z\ ] ]]. Other characters similarly need to be escaped, like #, which would start a comment if not quoted. Of course, you can put the pattern into a variable:

pat="[0-9a-zA-Z ]"
if [[ $x =~ $pat ]]; then ...

For regexes which contain lots of characters which would need to be escaped or quoted to pass through bash's lexer, many people prefer this style. But beware: In this case, you cannot quote the variable expansion:

# This doesn't work:
if [[ $x =~ "$pat" ]]; then ...
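
To see the three variants side by side, here is a minimal sketch, assuming bash 3.2 or later. The value of x is made up for illustration; pat is the same pattern as above.

```bash
#!/usr/bin/env bash
pat='[0-9a-zA-Z ]'
x='a b'   # hypothetical input

# Unquoted expansion: the value of pat is used as a regex.
if [[ $x =~ $pat ]]; then unquoted=match; else unquoted=nomatch; fi

# Quoted expansion: the value of pat is compared as literal text,
# so x would have to contain the characters "[0-9a-zA-Z ]" to match.
if [[ $x =~ "$pat" ]]; then quoted=match; else quoted=nomatch; fi

# Inline pattern: the space has to be escaped to get past the parser.
if [[ ' ' =~ [0-9a-zA-Z\ ] ]]; then inline=match; else inline=nomatch; fi

echo "unquoted=$unquoted quoted=$quoted inline=$inline"
```

The quoted case is syntactically valid and does not raise an error; it simply stops being a regex match, which is why it is easy to miss.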

Finally, I think what you are trying to do is to verify that the variable only contains valid characters. The easiest way to do this check is to make sure that it does not contain an invalid character. In other words, an expression like this:

valid='0-9a-zA-Z $%&#' # add almost whatever else you want to allow to the list
if [[ ! $x =~ [^$valid] ]]; then ...

! negates the test, turning it into a "does not match" operator, and a [^...] regex character class means "any character other than ...".
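
Wrapped in a function, the check could look something like this. The names is_valid_title and check are hypothetical helpers invented for this sketch, and the sample titles are made up; only the valid list and the [[ ! ... =~ [^$valid] ]] test come from the answer above.

```bash
#!/usr/bin/env bash
valid='0-9a-zA-Z $%&#'

# Hypothetical helper: succeeds when $1 contains no character outside $valid.
is_valid_title() {
  [[ ! $1 =~ [^$valid] ]]
}

# Hypothetical helper that prints the verdict for one title.
check() {
  if is_valid_title "$1"; then
    echo "OK:   $1"
  else
    echo "FAIL: $1"
  fi
}

check 'Title 123 $%&#'   # only allowed characters
check 'bad*title'        # "*" is not in the allowed list
```

Note that an empty string also passes, since it contains no invalid character; add a separate [[ -n $1 ]] test if empty titles should be rejected.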

The combination of parameter expansion and regex operators can make bash regular expression syntax "almost readable", but there are still some gotchas. (Aren't there always?) One is that you cannot put ] into $valid, even if $valid is quoted, except at the very beginning. (That's a POSIX regex rule: if you want to include ] in a character class, it needs to go at the beginning. - can go at the beginning or the end, so if you need both ] and -, you need to start with ] and end with -, leading to the regex "I know what I'm doing" emoticon: [][-])
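
Here is a small sketch of that bracket expression in use. The function name has_bracket_or_dash and the sample strings are invented for illustration.

```bash
#!/usr/bin/env bash
# Bracket expression containing "]", "[" and "-":
# "]" goes first, "-" goes last, "[" sits in between.
pat='[][-]'

# Hypothetical helper: succeeds when $1 contains "]", "[" or "-".
has_bracket_or_dash() {
  [[ $1 =~ $pat ]]
}

for s in 'a]b' 'a[b' 'a-b' 'abc'; do
  if has_bracket_or_dash "$s"; then
    echo "$s: match"
  else
    echo "$s: no match"
  fi
done
```

The same ordering applies to the negated form: for example, [^]0-9a-z-] keeps ] right after the ^ and - at the end.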
