Why doesn't [01-12] range work as expected?
Question
I'm trying to use the range pattern [01-12] in a regex to match a two-digit month (mm), but it doesn't work as expected.
Recommended answer
You seem to have misunderstood how character class definitions work in regex.
To match any of the strings 01, 02, 03, 04, 05, 06, 07, 08, 09, 10, 11, or 12, something like this works:
0[1-9]|1[0-2]
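As a quick check of the alternation above, here is a minimal sketch. It assumes Python's re module (the question does not name a language), and the helper name is_month is made up for illustration. re.fullmatch is used so that the whole string has to be a two-digit month.

```python
import re

# Pattern from the answer: 01 to 09, or 10 to 12.
MONTH = re.compile(r"0[1-9]|1[0-2]")


def is_month(text: str) -> bool:
    """Return True if text is exactly a two-digit month, 01 through 12."""
    return MONTH.fullmatch(text) is not None


for candidate in ["01", "09", "10", "12", "00", "13", "1"]:
    print(candidate, is_month(candidate))
```

The first four candidates are accepted; 00, 13 and the single digit 1 are rejected.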
References
- regular-expressions.info/Character Classes
- Numeric Ranges (has many examples of matching strings interpreted as numeric ranges)
For the letters in a class such as [a-zA-Z], the ASCII codes are:
A = 65, Z = 90
a = 97, z = 122
This means that:
- [a-zA-Z] and [A-Za-z] are equivalent
- In most flavors, [a-Z] is likely to be an illegal character range, because a (97) is greater than Z (90)
- A range that runs from an uppercase letter to a lowercase one, such as [A-z], also includes the six characters between Z and a: [ (91), \ (92), ] (93), ^ (94), _ (95), ` (96)
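The letter ranges above can be checked directly. This is a small sketch that assumes Python's re module; other flavors may report the reversed range differently. All variable names are illustrative.

```python
import re

# The six characters whose codes lie between Z (90) and a (97).
extras = [chr(code) for code in range(ord("Z") + 1, ord("a"))]
print(extras)

# [A-z] spans codes 65..122, so it accepts every one of them.
all_extras_match = all(re.fullmatch(r"[A-z]", ch) for ch in extras)

# [a-zA-Z] and [A-Za-z] accept exactly the same ASCII characters.
ascii_chars = [chr(code) for code in range(128)]
same_class = all(
    bool(re.fullmatch(r"[a-zA-Z]", ch)) == bool(re.fullmatch(r"[A-Za-z]", ch))
    for ch in ascii_chars
)

# [a-Z] runs backwards (97 down to 90); Python's re refuses to compile it.
try:
    re.compile(r"[a-Z]")
    reversed_range_ok = True
except re.error:
    reversed_range_ok = False
```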
A character class, by itself, attempts to match one and exactly one character from the input string. [01-12] actually defines [012], a character class that matches one character from the input against any of the 3 characters 0, 1, or 2.
In [01-12], the - range definition goes from 1 to 1, which includes just 1. On the other hand, something like [1-9] includes 1, 2, 3, 4, 5, 6, 7, 8, 9.
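To see that [01-12] really behaves like [012], the following sketch tests each digit against the classes one character at a time. It assumes Python's re module, and the variable names are illustrative only.

```python
import re

digits = "0123456789"

# Digits accepted by each class, tested one character at a time.
accepted_by_0112 = [d for d in digits if re.fullmatch(r"[01-12]", d)]
accepted_by_012 = [d for d in digits if re.fullmatch(r"[012]", d)]
accepted_by_1to9 = [d for d in digits if re.fullmatch(r"[1-9]", d)]

print(accepted_by_0112)
print(accepted_by_012)
print(accepted_by_1to9)

# A class consumes exactly one character, so "12" as a whole cannot match.
two_chars = re.fullmatch(r"[01-12]", "12")
print(two_chars)
```

The first two lists are identical, and the two-character string 12 is not matched as a whole.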
Beginners often make the mistake of defining things like [this|that]. This doesn't "work". This character class is equivalent to [this|a], i.e. it matches one character from the input against any of the 6 characters t, h, i, s, | or a. More than likely, (this|that) is what is intended.
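The difference between the character class and the group can be sketched as follows, again assuming Python's re module, with variable names chosen for illustration.

```python
import re

char_class = re.compile(r"[this|that]")
alternation = re.compile(r"(this|that)")

# Distinct characters accepted by the class: t, h, i, s, | and a.
class_chars = sorted({ch for ch in "this|that" if char_class.fullmatch(ch)})
print(class_chars)

# The class matches one character only, so the word "this" is not matched as a whole.
class_matches_word = char_class.fullmatch("this") is not None

# The group with alternation matches either complete word.
group_matches_this = alternation.fullmatch("this") is not None
group_matches_that = alternation.fullmatch("that") is not None
group_matches_t = alternation.fullmatch("t") is not None
```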
So it's obvious now that a pattern like between [24-48] hours doesn't "work". The character class in this case is equivalent to [248].
That is, - in a character class definition doesn't define a numeric range in the pattern. Regex engines don't really "understand" numbers in the pattern, with the exception of finite repetition syntax (e.g. a{3,5} matches between 3 and 5 a).
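A short sketch of both points, assuming Python's re module. The alternation 2[4-9]|3[0-9]|4[0-8] is not from the original answer; it is one possible way to spell the numeric range 24 to 48, built the same way as the month pattern.

```python
import re

# [24-48] is the class [248]: exactly one of the digits 2, 4 or 8.
naive = re.compile(r"between [24-48] hours")
naive_single = naive.fullmatch("between 4 hours") is not None
naive_double = naive.fullmatch("between 36 hours") is not None
print(naive_single, naive_double)

# Assumed alternative: spell out the numeric range 24..48 digit by digit.
numeric = re.compile(r"between (2[4-9]|3[0-9]|4[0-8]) hours")
accepted = [
    n for n in range(20, 52)
    if numeric.fullmatch(f"between {n} hours")
]

# Bounded repetition is the one place where the pattern uses real counts.
repeat = re.compile(r"a{3,5}")
lengths = [n for n in range(1, 8) if repeat.fullmatch("a" * n)]
print(lengths)
```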
Range definitions instead use the ASCII/Unicode encoding of the characters to define ranges. The character 0 is encoded in ASCII as decimal 48; 9 is 57. Thus, the character class [0-9] includes all characters whose values are between decimal 48 and 57 in the encoding. Rather sensibly, by design these are the characters 0, 1, ..., 9.
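The encoding-based definition can be checked by comparing the class with the code range it stands for. This is a sketch assuming Python, where ord and chr convert between characters and their code points; the variable names are illustrative.

```python
import re

print(ord("0"), ord("9"))

# Characters whose codes lie between the codes of 0 and 9, inclusive.
by_code = [chr(code) for code in range(ord("0"), ord("9") + 1)]

# ASCII characters accepted by the class [0-9].
by_class = [
    chr(code) for code in range(128)
    if re.fullmatch(r"[0-9]", chr(code))
]

print(by_code == by_class)
```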
The same applies to another common character class definition, [a-zA-Z]: each of its two ranges, a-z and A-Z, is defined by the ASCII codes of its endpoints.