Why doesn't the [01-12] range work as expected?


Question

I'm trying to use the range pattern [01-12] in a regex to match a two-digit month (mm), but this doesn't work as expected.

Recommended answer

You seem to have misunderstood how character class definitions work in regex.

To match any of the strings 01, 02, 03, 04, 05, 06, 07, 08, 09, 10, 11, or 12, something like this works:

0[1-9]|1[0-2]
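
As a quick check of the pattern above, here is a minimal Python sketch. The names MONTH and is_month are made up for illustration, and using fullmatch to require that the whole string is a month is an assumption about how the pattern would be applied.

```python
import re

# Alternation: "0" followed by 1-9, or "1" followed by 0-2.
MONTH = re.compile(r"0[1-9]|1[0-2]")

def is_month(text):
    """Return True if text is exactly a two-digit month, 01 through 12."""
    # fullmatch requires the pattern to cover the entire string.
    return MONTH.fullmatch(text) is not None

# Try every two-digit string from 00 to 99 and keep the ones that pass.
valid = [f"{n:02d}" for n in range(100) if is_month(f"{n:02d}")]
print(valid)
```

Only the twelve strings 01 through 12 should survive; 00, 13, and single digits such as 1 are rejected.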

References

  • regular-expressions.info/Character Classes
    • Numeric Ranges (has many examples of matching strings interpreted as numeric ranges)
A character class, by itself, attempts to match one and exactly one character from the input string. [01-12] actually defines [012], a character class that matches one character from the input against any of the 3 characters 0, 1, or 2.
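
The equivalence of [01-12] and [012] can be checked directly. The sketch below uses Python's re module as one example flavor; the variable names are illustrative only.

```python
import re
import string

odd = re.compile(r"[01-12]")   # the class from the question
plain = re.compile(r"[012]")   # what it actually means

# Collect every printable ASCII character that each class accepts.
accepted_odd = [c for c in string.printable if odd.fullmatch(c)]
accepted_plain = [c for c in string.printable if plain.fullmatch(c)]
print(accepted_odd, accepted_plain)

# A class consumes exactly one character, so a two-character
# string such as "12" cannot be matched in full by the class alone.
two_digit = odd.fullmatch("12")
print(two_digit)
```

Both lists should contain just the characters 0, 1, and 2, and the two-digit test should produce no match.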

The - range definition goes from 1 to 1, which includes just 1. On the other hand, something like [1-9] includes 1, 2, 3, 4, 5, 6, 7, 8, 9.

Beginners often make the mistake of defining things like [this|that]. This doesn't "work". This character class is equivalent to [this|a], i.e. it matches one character from the input against any of the 6 characters t, h, i, s, | or a. More than likely, (this|that) is what is intended.
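
To see the difference between the bracketed and the parenthesized form, here is a small Python sketch. It is only an illustration in one regex flavor, and the names char_class and group are hypothetical.

```python
import re

char_class = re.compile(r"[this|that]")  # one character out of a set
group = re.compile(r"(this|that)")       # one of two whole words

# The distinct characters that appear inside the brackets.
distinct = sorted(set("this|that"))
print(distinct)

# Each of those characters matches the class on its own, including "|".
class_hits = [c for c in distinct if char_class.fullmatch(c)]

# The class cannot match a whole word; the group can.
class_word = char_class.fullmatch("this")
group_word = group.fullmatch("this")
print(class_hits, class_word, group_word)
```

Inside brackets the | is an ordinary character, not an alternation operator, which is why it shows up in the set of accepted characters.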

So it's obvious now that a pattern like between [24-48] hours doesn't "work". The character class in this case is equivalent to [248].
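
If the intent really is "a number from 24 to 48", the range has to be spelled out digit by digit, in the same style as the month pattern. The following Python sketch shows one possible way to write it; the pattern and the name HOURS are an illustration, not something taken from the original answer.

```python
import re

# [24-48] is just the set {2, 4, 8}: the "range" 4-4 contains only 4.
as_class = [c for c in "0123456789" if re.fullmatch(r"[24-48]", c)]
print(as_class)

# Spelling out the numeric range 24..48 by tens digit:
#   2 followed by 4-9, 3 followed by any digit, 4 followed by 0-8.
HOURS = re.compile(r"2[4-9]|3[0-9]|4[0-8]")
matched = [n for n in range(100) if HOURS.fullmatch(str(n))]
print(matched[0], matched[-1], len(matched))
```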

That is, - in a character class definition doesn't define a numeric range in the pattern. Regex engines don't really "understand" numbers in the pattern, with the exception of finite repetition syntax (e.g. a{3,5} matches between 3 and 5 a's).

A range definition instead uses the ASCII/Unicode encoding of the characters to define the range. The character 0 is encoded in ASCII as decimal 48; 9 is 57. Thus, the character class [0-9] includes all characters whose values are between decimal 48 and 57 in the encoding. Rather sensibly, by design these are the characters 0, 1, ..., 9.
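
The code-point view of [0-9] can be verified with a short Python sketch. It checks only the 128 ASCII characters, which is an assumption made to keep the example small.

```python
import re

# Code points of the range endpoints.
low, high = ord("0"), ord("9")
print(low, high)

# Characters whose code points lie between the endpoints, inclusive.
by_code_point = [chr(code) for code in range(low, high + 1)]

# Characters among the 128 ASCII code points that [0-9] accepts.
by_regex = [chr(code) for code in range(128) if re.fullmatch(r"[0-9]", chr(code))]
print(by_code_point == by_regex)
```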

Let's take a look at another common character class definition, [a-zA-Z].

In ASCII:

• A = 65, Z = 90
• a = 97, z = 122

This means that:

• [a-zA-Z] and [A-Za-z] are equivalent
• In most flavors, [a-Z] is likely to be an illegal character range
  • because a (97) is "greater than" Z (90)
• [A-z] is legal, but also includes these six characters:
  • [ (91), \ (92), ] (93), ^ (94), _ (95), ` (96)
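
Both points about letter ranges can be observed in Python's re module, taken here as one example flavor; other engines may report a reversed range differently. The six characters with codes 91 through 96 sit between Z and a, so a range such as [A-z] picks them up. The variable names below are illustrative.

```python
import re

# A reversed range such as a-Z is rejected when the pattern is compiled.
try:
    re.compile(r"[a-Z]")
    reversed_range_ok = True
except re.error:
    reversed_range_ok = False
print(reversed_range_ok)

# [A-z] compiles, but it also accepts the non-letters between Z and a.
extras = [
    chr(code)
    for code in range(128)
    if re.fullmatch(r"[A-z]", chr(code)) and not chr(code).isalpha()
]
print(extras, [ord(c) for c in extras])
```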
