How do you match only valid Roman numerals with a regular expression?
Question
Thinking about my other problem, I decided I can't even create a regular expression that will match Roman numerals (let alone a context-free grammar that will generate them).
The problem is matching only valid Roman numerals. For example, 990 is NOT "XM", it's "CMXC".
My problem in making the regex for this is that, in order to allow or disallow certain characters, I need to look back at what has already been matched. Let's take thousands and hundreds, for example.
I can allow M{0,2}C?M (to allow for 900, 1000, 1900, 2000, 2900 and 3000). However, if the match is on CM, I can't allow the following characters to be C or D (because I'm already at 900).
How can I express this in a regex?
If it's simply not expressible in a regex, is it expressible in a context-free grammar?
Recommended answer
You can use the following regex for this:
^M{0,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})$
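As a quick illustration, here is a minimal Python sketch of how the pattern might be used as a validator. The names ROMAN and is_roman are hypothetical and not part of the original answer; fullmatch is used so that a trailing newline is not silently accepted.

```python
import re

# The pattern from the answer, compiled once. ROMAN is a hypothetical name.
ROMAN = re.compile(r"^M{0,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})$")

def is_roman(text: str) -> bool:
    """Return True if the whole string matches the pattern.

    Note that the empty string also matches, because every part
    of the pattern is allowed to match nothing.
    """
    return ROMAN.fullmatch(text) is not None

print(is_roman("CMXC"))  # True: the valid form of 990
print(is_roman("XM"))    # False: not a valid way to write 990
```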
Breaking it down, M{0,4} specifies the thousands section and basically restricts it to between 0 and 4000. It's relatively simple:
0: <empty> matched by M{0}
1000: M matched by M{1}
2000: MM matched by M{2}
3000: MMM matched by M{3}
4000: MMMM matched by M{4}
You could, of course, use something like M* to allow any number (including zero) of thousands if you want to allow bigger numbers.
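The difference between the two thousands quantifiers can be sketched as follows. The names BOUNDED and UNBOUNDED are hypothetical; the only change between the two patterns is M{0,4} versus M*.

```python
import re

# Same pattern twice, differing only in the thousands quantifier.
BOUNDED = re.compile(r"^M{0,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})$")
UNBOUNDED = re.compile(r"^M*(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})$")

five_thousand = "M" * 5  # MMMMM

print(BOUNDED.fullmatch(five_thousand) is not None)    # False: more than four Ms
print(UNBOUNDED.fullmatch(five_thousand) is not None)  # True: any number of Ms
```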
Next is (CM|CD|D?C{0,3}), which is slightly more complex. This is for the hundreds section and covers all the possibilities:
0: <empty> matched by D?C{0} (with D not there)
100: C matched by D?C{1} (with D not there)
200: CC matched by D?C{2} (with D not there)
300: CCC matched by D?C{3} (with D not there)
400: CD matched by CD
500: D matched by D?C{0} (with D there)
600: DC matched by D?C{1} (with D there)
700: DCC matched by D?C{2} (with D there)
800: DCCC matched by D?C{3} (with D there)
900: CM matched by CM
Thirdly, (XC|XL|L?X{0,3}) follows the same rules as the previous section but for the tens place:
0: <empty> matched by L?X{0} (with L not there)
10: X matched by L?X{1} (with L not there)
20: XX matched by L?X{2} (with L not there)
30: XXX matched by L?X{3} (with L not there)
40: XL matched by XL
50: L matched by L?X{0} (with L there)
60: LX matched by L?X{1} (with L there)
70: LXX matched by L?X{2} (with L there)
80: LXXX matched by L?X{3} (with L there)
90: XC matched by XC
And, finally, (IX|IV|V?I{0,3}) is the units section, handling 0 through 9. It is also similar to the previous two sections (Roman numerals, despite their seeming weirdness, follow some logical rules once you figure out what they are):
0: <empty> matched by V?I{0} (with V not there)
1: I matched by V?I{1} (with V not there)
2: II matched by V?I{2} (with V not there)
3: III matched by V?I{3} (with V not there)
4: IV matched by IV
5: V matched by V?I{0} (with V there)
6: VI matched by V?I{1} (with V there)
7: VII matched by V?I{2} (with V there)
8: VIII matched by V?I{3} (with V there)
9: IX matched by IX
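One way to gain confidence in the tables above is to generate every numeral from 0 to 4999 and check it against the pattern. The sketch below assumes a conventional greedy converter (to_roman and SYMBOLS are hypothetical names, not from the original answer) that writes 4000 as MMMM, in line with the M{0,4} convention used here.

```python
import re

ROMAN = re.compile(r"^M{0,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})$")

# Value/symbol pairs in descending order, including the subtractive forms.
SYMBOLS = [
    (1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
    (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
    (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I"),
]

def to_roman(n: int) -> str:
    """Greedy conversion; 0 becomes the empty string."""
    parts = []
    for value, symbol in SYMBOLS:
        count, n = divmod(n, value)
        parts.append(symbol * count)
    return "".join(parts)

numerals = [to_roman(n) for n in range(5000)]

# Every generated numeral matches, and each number has a distinct spelling.
assert all(ROMAN.fullmatch(s) for s in numerals)
assert len(set(numerals)) == 5000

# A few malformed strings that the pattern rejects.
for bad in ["XM", "IIII", "VV", "IC", "CMC", "CMD"]:
    assert ROMAN.fullmatch(bad) is None

print(to_roman(990))  # CMXC
```

This only shows that all generated numerals are accepted and that a handful of malformed ones are rejected; it is not an exhaustive proof that nothing else matches.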
Just keep in mind that this regex will also match an empty string, since every part of it is optional. If you don't want this (and your regex engine supports look-ahead), you can add a positive look-ahead that requires at least one character after the start anchor:
^(?=.)M{0,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})$
(The other alternative is to just check beforehand that the length is not zero.)
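Both ways of rejecting the empty string can be sketched in Python. This assumes a look-ahead of the form (?=.) placed right after the start anchor, which requires at least one character; the names BODY, PLAIN, NON_EMPTY and is_roman_checked are hypothetical.

```python
import re

# Shared body of the pattern, without anchors.
BODY = r"M{0,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})"

PLAIN = re.compile("^" + BODY + "$")           # also matches ""
NON_EMPTY = re.compile("^(?=.)" + BODY + "$")  # requires at least one character

def is_roman_checked(text: str) -> bool:
    """Alternative without look-ahead: test the length first."""
    return len(text) > 0 and PLAIN.fullmatch(text) is not None

print(PLAIN.fullmatch("") is not None)         # True
print(NON_EMPTY.fullmatch("") is not None)     # False
print(NON_EMPTY.fullmatch("XIV") is not None)  # True
print(is_roman_checked(""))                    # False
```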