How do you match only valid Roman numerals with a regular expression?


Question

Thinking about my other problem, I decided I can't even create a regular expression that will match Roman numerals (let alone a context-free grammar that will generate them).

The problem is matching only valid Roman numerals. For example, 990 is NOT "XM", it's "CMXC".

My problem in making the regex for this is that, in order to allow or disallow certain characters, I need to look back. Let's take thousands and hundreds, for example.

I can allow M{0,2}C?M (to allow for 900, 1000, 1900, 2000, 2900 and 3000). However, if the match is on CM, I can't allow the following characters to be C or D (because I'm already at 900).

How can I express this in a regex?
If it's simply not expressible in a regex, is it expressible in a context-free grammar?

Recommended answer

You can use the following regex for this:

^M{0,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})$
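As a quick check of that pattern, here is a minimal Python sketch. The helper name `looks_roman` is an assumption made for illustration and is not part of the original answer. Note that Python's `$` also matches just before a trailing newline, so strip the input first if that matters.

```python
import re

# The pattern from the answer, compiled once for reuse.
ROMAN_RE = re.compile(r"^M{0,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})$")

def looks_roman(text: str) -> bool:
    """Return True if the whole string matches the pattern (hypothetical helper)."""
    return ROMAN_RE.match(text) is not None

# 990 written correctly, 990 written incorrectly, and a few other probes.
for candidate in ["CMXC", "XM", "MCMXCIV", "IIII", "IC"]:
    print(candidate, looks_roman(candidate))
```

Only "CMXC" and "MCMXCIV" are accepted here. The other three strings fail because no combination of the four groups consumes every character.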

Breaking it down, M{0,4} specifies the thousands section and basically restricts it to between 0 and 4000. It's a relatively simple one:

   0: <empty>  matched by M{0}
1000: M        matched by M{1}
2000: MM       matched by M{2}
3000: MMM      matched by M{3}
4000: MMMM     matched by M{4}

You could, of course, use something like M* to allow any number (including zero) of thousands, if you want to allow bigger numbers.
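To illustrate the difference, the following sketch compares the bounded and unbounded thousands sections in Python. The names `BOUNDED_RE` and `UNBOUNDED_RE` are illustrative only.

```python
import re

# Same pattern twice; only the thousands section differs.
BOUNDED_RE = re.compile(r"^M{0,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})$")
UNBOUNDED_RE = re.compile(r"^M*(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})$")

six_thousand = "M" * 6

print(bool(BOUNDED_RE.match(six_thousand)))    # False: more than four M
print(bool(UNBOUNDED_RE.match(six_thousand)))  # True: M* has no upper limit
```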

Next is (CM|CD|D?C{0,3}), which is slightly more complex. This is for the hundreds section and covers all the possibilities:

  0: <empty>  matched by D?C{0} (with D not there)
100: C        matched by D?C{1} (with D not there)
200: CC       matched by D?C{2} (with D not there)
300: CCC      matched by D?C{3} (with D not there)
400: CD       matched by CD
500: D        matched by D?C{0} (with D there)
600: DC       matched by D?C{1} (with D there)
700: DCC      matched by D?C{2} (with D there)
800: DCCC     matched by D?C{3} (with D there)
900: CM       matched by CM

Thirdly, (XC|XL|L?X{0,3}) follows the same rules as the previous section but for the tens place:

 0: <empty>  matched by L?X{0} (with L not there)
10: X        matched by L?X{1} (with L not there)
20: XX       matched by L?X{2} (with L not there)
30: XXX      matched by L?X{3} (with L not there)
40: XL       matched by XL
50: L        matched by L?X{0} (with L there)
60: LX       matched by L?X{1} (with L there)
70: LXX      matched by L?X{2} (with L there)
80: LXXX     matched by L?X{3} (with L there)
90: XC       matched by XC

And, finally, (IX|IV|V?I{0,3}) is the units section, handling 0 through 9. It works like the previous two sections (Roman numerals, despite their seeming weirdness, follow some logical rules once you figure out what they are):

0: <empty>  matched by V?I{0} (with V not there)
1: I        matched by V?I{1} (with V not there)
2: II       matched by V?I{2} (with V not there)
3: III      matched by V?I{3} (with V not there)
4: IV       matched by IV
5: V        matched by V?I{0} (with V there)
6: VI       matched by V?I{1} (with V there)
7: VII      matched by V?I{2} (with V there)
8: VIII     matched by V?I{3} (with V there)
9: IX       matched by IX
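One way to sanity-check the four tables together is to generate every numeral from 1 to 4999 and confirm the pattern accepts each one. The sketch below does that in Python. The greedy converter `to_roman` is a hypothetical helper written only to produce test data, and is not something the original answer provides.

```python
import re

ROMAN_RE = re.compile(r"^M{0,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})$")

# Value/symbol pairs in descending order, including the subtractive forms.
SYMBOLS = [
    (1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
    (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
    (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I"),
]

def to_roman(number: int) -> str:
    """Greedy conversion; hypothetical helper used only to generate test data."""
    parts = []
    for value, symbol in SYMBOLS:
        count, number = divmod(number, value)
        parts.append(symbol * count)
    return "".join(parts)

# Every numeral from 1 to 4999 should be accepted by the pattern.
assert all(ROMAN_RE.match(to_roman(n)) for n in range(1, 5000))

# Malformed strings, including CM followed by C or D as raised in the question.
for bad in ["XM", "IIII", "VV", "IC", "CMC", "CMD", "XCX"]:
    assert ROMAN_RE.match(bad) is None
```

Because each decimal digit gets its own group, a string such as "CMC" fails: once the hundreds group has consumed CM, none of the later groups can consume another C.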


Just keep in mind that that regex will also match an empty string. Wrapping the anchors in look-arounds, as in (?<=^) and (?=$), does not change this, because those assertions are equivalent to plain ^ and $. If you don't want the empty match (and your regex engine supports look-ahead), you can add a positive look-ahead after the start anchor that requires at least one character:

^(?=.)M{0,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})$

(The other alternative is to just check beforehand that the length is not zero.)
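Both ways of rejecting the empty string can be sketched in Python as follows. This is an illustration under stated assumptions, not the answer's own code: the look-ahead `(?=.)` simply demands at least one character after the start anchor, and the names `BASE`, `NON_EMPTY_RE`, `PLAIN_RE` and `is_roman` are invented here.

```python
import re

BASE = r"M{0,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})"

# Option 1: a look-ahead after the start anchor demands at least one character.
NON_EMPTY_RE = re.compile(r"^(?=.)" + BASE + r"$")

# Option 2: keep the plain pattern and check the length before matching.
PLAIN_RE = re.compile(r"^" + BASE + r"$")

def is_roman(text: str) -> bool:
    """Hypothetical helper: length check first, then the plain pattern."""
    return len(text) > 0 and PLAIN_RE.match(text) is not None

print(PLAIN_RE.match("") is not None)      # the plain pattern accepts ""
print(NON_EMPTY_RE.match("") is not None)  # the look-ahead version does not
print(is_roman(""), is_roman("CMXC"))
```

The length check works in any engine, while the look-ahead keeps the whole rule inside a single pattern.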
