Why doesn't this regex work with Eastern Arabic numerals?

Question

@thg435 wrote this answer to a JavaScript question:

> a = "foo 1234567890 bbb 123456"
"foo 1234567890 bbb 123456"
> a.replace(/\d(?=\d\d(\d{3})*\b)/g, "[$&]")
"foo 1[2]34[5]67[8]90 bbb [1]23[4]56"
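For reference, here is the same console session as a standalone script, runnable in Node.js or a browser console. The comments are our reading of the pattern, not thg435's wording.

```javascript
// The lookahead requires exactly 2 + 3k further digits and then a word boundary,
// so each bracketed digit is the first digit of a complete three-digit group,
// counted from the right-hand end of the number.
const a = "foo 1234567890 bbb 123456";
const marked = a.replace(/\d(?=\d\d(\d{3})*\b)/g, "[$&]");
console.log(marked); // foo 1[2]34[5]67[8]90 bbb [1]23[4]56
```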

It works well with Hindu-Arabic numerals, i.e. 1, 2, 3, 4, ... But when I try to apply the regex to Eastern Arabic numerals, it fails. Here is the regex I use (I just replaced \d with [\u0660-\u0669]):

/[\u0660-\u0669](?=[\u0660-\u0669][\u0660-\u0669]([\u0660-\u0669]{3})*\b)/g

It actually works if my string is ١٢٣٤foo, but fails when it's ١٢٣٤ foo or even foo١٢٣٤:

> a = "١٢٣٤foo  ١٢٣٤ foo  foo١٢٣٤"
"١٢٣٤foo  ١٢٣٤ foo  foo١٢٣٤"
> a.replace(/[\u0660-\u0669](?=[\u0660-\u0669][\u0660-\u0669]([\u0660-\u0669]{3})*\b)/g, "[$&]")
"١[٢]٣٤foo  ١٢٣٤ foo  foo١٢٣٤"

What actually matters to me are separated numbers (e.g. ١٢٣٤). Why can't it match separated numbers?

Update:

Another requirement is that the regex should only match numbers with 5 or more digits (e.g. ١٢٣٤٥ but not ١٢٣٤). I initially thought that was as simple as adding {5,} at the end of the expression, but that doesn't work.
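Appending {5,} cannot express this, because a quantifier applies only to the token immediately before it, not to the length of the whole digit run. One workaround, which is our own sketch and not taken from the original thread, is to select the long runs first and bracket digits only inside them. The function name `markLongRuns` is hypothetical; the approach avoids lookbehind, so it should also run on pre-ES2018 engines.

```javascript
// Hypothetical helper (not from the thread): bracket digits only inside runs
// of five or more Eastern Arabic digits (U+0660..U+0669).
function markLongRuns(text) {
  // Pass 1: the greedy {5,} matches each whole run of 5+ digits; shorter runs are skipped.
  return text.replace(/[\u0660-\u0669]{5,}/g, (run) =>
    // Pass 2: the run contains only digits, so "$" marks its end and no \b is needed.
    run.replace(/[\u0660-\u0669](?=[\u0660-\u0669]{2}(?:[\u0660-\u0669]{3})*$)/g, "[$&]")
  );
}

const sample = "١٢٣٤ ١٢٣٤٥ ١٢٣٤٥٦";
const result = markLongRuns(sample);
console.log(result); // ١٢٣٤ ١٢[٣]٤٥ [١]٢٣[٤]٥٦
```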

Solution

Oddly, I'm experiencing the opposite behavior from you (the first one doesn't work and the other two do), but how about replacing the \b with (?![\u0660-\u0669])? Then it seems to work no matter what comes before or after the number:
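A likely explanation, which is our reading rather than something stated in the thread: in JavaScript, \w is ASCII-based ([A-Za-z0-9_]), so Eastern Arabic digits count as non-word characters, and \b only fires where a word character sits on exactly one side. That is why ١٢٣٤foo works and the other two fail. Engines with a Unicode-aware \w, such as .NET, treat these digits as word characters, which would invert the results and may account for the opposite behavior the answerer saw. The check below shows the JavaScript side:

```javascript
// \w (and therefore \b) is ASCII-based in JavaScript.
const isWordChar = /\w/.test("\u0664");             // false: U+0664 is not in [A-Za-z0-9_]

// \b needs a word character on exactly one side of the position.
const beforeLetter = /\u0664\b/.test("\u0664foo");  // true: digit (non-word), then "f" (word)
const beforeSpace = /\u0664\b/.test("\u0664 foo");  // false: digit, then space: both non-word
const atEnd = /\u0664\b/.test("foo\u0664");         // false: end of input counts as non-word

// The negative lookahead only asks "is the next character something other than a digit?"
const runEnds = /\u0664(?![\u0660-\u0669])/;
const allEnd = ["\u0664foo", "\u0664 foo", "foo\u0664"].every((s) => runEnds.test(s)); // true
```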

[\u0660-\u0669](?=[\u0660-\u0669][\u0660-\u0669]([\u0660-\u0669]{3})*(?![\u0660-\u0669]))
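Applied to the string from the question, the regex above brackets the same digit in all three cases. A quick sketch to confirm this in JavaScript:

```javascript
// The answer's first regex: \b replaced by a "no digit follows" lookahead.
const markGroups =
  /[\u0660-\u0669](?=[\u0660-\u0669][\u0660-\u0669]([\u0660-\u0669]{3})*(?![\u0660-\u0669]))/g;

const a = "١٢٣٤foo  ١٢٣٤ foo  foo١٢٣٤";
const result = a.replace(markGroups, "[$&]");
console.log(result); // ١[٢]٣٤foo  ١[٢]٣٤ foo  foo١[٢]٣٤
```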

Edit: This seems to work for the new requirement, i.e. to add the brackets only if the run of digits is five or more digits long:

[\u0660-\u0669](?=[\u0660-\u0669]{2}([\u0660-\u0669]{3})+(?![\u0660-\u0669]))|(?<=[\u0660-\u0669]{2})[\u0660-\u0669](?=[\u0660-\u0669]{2}(?![\u0660-\u0669]))
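A sketch of the regex above in JavaScript, assuming an engine with lookbehind support (ES2018 or later, e.g. current Node.js and modern browsers). The comments describe our reading of the two alternatives:

```javascript
// Alternative 1 brackets a digit followed by exactly 5, 8, 11, ... more digits.
// Alternative 2 brackets the third digit from the end of a run, but only when at
// least two digits precede it. Neither alternative fires on runs shorter than five.
const markLong =
  /[\u0660-\u0669](?=[\u0660-\u0669]{2}([\u0660-\u0669]{3})+(?![\u0660-\u0669]))|(?<=[\u0660-\u0669]{2})[\u0660-\u0669](?=[\u0660-\u0669]{2}(?![\u0660-\u0669]))/g;

const sample = "١٢٣٤ ١٢٣٤٥ ١٢٣٤٥٦";
const marked = sample.replace(markLong, "[$&]");
console.log(marked); // ١٢٣٤ ١٢[٣]٤٥ [١]٢٣[٤]٥٦
```

The four-digit run is left untouched, while the five- and six-digit runs are bracketed exactly as the original regex would bracket them.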

Incidentally, some regex engines (for example .NET, or Python 3 with str patterns) treat these digits as matches for \d. Here is that second regex with \d instead of those character ranges, which should be a little easier to read:

\d(?=\d{2}(\d{3})+(?!\d))|(?<=\d{2})\d(?=\d{2}(?!\d))
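JavaScript is not one of those engines: its \d is always [0-9], even with the u flag, so the \d version above will not match Eastern Arabic digits there. A possible substitute, which is our suggestion and not part of the answer, is the Unicode property escape \p{Nd} (decimal digit), available with the u flag in ES2018+ engines:

```javascript
// In JavaScript, \d never matches U+0660..U+0669, with or without the u flag.
const dMatchesArabic = /\d/u.test("\u0663"); // false

// \p{Nd} matches any Unicode decimal digit, so it stands in for \d in the answer's regex.
const markLongAnyDigits =
  /\p{Nd}(?=\p{Nd}{2}(\p{Nd}{3})+(?!\p{Nd}))|(?<=\p{Nd}{2})\p{Nd}(?=\p{Nd}{2}(?!\p{Nd}))/gu;

const mixed = "١٢٣٤٥٦ 123456 ١٢٣٤";
const out = mixed.replace(markLongAnyDigits, "[$&]");
console.log(out); // [١]٢٣[٤]٥٦ [1]23[4]56 ١٢٣٤
```

One design consequence: because \p{Nd} covers every digit script, a run that mixes ASCII and Eastern Arabic digits is treated as a single number.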
