Regex match Arabic keyword


Problem description

I have a simple regex that finds a word in text:

var patern = new RegExp("\\bsomething\\b", "gi");

This matches the word when it has spaces or punctuation around it.

So it matches:

I have something.

But it does not match:

I havesomething.

which is fine and exactly what I need.

But I have an issue with, for example, Arabic. If I have the regex:

var patern = new RegExp("\\bرياضة\\b", "gi");

and the text:

رياضة أنا أحب رياضتي وأنا سعيد حقا هنا لها حبي 

The keyword I am looking for is at the end of the text.

But this doesn't work; the regex simply doesn't find it.

It works if I remove \b from the regex:

var patern = new RegExp("رياضة", "gi");

But that is not what I want, because I don't want to find the keyword when it is part of another word, as in the English example above:

 I havesomething.

I know very little about regex, so I would appreciate help making this work for both English and languages like Arabic.

Recommended answer

We first have to understand what \b means:

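\b is an anchor that matches at a position called a "word boundary": a position between a word character (\w) and a non-word character. In JavaScript, \w by default covers only ASCII letters, digits, and the underscore, so Arabic letters count as non-word characters and \b never matches between an Arabic letter and a space.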
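To make the behaviour of \b concrete, here is a minimal sketch. The helper name countMatches is hypothetical and exists only for this illustration; the patterns are the ones from the question, written with doubled backslashes because "\b" inside a JavaScript string literal would otherwise be read as a backspace character.

```javascript
// Hypothetical helper: counts how many times a global pattern matches.
function countMatches(pattern, text) {
  return (text.match(pattern) || []).length;
}

// \b works for ASCII words: "something" is delimited by a space and a period.
const englishWord = new RegExp("\\bsomething\\b", "gi");

// The same construction with an Arabic word. Arabic letters are not part
// of \w, so there is no word/non-word transition around the word.
const arabicWord = new RegExp("\\bرياضة\\b", "gi");

console.log(countMatches(englishWord, "I have something.")); // 1
console.log(countMatches(englishWord, "I havesomething."));  // 0
console.log(countMatches(arabicWord, " رياضة "));            // 0
```

The last call returns 0 even though the word is clearly surrounded by spaces, which is the symptom described in the question.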

In your case, the word boundary you are looking for is a position where the word has no other Arabic letter next to it.

To match only Arabic letters in a regex, we use their Unicode range:

[\u0621-\u064A]+

Or we can simply use the Arabic letters directly:

[ء-ي]+

The pattern above matches any run of Arabic letters. To build a word boundary out of it, we can simply negate the class on both sides of the word:

[^ء-ي]ARABIC TEXT[^ء-ي]

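The pattern above means: require a non-Arabic character on each side of the Arabic word. Each [^ء-ي] consumes one character, so this works when the word is surrounded by spaces or punctuation, but a word at the very start or end of the string is not matched.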
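One possible variant, sketched here as an assumption rather than as part of the original answer, replaces the two negated classes with negative lookbehind and lookahead. Lookarounds assert that no Arabic letter is adjacent without consuming a character, so a word at the start or end of the string is also found. The names ARABIC_LETTER, escapeRegExp, wholeArabicWord, and findWholeWord are hypothetical, and lookbehind requires an ES2018-capable engine.

```javascript
// Same letter range as [\u0621-\u064A] above, kept as a string for reuse.
const ARABIC_LETTER = "[\\u0621-\\u064A]";

// Escape regex metacharacters so the keyword is matched literally.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Build a pattern that matches `word` only when no Arabic letter
// touches it on either side (zero-width checks, nothing is consumed).
function wholeArabicWord(word) {
  const w = escapeRegExp(word);
  return new RegExp(`(?<!${ARABIC_LETTER})${w}(?!${ARABIC_LETTER})`, "g");
}

// Return the start index of every whole-word occurrence.
function findWholeWord(text, word) {
  return [...text.matchAll(wholeArabicWord(word))].map((m) => m.index);
}

// The keyword is the first thing in the string and is still found.
console.log(findWholeWord("رياضة أنا أحب رياضتي", "رياضة")); // [ 0 ]
```

Because the range stops at U+064A, diacritics attached to a word are treated as non-letters here; whether that is acceptable depends on the text being searched.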

Consider the example you gave us, which I have modified a little:

 أنا أحب رياضتي رياض رياضة رياضيات وأنا سعيد حقا هنا 

If we search for رياض on its own, the search also matches inside رياضة, رياضيات, and رياضتي. However, if we add the negated classes above, only the standalone رياض matches.

var x = " أنا أحب رياضتي رياض رياضة رياضيات وأنا سعيد حقا هنا ";
x = x.replace(/([^ء-ي]رياض[^ء-ي])/g, '<span style="color:red">$1</span>');
document.write (x);
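In the snippet above, $1 includes the separator characters on both sides of the word, so they end up inside the span. A minimal sketch of an alternative that keeps them outside is shown below; highlightWord is a hypothetical name, and the sketch assumes the keyword contains only letters (no regex metacharacters).

```javascript
// Wrap every standalone occurrence of `word` in a red span.
// Assumption: `word` contains no regex metacharacters.
function highlightWord(text, word) {
  const notArabic = "[^\\u0621-\\u064A]";
  // Group 1: start of string or one non-Arabic character (kept outside the span).
  // Group 2: the word itself.
  // Lookahead: end of string or a non-Arabic character, not consumed,
  // so two occurrences separated by a single space are both found.
  const re = new RegExp(`(^|${notArabic})(${word})(?=$|${notArabic})`, "g");
  return text.replace(re, '$1<span style="color:red">$2</span>');
}

const sample = " أنا أحب رياضتي رياض رياضة رياضيات وأنا سعيد حقا هنا ";
console.log(highlightWord(sample, "رياض"));
```

Using a lookahead for the right-hand side, rather than a second consuming class, avoids "using up" the space that the next occurrence needs on its left.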

If you would like to account for the alef variants أ آ إ ا with a single pattern, you could use [\u0622\u0623\u0625\u0627] or simply list them all between square brackets: [أآإا]. Here is a complete example:

var x = "أنا هنا وانا هناك .. آنا هنا وإنا هناك";
x = x.replace(/([أآإا]نا)/g, '<span style="color:red">$1</span>');
document.write (x);
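The same idea can be generalised so that every alef in a keyword, not only the first letter, accepts any of the four variants. This is a sketch under that assumption; ALEF_CLASS, alefInsensitivePattern, and findAlefVariants are hypothetical names, not part of the original answer.

```javascript
// Character class covering آ (U+0622), أ (U+0623), إ (U+0625), ا (U+0627).
const ALEF_CLASS = "[\\u0622\\u0623\\u0625\\u0627]";

// Replace every alef variant in the keyword with the class above,
// producing a regex source string. Assumes the keyword has no
// regex metacharacters.
function alefInsensitivePattern(word) {
  return word.replace(/[\u0622\u0623\u0625\u0627]/g, ALEF_CLASS);
}

// Return all substrings of `text` that equal `word` up to alef variants.
function findAlefVariants(text, word) {
  return text.match(new RegExp(alefInsensitivePattern(word), "g")) || [];
}

const text = "أنا هنا وانا هناك .. آنا هنا وإنا هناك";
console.log(findAlefVariants(text, "انا"));
```

This sketch assumes the text uses the precomposed code points listed above; text stored in a decomposed form (a bare alef followed by a combining hamza or madda) would need normalising first, for example with String.prototype.normalize("NFC").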

Note: If you want to match every possible Arabic character in a regex, including all Arabic letters (أ ب ت ث ج), all diacritics (َ ً ُ ٌ ِ ٍ ّ), and all Arabic digits (١٢٣٤٥٦٧٨٩٠), use this regex: [،-٩]+
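Written with escapes, the range [،-٩] runs from the Arabic comma (U+060C) to the Arabic-Indic digit nine (U+0669). The sketch below uses that escaped form to pull runs of Arabic characters out of mixed text; ARABIC_RUN and arabicRuns are hypothetical names used only for illustration.

```javascript
// U+060C (Arabic comma) through U+0669 (Arabic-Indic digit nine):
// letters, diacritics, digits, and the Arabic punctuation in between.
const ARABIC_RUN = /[\u060C-\u0669]+/g;

// Return every maximal run of characters from that range.
function arabicRuns(text) {
  return text.match(ARABIC_RUN) || [];
}

console.log(arabicRuns("abc ١٢٣ مَرحبا، xyz"));
```

Since the range also contains punctuation such as the Arabic comma, it is better suited to detecting Arabic text than to serving as a "word character" class for boundary checks.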

Useful link on the ordering of Arabic characters in Unicode: https://en.wikipedia.org/wiki/Arabic_script_in_Unicode
