JavaScript - regex to remove special characters but also keep Greek characters


Question

I am trying to remove special characters from a piece of text, but the following regular expression

var desired = stringToReplace.replace(/[^\w\s]/gi, '')

(found here: "javascript regexp remove all special characters") has the negative side effect of also deleting Greek characters, which is something I don't want.

Can someone also explain how to use character ranges in regular expressions? Is there a character map that can help me define the range I want?

Answer:

[a-zA-Z0-9ΆΈ-ώ\s]   # See my 2nd comment under Joeytje50's answer.


Recommended answer

Character ranges are defined by character codes. So, since A has char code 65 and z has char code 122, the following regex:

[A-z]

would match every letter, but also every character whose char code falls between the uppercase and lowercase blocks, namely codes 91 through 96, which are the characters [ \ ] ^ _ and `.
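To see exactly which non-letters slip into [A-z], one option is to walk the code range and test each character. This is a minimal sketch; the helper name `extraCharsInRange` is made up for illustration and is not part of the thread.

```javascript
// Hypothetical helper: list the characters between two char codes
// that a given regex matches but that are not ASCII letters.
function extraCharsInRange(regex, from, to) {
  const extras = [];
  for (let code = from; code <= to; code++) {
    const ch = String.fromCharCode(code);
    if (regex.test(ch) && !/[A-Za-z]/.test(ch)) {
      extras.push(ch);
    }
  }
  return extras;
}

// 'A' is 65 and 'z' is 122, so scan that whole span.
const extras = extraCharsInRange(/[A-z]/, 65, 122);
console.log(extras.join(' ')); // [ \ ] ^ _ `
```

Writing the two blocks separately as [A-Za-z] avoids these extra characters.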

Now, for Greek letters, the character codes of the uppercase letters alpha through omega are 913-937, and those of the lowercase letters alpha through omega are 945-969 (this includes both lowercase variants of sigma, namely ς (962) and σ (963)).
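The code ranges quoted above can be checked by building the characters back from their codes. A small sketch, with `charsBetween` as a made-up helper name; note that code 930 is unassigned in Unicode, so a placeholder glyph may show up in the uppercase line.

```javascript
// Hypothetical helper: build a string of every character
// between two char codes, inclusive.
function charsBetween(from, to) {
  let out = '';
  for (let code = from; code <= to; code++) {
    out += String.fromCharCode(code);
  }
  return out;
}

const upperGreek = charsBetween(913, 937); // uppercase alpha through omega
const lowerGreek = charsBetween(945, 969); // lowercase alpha through omega

console.log(upperGreek);
console.log(lowerGreek);

// The two lowercase sigmas sit next to each other in the lowercase block.
console.log(lowerGreek.charCodeAt(17), lowerGreek.charCodeAt(18)); // 962 963
```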

So, the set of Latin letters, Greek letters, and Arabic numerals is described by the following character class (to match every character except these, as the removal needs, put a ^ right after the opening bracket):

[a-zA-Z0-9α-ωΑ-Ω]

So, for Greek characters, ranges work just like they do for Latin letters.
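Plugged into the `replace` call from the question, the character class has to be negated with a leading ^ so that everything outside it is removed. The following is a sketch under that assumption; the function name `stripSpecial` and the sample string are invented for illustration.

```javascript
// Hypothetical wrapper around the replace() call from the question.
// The leading ^ negates the class: anything that is NOT a Latin letter,
// a digit, or an unaccented Greek letter gets removed.
function stripSpecial(text) {
  return text.replace(/[^a-zA-Z0-9α-ωΑ-Ω]/g, '');
}

// Whitespace is not in the class here, so spaces are stripped as well.
console.log(stripSpecial('abc-123 δλω ΔΣΩ!')); // abc123δλωΔΣΩ
```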

Edit: I tested this on a Lorem Ipsum text run through Google Translate, and it looks like this doesn't take accented letters into account. I checked the character codes of these accented letters, and it turns out they are placed right before the lowercase letters or right after them. So, the following regex also covers the accented lowercase letters:

[a-zA-Z0-9ά-ωΑ-ώ]

This expanded range now also includes άέήίΰ (char codes 940 through 944) and ϊϋόύώ (codes 970 through 974).
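The effect of widening the range can be checked by running both classes, negated for removal, over the ten accented letters. This is only a sketch: the variable names are invented, and the letters are built from their char codes so that no look-alike characters sneak in.

```javascript
// Negated forms of the two classes discussed so far.
const plainClass = /[^a-zA-Z0-9α-ωΑ-Ω]/g;    // first version
const extendedClass = /[^a-zA-Z0-9ά-ωΑ-ώ]/g; // accent-aware version

// The ten accented letters, codes 940-944 and 970-974.
const accented = String.fromCharCode(
  940, 941, 942, 943, 944,
  970, 971, 972, 973, 974
);

console.log(accented.replace(plainClass, '').length);    // 0, all ten are stripped
console.log(accented.replace(extendedClass, '').length); // 10, all ten survive
```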

To also include whitespace (spaces, tabs, newlines), simply add \s to the character class:

[a-zA-Z0-9ά-ωΑ-ώ\s]

Demo: http://regex101.com/r/iL5yY2

Edit: Apparently there are more Greek letters that need to be included, namely those in the range [Ά-Ϋ], which is the range of letters right before ά, so the new regex looks like this:

[a-zA-Z0-9Ά-ωΑ-ώ\s]
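As a rough end-to-end check, the class above and the variant quoted in the question can both be negated and passed to `replace`. The sketch below assumes that negated form; `keepAllowed` and the sample text are invented. The one difference I could find between the two classes is code 903 (U+0387, the Greek ano teleia punctuation mark), which lies inside Ά-ω but outside ΆΈ-ώ. The thread itself does not state that this is the reason for the variant, so treat that as my reading.

```javascript
// Negated forms of the two classes quoted in this thread.
const answerClass = /[^a-zA-Z0-9Ά-ωΑ-ώ\s]/g; // final regex from the answer
const askerClass = /[^a-zA-Z0-9ΆΈ-ώ\s]/g;    // variant quoted in the question

// Hypothetical helper: remove everything the given negated class matches.
function keepAllowed(text, disallowed) {
  return text.replace(disallowed, '');
}

const sample = 'abc, ΔΣΩ; 123!';
console.log(keepAllowed(sample, answerClass)); // abc ΔΣΩ 123
console.log(keepAllowed(sample, askerClass));  // abc ΔΣΩ 123

// Code 903 (U+0387) is kept by the answer's class and removed by the variant.
const anoTeleia = String.fromCharCode(903);
console.log(keepAllowed(anoTeleia, answerClass).length); // 1
console.log(keepAllowed(anoTeleia, askerClass).length);  // 0
```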
