How to detect non-roman characters in JS?
Question
How can I detect non-Roman characters in a string? Mind you, it's not as simple as treating every character outside A-Z and 0-9 as non-Roman. There are many variations on Roman characters, such as the German ä, ö, ü, which are still Roman; "中文", on the other hand, is clearly not Roman script.
Recommended answer
JavaScript is natively Unicode, and the character ranges for the various scripts are well documented at http://www.unicode.org/charts/
You will see that several blocks correspond to the Latin (Roman) script. The most common of these beyond plain ASCII is the Latin-1 Supplement block, U+0080–U+00FF, sometimes loosely called "high ASCII". It includes the German characters you mention. Note that the block also contains control codes and symbols such as × and ÷, not only letters.
JavaScript lets us test for Unicode ranges nicely using regular expressions. So you could detect Latin-1 Supplement characters in several strings, as in this example:
var en = 'Coffee',
fr = 'Café',
el = 'Καφές';
console.log( en.replace( /[\u0080-\u00FF]/g, '*') );
console.log( fr.replace( /[\u0080-\u00FF]/g, '*') );
console.log( el.replace( /[\u0080-\u00FF]/g, '*') );
This will print:
Coffee
Caf*
Καφές
That is because, of the characters in these strings, only the accented é falls within the Latin-1 Supplement range (hence it is replaced with *). The Greek letters lie outside the range, so they are left untouched.
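If a yes/no answer is enough and no masked string is needed, the same range can be used with `test`. This is a minimal sketch; the helper name `hasLatin1Supplement` is made up for illustration. The regex deliberately omits the `g` flag, because a global regex keeps `lastIndex` state between `test` calls.

```javascript
// Hypothetical helper: true if the string contains any character
// from the Latin-1 Supplement block (U+0080–U+00FF).
// No `g` flag: a global regex remembers lastIndex between test() calls.
var latin1Supplement = /[\u0080-\u00FF]/;

function hasLatin1Supplement(str) {
  return latin1Supplement.test(str);
}

console.log(hasLatin1Supplement('Coffee'));    // false
console.log(hasLatin1Supplement('Caf\u00E9')); // true (é as precomposed U+00E9)
console.log(hasLatin1Supplement('Καφές'));     // false
```

If the é arrives in decomposed form (e followed by the combining accent U+0301), it will not match this range; normalizing the input with `str.normalize('NFC')` first avoids that.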
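So, to better answer your question: to detect "non-Roman" characters, you could negate a character class that covers the Latin blocks (Basic Latin through Latin Extended-B, Latin Extended Additional, Latin Extended-C and Latin Extended-D):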
var str = 'a ä ö ü 中 文',
reg = /[^\u0000-\u024F\u1E00-\u1EFF\u2C60-\u2C7F\uA720-\uA7FF]/g;
console.log( str.replace( reg, '?') );
Which would display:
a ä ö ü ? ?
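In engines that support ES2018 Unicode property escapes, the script can be named directly instead of listing blocks. The following is a sketch under that assumption and is not part of the original answer; `findNonLatin` is a hypothetical name. It accepts the `Common` script (digits, spaces, punctuation) and `Inherited` (combining marks) alongside Latin, which is a design choice you may want to change.

```javascript
// Assumes an engine with ES2018 Unicode property escapes (the `u` flag).
// Matches any character whose script is not Latin, Common, or Inherited.
const nonLatin = /[^\p{Script=Latin}\p{Script=Common}\p{Script=Inherited}]/gu;

// Hypothetical helper: returns the offending characters, or [] if none.
function findNonLatin(str) {
  return str.match(nonLatin) || [];
}

console.log(findNonLatin('a ä ö ü 中 文'));  // the two Han characters
console.log(findNonLatin('Καφές').length);   // 5
```

Compared with the block-based regex, this classifies by script rather than by code point range, so symbols that merely sit inside a Latin block are judged by their own script property.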
You can use these ranges to do whatever you specifically need. I put together a crude tool for building regexes from Unicode blocks, but I'm quite sure there are better resources out there.
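For illustration, building such a regex from a table of blocks might look like the sketch below. This is not the tool mentioned above; the `latinBlocks` table and the `toEscape` and `buildNonMatchRegex` functions are hypothetical, and the ranges are the same ones used in the answer's regex.

```javascript
// Block ranges as [first, last] code points (all within the BMP).
var latinBlocks = [
  [0x0000, 0x024F], // Basic Latin through Latin Extended-B
  [0x1E00, 0x1EFF], // Latin Extended Additional
  [0x2C60, 0x2C7F], // Latin Extended-C
  [0xA720, 0xA7FF]  // Latin Extended-D
];

// Format a BMP code point as a \uXXXX escape for use in a regex source.
function toEscape(codePoint) {
  return '\\u' + ('0000' + codePoint.toString(16).toUpperCase()).slice(-4);
}

// Hypothetical builder: a regex matching anything outside the given blocks.
function buildNonMatchRegex(blocks) {
  var body = blocks.map(function (b) {
    return toEscape(b[0]) + '-' + toEscape(b[1]);
  }).join('');
  return new RegExp('[^' + body + ']', 'g');
}

var reg = buildNonMatchRegex(latinBlocks);
console.log(reg.source);
console.log('a ä ö ü 中 文'.replace(reg, '?')); // a ä ö ü ? ?
```

Keeping the blocks in a table makes it easy to add or drop a range without hand-editing the character class.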