How to detect non-roman characters in JS?


Question

How can I detect non-roman characters in a string? Mind you, it's not as simple as classifying every character outside A-Z and 0-9 as non-roman. There are lots of variations on roman characters, like the German ä, ö, ü, which are still roman; "中文", on the other hand, is clearly not roman script.

Recommended answer

JavaScript strings are natively Unicode (stored as UTF-16), and the character ranges for the various scripts are well documented at http://www.unicode.org/charts/

You will see that there are several blocks that correspond to the Latin (Roman) script. The most common of these is the Latin-1 Supplement block, U+0080–U+00FF (sometimes loosely called "high ASCII", although ASCII itself ends at U+007F). It includes the German characters you mention.
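To see which block a given character falls into, you can read its code point with codePointAt() and compare it against the block ranges. The sketch below is illustrative: LATIN_BLOCKS and latinBlockOf are hypothetical names, and the list only covers the Latin blocks used in the regex further down in this answer (newer Unicode versions add more Latin blocks, so it is not exhaustive).

```javascript
// Latin-script blocks from the Unicode charts (not exhaustive; mirrors the
// ranges used in this answer's regex). Basic Latin also holds ASCII digits,
// punctuation and control characters.
const LATIN_BLOCKS = [
  { name: 'Basic Latin', start: 0x0000, end: 0x007F },
  { name: 'Latin-1 Supplement', start: 0x0080, end: 0x00FF },
  { name: 'Latin Extended-A', start: 0x0100, end: 0x017F },
  { name: 'Latin Extended-B', start: 0x0180, end: 0x024F },
  { name: 'Latin Extended Additional', start: 0x1E00, end: 0x1EFF },
  { name: 'Latin Extended-C', start: 0x2C60, end: 0x2C7F },
  { name: 'Latin Extended-D', start: 0xA720, end: 0xA7FF },
];

// Hypothetical helper: return the name of the Latin block containing the
// first code point of `ch`, or null if it lies outside all blocks above.
function latinBlockOf(ch) {
  const cp = ch.codePointAt(0);
  const block = LATIN_BLOCKS.find(b => cp >= b.start && cp <= b.end);
  return block ? block.name : null;
}

// Print each character with its code point (U+XXXX) and block.
for (const ch of ['a', 'ä', 'ö', 'ü', 'ß', 'ő', '中']) {
  const hex = ch.codePointAt(0).toString(16).toUpperCase().padStart(4, '0');
  console.log(`${ch} U+${hex} ${latinBlockOf(ch)}`);
}
```

Here ä, ö, ü and ß land in Latin-1 Supplement, the Hungarian ő (U+0151) in Latin Extended-A, and 中 (U+4E2D) in none of them.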

JavaScript lets us test for Unicode ranges nicely using regular expressions, with \uXXXX escapes inside a character class. So you could detect Latin-1 Supplement characters in several strings as in this example:

var en = 'Coffee',
    fr = 'Café',
    el = 'Καφές';

console.log( en.replace( /[\u0080-\u00FF]/g, '*') );
console.log( fr.replace( /[\u0080-\u00FF]/g, '*') );
console.log( el.replace( /[\u0080-\u00FF]/g, '*') );

This will print:

Coffee
Caf*
Καφές

This is because only the accented é falls within the Latin-1 Supplement range, so it alone is replaced with *.
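One caveat with range-based tests: "é" can also be written as a plain "e" followed by the combining acute accent U+0301, which looks identical but contains no code point in U+0080–U+00FF. A minimal sketch, assuming your input may arrive in either form, is to normalize it to NFC with String.prototype.normalize() before testing:

```javascript
// Two spellings of "Café" that render identically:
const precomposed = 'Caf\u00E9'; // é as the single code point U+00E9
const decomposed = 'Cafe\u0301'; // "e" followed by COMBINING ACUTE ACCENT U+0301

const latin1Supplement = /[\u0080-\u00FF]/g;

// The precomposed form matches the Latin-1 Supplement range.
console.log(precomposed.replace(latin1Supplement, '*')); // Caf*

// The decomposed form has nothing in U+0080–U+00FF, so nothing is replaced.
console.log(decomposed.replace(latin1Supplement, '*'));

// NFC composes "e" + U+0301 into U+00E9, so both forms now behave the same.
console.log(decomposed.normalize('NFC').replace(latin1Supplement, '*')); // Caf*
```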

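So to answer your question more directly, you can detect "non-roman" characters by matching anything outside the Latin blocks: U+0000–U+024F (Basic Latin, Latin-1 Supplement, Latin Extended-A and -B), U+1E00–U+1EFF (Latin Extended Additional), U+2C60–U+2C7F (Latin Extended-C) and U+A720–U+A7FF (Latin Extended-D):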

var str = 'a ä ö ü 中 文',
    reg = /[^\u0000-\u024F\u1E00-\u1EFF\u2C60-\u2C7F\uA720-\uA7FF]/g;

console.log( str.replace( reg, '?') );

Which will display:

a ä ö ü ? ?
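If you only need a yes/no answer, or a list of the offending characters, the same ranges can be used with test(). The sketch below drops the g flag on purpose: calling test() repeatedly on a global regex advances its lastIndex, which can make consecutive calls return different results for the same input. The function names hasNonRoman and nonRomanChars are hypothetical.

```javascript
// Same Latin ranges as the regex above, but without the g flag so that
// repeated test() calls are stateless.
const NON_ROMAN = /[^\u0000-\u024F\u1E00-\u1EFF\u2C60-\u2C7F\uA720-\uA7FF]/;

// True if `str` contains at least one character outside the Latin blocks.
function hasNonRoman(str) {
  return NON_ROMAN.test(str);
}

// List the offending characters. Spreading a string iterates by code point,
// so astral characters (e.g. emoji) are reported whole, not as surrogate halves.
function nonRomanChars(str) {
  return [...str].filter(ch => NON_ROMAN.test(ch));
}

console.log(hasNonRoman('a ä ö ü'));         // false
console.log(hasNonRoman('a ä ö ü 中 文'));   // true
console.log(nonRomanChars('a ä ö ü 中 文')); // [ '中', '文' ]
```

Note that spaces, digits and ASCII punctuation sit inside U+0000–U+024F, so they are never reported as non-roman.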

You can combine these ranges to do whatever you specifically need. I put together a crude tool for building regexes from Unicode blocks, but I'm quite sure there are better resources out there.
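As an alternative to listing block ranges by hand, engines that support ES2018 Unicode property escapes (the \p{...} syntax, which requires the u flag; available in Node.js 10+ and current browsers) let you match by script name instead. This is a different technique from the range-based regex above, shown here as a sketch: Script=Latin covers the Latin letters in every block, Script=Common covers spaces, digits and most punctuation, and Script=Inherited covers combining marks such as U+0301, so decomposed accents are not flagged. Common also includes some symbols shared across scripts, such as the CJK full stop "。", which therefore would not be flagged either.

```javascript
// Anything that is not Latin, Common (spaces, digits, punctuation) or
// Inherited (combining marks) counts as "non-roman" here.
const NON_LATIN = /[^\p{Script=Latin}\p{Script=Common}\p{Script=Inherited}]/gu;

console.log('a ä ö ü 中 文'.replace(NON_LATIN, '?')); // a ä ö ü ? ?

// "e" + COMBINING ACUTE ACCENT: the accent is Script=Inherited, so nothing is replaced.
console.log('Cafe\u0301'.replace(NON_LATIN, '?'));

// "Καφές" written with escapes so the Greek letters are unambiguous.
console.log('\u039A\u03B1\u03C6\u03AD\u03C2'.replace(NON_LATIN, '?')); // ?????
```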
