Regex to allow non-ASCII and foreign letters?


Question

Is it possible to create a regular expression that allows non-ASCII letters along with Latin letters, for example Chinese or Greek characters (e.g. A汉语AbN漢語 should be allowed)?

I currently have the following, which only allows Latin letters: ^[\w\d][\w\d_\-\.\s]*$

Recommended answer

In .NET,

^[\p{L}\d_][\p{L}\d_.\s-]*$

is equivalent to your regex, additionally allowing other Unicode letters.

Explanation:

\p{L} is shorthand for the Unicode property "Letter", so it matches a letter from any script, not just Latin.
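As a rough illustration of how the pattern behaves, here is a minimal JavaScript sketch. It assumes an engine that supports Unicode property escapes (the `u` flag with `\p{...}`, added in ES2018); the answer itself targets .NET, so this is a transposition rather than the author's code. One difference to keep in mind: `\d` in JavaScript matches only ASCII digits, whereas in .NET it matches any Unicode decimal digit by default. The name `namePattern` and the sample strings are made up for the example.

```javascript
// Assumption: the engine supports ES2018 Unicode property escapes (`u` flag).
// Same pattern as in the answer, written as a JavaScript regex literal.
const namePattern = /^[\p{L}\d_][\p{L}\d_.\s-]*$/u;

// Hypothetical sample inputs, chosen to exercise each part of the pattern.
const samples = [
  "A汉语AbN漢語",   // Latin and Chinese letters mixed
  "Ωmega 42",       // Greek letter, space, digits
  "_under.score",   // underscore is allowed as the first character
  "-leading-dash",  // hyphen is only allowed after the first character
  "",               // at least one character is required
];

for (const s of samples) {
  console.log(JSON.stringify(s), namePattern.test(s));
}
```

The first three samples match and the last two do not, because the hyphen appears only in the second character class and the pattern requires at least one character.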

Caveat: I think you did not want to allow the underscore as the initial character (evidenced by its presence only in the second character class). Since \w includes the underscore, your regex did allow it, though. You might want to remove it from the first character class in my solution (it is not included in \p{L}, of course).
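If the underscore should indeed be rejected as the first character, the variant might look like the sketch below. It is written in JavaScript and assumes ES2018 Unicode property escapes; `strictPattern` is a name invented for the example, not something from the answer.

```javascript
// Assumption: the engine supports ES2018 Unicode property escapes (`u` flag).
// Underscore removed from the first character class only.
const strictPattern = /^[\p{L}\d][\p{L}\d_.\s-]*$/u;

console.log(strictPattern.test("_name"));  // false: underscore cannot start the string
console.log(strictPattern.test("na_me"));  // true: underscore is fine later on
console.log(strictPattern.test("汉语_1")); // true: starts with a letter
```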

In ECMAScript engines that predate Unicode property escapes (\p{...} with the u flag, added in ES2018), things are not so easy. You would have to define your own Unicode character ranges. Fortunately, a fellow StackOverflow user has already risen to the occasion and designed a JavaScript regex converter:

https://stackoverflow.com/a/8933546/20670
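To give an idea of what hand-written ranges look like for engines without `\p{L}`, here is a deliberately small sketch. The ranges are an assumption chosen for illustration (basic Latin, Latin-1 and Latin Extended, the Greek block, and the main CJK ideograph block); they are not exhaustive, they include a few non-letter code points such as × and ÷, and they are not the output of the converter linked above.

```javascript
// Illustrative, NOT exhaustive: a handful of blocks standing in for \p{L}.
// A-Z a-z, U+00C0-U+024F (Latin-1 letters and Latin Extended),
// U+0370-U+03FF (Greek), U+4E00-U+9FFF (CJK Unified Ideographs).
const letters = "A-Za-z\\u00C0-\\u024F\\u0370-\\u03FF\\u4E00-\\u9FFF";

// Same shape as the .NET pattern, with the ranges substituted for \p{L}.
const legacyPattern = new RegExp(
  "^[" + letters + "\\d_][" + letters + "\\d_.\\s-]*$"
);

console.log(legacyPattern.test("A汉语AbN漢語")); // true
console.log(legacyPattern.test("Ωmega"));        // true
console.log(legacyPattern.test("Привет"));       // false: Cyrillic is not in the ranges above
```

The last line shows the cost of this approach: any script left out of the list is silently rejected, which is why a generated range set is usually preferable to a hand-maintained one.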
