Java regex to support Unicode?
Question
To match A to Z, we would use the regex:
[A-Za-z]
How can a regex be made to match UTF-8 characters entered by the user? For example, Chinese words like 环保部.
Recommended answer
What you are looking for are Unicode properties.
For example, \p{L} matches any kind of letter from any language.
So a regex to match such a Chinese word could be something like
\p{L}+
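As a minimal sketch of how this pattern might be used from Java (the class name UnicodeLetterDemo and the helper isAllLetters are hypothetical names chosen for illustration, not part of the original answer), the following checks whether an entire input consists of letters. The Chinese word is written with Unicode escapes only to sidestep source-file encoding issues.

```java
import java.util.regex.Pattern;

public class UnicodeLetterDemo {
    // \p{L} matches any Unicode letter; + requires one or more of them.
    private static final Pattern LETTERS = Pattern.compile("\\p{L}+");

    // Hypothetical helper: true only if the whole input consists of letters.
    static boolean isAllLetters(String input) {
        return LETTERS.matcher(input).matches();
    }

    public static void main(String[] args) {
        String chinese = "\u73AF\u4FDD\u90E8"; // the word 环保部 from the question
        System.out.println(isAllLetters(chinese));  // Chinese letters only
        System.out.println(isAllLetters("abc"));    // Latin letters only
        System.out.println(isAllLetters("abc123")); // contains digits, so not all letters
    }
}
```

Note that matches() requires the whole input to fit the pattern; to pick words out of longer text, Matcher.find() would be the usual choice.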
There are many such properties; for more details, see regular-expressions.info.
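To illustrate one of those other properties, here is a rough sketch that uses the script property \p{IsHan}, which Java supports since version 7, to pull only runs of Han (Chinese) characters out of mixed text. This goes slightly beyond the answer above: the choice of \p{IsHan} and the names HanScriptDemo and findHanRuns are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HanScriptDemo {
    // \p{IsHan} matches a character belonging to the Han script (Java 7+).
    private static final Pattern HAN_RUN = Pattern.compile("\\p{IsHan}+");

    // Hypothetical helper: collects every maximal run of Han characters in the text.
    static List<String> findHanRuns(String text) {
        List<String> runs = new ArrayList<>();
        Matcher m = HAN_RUN.matcher(text);
        while (m.find()) {
            runs.add(m.group());
        }
        return runs;
    }

    public static void main(String[] args) {
        // "\u73AF\u4FDD\u90E8" is the word 环保部 from the question.
        System.out.println(findHanRuns("abc \u73AF\u4FDD\u90E8 xyz"));
    }
}
```

Unlike \p{L}, which accepts letters from every script, a script property restricts the match to one writing system, which may be preferable when only Chinese input is expected.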
Another option is to use the modifier
Pattern.UNICODE_CHARACTER_CLASS
Java 7 introduced a new flag, Pattern.UNICODE_CHARACTER_CLASS, that enables the Unicode versions of the predefined character classes (see my answer here for some more details and links).
You could do something like this:
Pattern p = Pattern.compile("\\w+", Pattern.UNICODE_CHARACTER_CLASS);
and \w would then match all letters and all digits from any language (and, of course, some word-connecting characters like _).
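The difference the flag makes can be sketched by compiling the same \w+ pattern twice, once with the default behaviour and once with Pattern.UNICODE_CHARACTER_CLASS. The class and method names below (UnicodeWordDemo, matchesDefault, matchesUnicode) are hypothetical and serve only this illustration.

```java
import java.util.regex.Pattern;

public class UnicodeWordDemo {
    // Without the flag, \w is limited to [a-zA-Z_0-9].
    private static final Pattern DEFAULT_WORD = Pattern.compile("\\w+");

    // With the flag, \w follows the Unicode definition of a word character.
    private static final Pattern UNICODE_WORD =
            Pattern.compile("\\w+", Pattern.UNICODE_CHARACTER_CLASS);

    // Hypothetical helpers: true if the whole input is made of word characters.
    static boolean matchesDefault(String input) {
        return DEFAULT_WORD.matcher(input).matches();
    }

    static boolean matchesUnicode(String input) {
        return UNICODE_WORD.matcher(input).matches();
    }

    public static void main(String[] args) {
        String mixed = "\u73AF\u4FDD\u90E8_1"; // 环保部 followed by "_1"
        System.out.println(matchesDefault(mixed)); // Chinese letters fall outside ASCII \w
        System.out.println(matchesUnicode(mixed)); // accepted once the flag is set
    }
}
```

The same behaviour can also be switched on inside the pattern itself with the embedded flag (?U), which may be handy when the regex comes from configuration rather than code.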