How do I match Latin Unicode characters in ColdFusion or Java regex?
Question
I'm looking for a ColdFusion or Java regex (to use in a replace function) that matches only digits [0-9] and letters [a-z], but also accepts non-ASCII Portuguese letters (Unicode Latin characters such as ç and ã).
Something like this:
str = reReplaceNoCase(str, "match none number/letter but keep unicode latin chars", "", "ALL");
Input string: informação123?:#$%
Desired result: informação123
I know I can match letters and digits with [a-z0-9], but this doesn't match letters such as ç and ã.
Recommended answer
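Try the word character class \w, which matches letters, digits, and underscores. In .NET it is Unicode-aware by default. In Java it is ASCII-only by default and matches letters such as ç and ã only when the UNICODE_CHARACTER_CLASS flag is set.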
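To illustrate the \w behavior in Java specifically, here is a minimal sketch. The class and method names (WordClassDemo, stripAscii, stripUnicode) are made up for this example. It uses \W, the negation of \w, to remove everything that is not a word character, once with default settings and once with Pattern.UNICODE_CHARACTER_CLASS. Note that \w also keeps underscores, which may not be what you want.

```java
import java.util.regex.Pattern;

class WordClassDemo {
    // Default \W: anything outside ASCII letters, digits, and underscore.
    static final Pattern ASCII_NON_WORD = Pattern.compile("\\W");

    // With UNICODE_CHARACTER_CLASS, \w (and so \W) follows Unicode definitions.
    static final Pattern UNICODE_NON_WORD =
            Pattern.compile("\\W", Pattern.UNICODE_CHARACTER_CLASS);

    // Hypothetical helper: strips non-word characters using the ASCII-only \W.
    static String stripAscii(String s) {
        return ASCII_NON_WORD.matcher(s).replaceAll("");
    }

    // Hypothetical helper: strips non-word characters using the Unicode-aware \W.
    static String stripUnicode(String s) {
        return UNICODE_NON_WORD.matcher(s).replaceAll("");
    }

    public static void main(String[] args) {
        String input = "informação123?:#$%";
        System.out.println(stripAscii(input));   // accented letters are removed too
        System.out.println(stripUnicode(input)); // accented letters are kept
    }
}
```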
You can also use the Unicode letter category \p{L}, which Java's regex engine supports as well. In C#, the task can be done with the following code:
var input = "informação 123 ?:#$%";
var result = Regex.Replace(input, @"[^\p{L}\s0-9]", string.Empty);
The regex [^\p{L}\s0-9] means: any character not in this class (letters, whitespace, digits). It therefore matches ?:#$% in your example, and we can replace those characters with an empty string.
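Since the question asks about Java, here is a rough Java equivalent of the same idea, offered as a sketch rather than a definitive solution. The class and method names (LatinFilter, keepLettersAndDigits) are invented for illustration, and \s is dropped from the class because the input in the question has no spaces to preserve.

```java
import java.util.regex.Pattern;

class LatinFilter {
    // Matches any character that is neither a Unicode letter nor an ASCII digit.
    private static final Pattern NOT_LETTER_OR_DIGIT = Pattern.compile("[^\\p{L}0-9]");

    // Hypothetical helper: removes everything except letters and digits.
    static String keepLettersAndDigits(String s) {
        return NOT_LETTER_OR_DIGIT.matcher(s).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(keepLettersAndDigits("informação123?:#$%"));
    }
}
```

One caveat: \p{L} matches precomposed letters such as ç, but not combining marks, so text stored in decomposed form (c followed by a combining cedilla) would lose its accents unless \p{M} is added to the class or the string is normalized first. In ColdFusion, assuming the engine exposes strings as Java strings, the same pattern can usually be applied by calling the Java method directly, for example str.replaceAll("[^\p{L}0-9]", "").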