How do I match Latin Unicode characters in ColdFusion or Java regex?

Question

I'm looking for a ColdFusion or Java regex (to use in a replace function) that matches only digits [0-9] and letters [a-z], but that also includes non-ASCII Portuguese letters (Unicode Latin, such as ç and ã).

Something like this:

str = reReplaceNoCase(str, "match none number/letter but keep unicode latin chars", "", "ALL");

Input string: "informação 123 ?:#$%"
Expected result: "informação 123"

I know I can match letters and digits with [a-z] and [0-9], but these don't match letters such as ç and ã.

Recommended answer

Try the alphanumeric character class \w; it should match letters, digits, and underscores.
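One caveat for the Java side of the question: in java.util.regex, \w is ASCII-only by default and only follows the Unicode definition when the Pattern.UNICODE_CHARACTER_CLASS flag is set. The sketch below compares the two modes on the question's input. The class and method names are made up for illustration, and the input is written with Unicode escapes so the source file's encoding does not matter.

```java
import java.util.regex.Pattern;

// Hypothetical class name, used only for this illustration.
class WordClassSketch {
    // Default mode: \w is ASCII-only ([a-zA-Z_0-9]), so ç and ã count as "non-word".
    static final Pattern ASCII_WORD = Pattern.compile("[^\\w\\s]");

    // With UNICODE_CHARACTER_CLASS, \w and \s follow their Unicode definitions.
    static final Pattern UNICODE_WORD =
            Pattern.compile("[^\\w\\s]", Pattern.UNICODE_CHARACTER_CLASS);

    // Remove every character matched by the given pattern.
    static String strip(Pattern pattern, String text) {
        return pattern.matcher(text).replaceAll("");
    }

    public static void main(String[] args) {
        // "informação 123 ?:#$%" written with Unicode escapes.
        String input = "informa\u00e7\u00e3o 123 ?:#$%";
        System.out.println(strip(ASCII_WORD, input));   // accented letters are removed too
        System.out.println(strip(UNICODE_WORD, input)); // accented letters are kept
    }
}
```

Either way, \w also keeps underscores, which may or may not be wanted here.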

You can also use the Unicode category class \p{L}, which matches any letter (I don't know whether the Java regex parser supports it). In C#, your task can be done with the following code:

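using System.Text.RegularExpressions;
var input = "informação 123 ?:#$%";
// Remove every character that is not a letter (\p{L}), whitespace (\s), or a digit.
var result = Regex.Replace(input, @"[^\p{L}\s0-9]", string.Empty);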

The regex [^\p{L}\s0-9] means: any character that is not in this class (all letters, whitespace, digits). In your example it therefore matches ?:#$%, and we can replace these characters with an empty string.
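For the Java half of the question, java.util.regex does support \p{L}, so the same character class can be used with String.replaceAll. The following is a minimal sketch rather than a definitive implementation. The class and method names are hypothetical, and the input is written with Unicode escapes so the source file's encoding does not matter.

```java
// Hypothetical class name, used only for this illustration.
class LatinFilterSketch {
    // Same class as in the C# answer: anything that is not a letter (\p{L}),
    // whitespace (\s), or an ASCII digit. Backslashes are doubled because this
    // is a Java string literal.
    static final String NOT_LETTER_SPACE_DIGIT = "[^\\p{L}\\s0-9]";

    // Remove every character outside the allowed set.
    static String keepLettersDigitsSpaces(String text) {
        return text.replaceAll(NOT_LETTER_SPACE_DIGIT, "");
    }

    public static void main(String[] args) {
        // "informação 123 ?:#$%" written with Unicode escapes.
        String input = "informa\u00e7\u00e3o 123 ?:#$%";
        String cleaned = keepLettersDigitsSpaces(input);
        // The space before the removed symbols is kept; trim() drops it.
        System.out.println("[" + cleaned.trim() + "]");
    }
}
```

Note that \p{L} accepts letters from every script, not only Latin ones. If the input might contain decomposed characters (a base letter followed by a combining mark), adding \p{M} to the class would keep the marks as well. In ColdFusion, the same pattern could probably be applied by calling replaceAll on the string directly, assuming the engine exposes the underlying java.lang.String methods.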
