Remove all non-"word characters" from a String in Java, leaving accented characters?

Question

Java's regex flavor apparently counts umlauts and other accented characters as non-"word characters". For example,

        "TESTÜTEST".replaceAll( "\\W", "" )

returns "TESTTEST" for me. What I want is to remove only the characters that are truly non-"word characters". Is there a way to do this without resorting to something along the lines of

         "[^A-Za-z0-9äöüÄÖÜßéèáàúùóò]"

only to realize that I forgot ô?

Recommended answer

Use [^\p{L}\p{Nd}]+, which matches every (Unicode) character that is neither a letter nor a (decimal) digit.

In Java:

String resultString = subjectString.replaceAll("[^\\p{L}\\p{Nd}]+", "");
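
As a minimal sketch of how the line above might be wrapped for reuse, the following precompiles the pattern and compares it with the ASCII-only \W from the question. The class name WordCharFilter, the constant NON_WORD, and the method stripNonWord are hypothetical names chosen for illustration; non-ASCII characters are written as Unicode escapes so the example does not depend on the source file encoding.

```java
import java.util.regex.Pattern;

// Hypothetical helper class; the names are illustrative and not from the original answer.
class WordCharFilter {
    // Matches runs of characters that are neither Unicode letters nor decimal digits.
    private static final Pattern NON_WORD = Pattern.compile("[^\\p{L}\\p{Nd}]+");

    static String stripNonWord(String input) {
        return NON_WORD.matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        // By default \W is ASCII-only, so the U+00DC (Ü) is dropped: TESTTEST
        System.out.println("TEST\u00DCTEST".replaceAll("\\W", ""));
        // The Unicode-aware class keeps the Ü and drops '-', ' ' and '!': TESTÜTEST
        System.out.println(stripNonWord("TEST-\u00DC TEST!"));
        // Accented letters and digits survive, the underscore does not: côté42
        System.out.println(stripNonWord("c\u00F4t\u00E9_42"));
    }
}
```

Two behaviors are worth keeping in mind. Unlike \w, this character class does not include the underscore, so "_" is removed. Combining marks such as U+0308 belong to category M rather than L, so text stored in decomposed form would lose its accents; normalizing to NFC with java.text.Normalizer first, or adding \p{M} to the class, is one possible way to handle that.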

Edit:

I changed \p{N} to \p{Nd} because the former also matches number symbols such as ¼, which the latter does not. See it on regex101.com.
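
The difference between the two categories can be checked with a small sketch like the one below. The class and method names are hypothetical, and the sample string is an assumption chosen to contain ¼ (U+00BC, category No) and an Arabic-Indic digit three (U+0663, category Nd).

```java
// Hypothetical demo class; names and sample data are illustrative only.
class NumberCategoryDemo {
    // Keeps letters and anything in the general category N (Nd, Nl, No).
    static String keepLettersAndNumbers(String s) {
        return s.replaceAll("[^\\p{L}\\p{N}]+", "");
    }

    // Keeps letters and decimal digits (Nd) only.
    static String keepLettersAndDigits(String s) {
        return s.replaceAll("[^\\p{L}\\p{Nd}]+", "");
    }

    public static void main(String[] args) {
        String sample = "a\u00BCb\u0663"; // a, ¼, b, Arabic-Indic digit three
        // \p{N} keeps the fraction: a¼b٣
        System.out.println(keepLettersAndNumbers(sample));
        // \p{Nd} removes the fraction but keeps the non-ASCII digit: ab٣
        System.out.println(keepLettersAndDigits(sample));
    }
}
```

Note that \p{Nd} is not limited to 0-9: it covers decimal digits from every script, which may or may not be what a given application wants.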
