How to remove special characters from a string?

Question

I want to remove special characters like:

- + ^ . : ,

from a String using Java.

Recommended answer

That depends on what you define as special characters, but try replaceAll(...):

String result = yourString.replaceAll("[-+.^:,]","");
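A minimal, self-contained version of the call above. The class and method names (StripListed, strip) are made up for illustration:

```java
// Sketch: remove the characters - + ^ . : , from a string.
class StripListed {

    // '-' comes first (so it is a literal, not a range) and '^' is not first
    // (so it is a literal, not a negation).
    static String strip(String input) {
        return input.replaceAll("[-+.^:,]", "");
    }

    public static void main(String[] args) {
        System.out.println(strip("a-b+c^d.e:f,g")); // prints abcdefg
        System.out.println(strip("12:30, +5"));    // prints 1230 5
    }
}
```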

Note that the ^ character must not be the first one in the list, since you'd then either have to escape it or it would mean "any but these characters".

Another note: the - character needs to be the first or last one in the list, otherwise you'd have to escape it or it would define a range (e.g. :-, would mean "all characters in the range : to ,"; in Java this particular range is even rejected with a PatternSyntaxException, because : comes after , in code-point order).
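Both positional pitfalls can be checked directly. This sketch (hypothetical class name ClassPositionDemo) shows that a leading ^ negates the class, and that a - placed between : and , is parsed as a reversed range that java.util.regex refuses to compile:

```java
import java.util.regex.PatternSyntaxException;

// Sketch of the two positional pitfalls inside a character class.
class ClassPositionDemo {

    // '^' as the first character negates the class: everything EXCEPT
    // - + . : , is removed, the opposite of what was intended.
    static String caretFirst(String input) {
        return input.replaceAll("[^-+.:,]", "");
    }

    // ":-," is read as a range from ':' (U+003A) to ',' (U+002C). The range
    // is reversed, so compiling the pattern throws PatternSyntaxException.
    static boolean reversedRangeRejected() {
        try {
            "x".replaceAll("[+.^:-,]", "");
            return false;
        } catch (PatternSyntaxException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(caretFirst("a-b+c"));     // prints -+
        System.out.println(reversedRangeRejected()); // prints true
    }
}
```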

So, to stay consistent and not depend on character positions, you might want to escape all characters that have a special meaning in regular expressions (the following list is not complete, so be aware of other characters like (, {, $ etc.):

String result = yourString.replaceAll("[\\-\\+\\.\\^:,]","");
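If the set of characters comes from data instead of being typed by hand, one option is to escape every character while building the class. The java.util.regex.Pattern documentation allows a backslash before any non-alphabetic character, so this sketch escapes everything that is not a letter or digit. The helper name removeAll is hypothetical:

```java
// Hypothetical helper: remove every character listed in charsToRemove,
// regardless of where special characters appear in that list.
class RemoveListedChars {

    static String removeAll(String input, String charsToRemove) {
        if (charsToRemove.isEmpty()) {
            return input; // "[]" is not a valid class, and there is nothing to remove
        }
        StringBuilder cls = new StringBuilder("[");
        charsToRemove.codePoints().forEach(cp -> {
            // A backslash before a non-alphabetic character means
            // "this literal character", so escaping is always safe here.
            if (!Character.isLetterOrDigit(cp)) {
                cls.append('\\');
            }
            cls.appendCodePoint(cp);
        });
        cls.append(']');
        return input.replaceAll(cls.toString(), "");
    }

    public static void main(String[] args) {
        System.out.println(removeAll("a-b+c^d.e:f,g", "-+^.:,")); // prints abcdefg
        System.out.println(removeAll("x[y]z", "[]"));            // prints xyz
    }
}
```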



If you want to get rid of all punctuation and symbols, try this regex: [\p{P}\p{S}] (keep in mind that in Java strings you have to escape the backslashes: "[\\p{P}\\p{S}]"). The square brackets matter: without them, \p{P}\p{S} only matches a punctuation character immediately followed by a symbol.
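A quick sketch (hypothetical class StripPunctuationAndSymbols) comparing the bracketed class with the unbracketed sequence. The bracketed version removes each punctuation or symbol character on its own; here - , . : ! are punctuation (P) and + = ^ are symbols (S):

```java
// Sketch: remove all Unicode punctuation (P) and symbol (S) characters.
class StripPunctuationAndSymbols {

    // Character class: matches ONE character that is punctuation or a symbol.
    static String strip(String input) {
        return input.replaceAll("[\\p{P}\\p{S}]", "");
    }

    // Without brackets: matches a punctuation character immediately
    // followed by a symbol character, as a two-character sequence.
    static String stripUnbracketed(String input) {
        return input.replaceAll("\\p{P}\\p{S}", "");
    }

    public static void main(String[] args) {
        System.out.println(strip("Hello, World! 1+1=2"));            // prints Hello World 112
        System.out.println(stripUnbracketed("Hello, World! 1+1=2")); // prints the input unchanged
        System.out.println(stripUnbracketed("a.+b"));                // prints ab
    }
}
```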

A third way could be something like this, if you can define exactly what should be left in your string:

String result = yourString.replaceAll("[^\\w\\s]","");

This means: replace everything that is neither a word character (a-z in any case, 0-9 or _) nor whitespace. In Java, \w is limited to these ASCII characters by default, so accented letters such as é are removed too.
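Because \w is ASCII-only by default in Java, the result differs for non-English text. This sketch contrasts the default with the Pattern.UNICODE_CHARACTER_CLASS flag (available since Java 7), which makes \w and \s follow Unicode definitions; the class name is made up:

```java
import java.util.regex.Pattern;

// Sketch: keep only word characters and whitespace, ASCII vs Unicode rules.
class KeepWordAndSpace {

    // Default Java semantics: \w is [a-zA-Z_0-9], so accented letters go too.
    static String asciiWords(String input) {
        return input.replaceAll("[^\\w\\s]", "");
    }

    // UNICODE_CHARACTER_CLASS makes \w and \s follow Unicode definitions.
    static String unicodeWords(String input) {
        return Pattern.compile("[^\\w\\s]", Pattern.UNICODE_CHARACTER_CLASS)
                .matcher(input)
                .replaceAll("");
    }

    public static void main(String[] args) {
        String s = "Caf\u00e9: 2+2=4, ok_go!";
        System.out.println(asciiWords(s));   // prints Caf 224 ok_go
        System.out.println(unicodeWords(s)); // same, but the accented letter is kept
    }
}
```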

Edit: Please note that there are a couple of other patterns that might prove helpful. However, I can't explain them all, so have a look at the reference section of regular-expressions.info.

Here's a less restrictive alternative to the "define allowed characters" approach, as suggested by Ray:

String result = yourString.replaceAll("[^\\p{L}\\p{Z}]","");

The regex matches everything that is neither a letter in any language nor a separator. \p{Z} covers space separators (such as the regular and the no-break space) and the Unicode line and paragraph separators; note that \n, \r and \t are control characters rather than separators, so they are removed as well. Note that you can't use [\P{L}\P{Z}] (upper case P means not having that property), since that would mean "everything that is not a letter or not a separator", which matches every character, because no character is both a letter and a separator.
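A sketch of the letter/separator approach next to the broken [\P{L}\P{Z}] variant (hypothetical class name). The non-ASCII letters in the sample are written as Unicode escapes so the file compiles regardless of source encoding:

```java
// Sketch: keep letters (any script) and Unicode separators, drop the rest.
class KeepLettersAndSeparators {

    static String keep(String input) {
        return input.replaceAll("[^\\p{L}\\p{Z}]", "");
    }

    // The broken variant: every character is either not a letter or not a
    // separator, so this removes everything.
    static String brokenVariant(String input) {
        return input.replaceAll("[\\P{L}\\P{Z}]", "");
    }

    public static void main(String[] args) {
        // A German word and two CJK characters, written as escapes; digits,
        // punctuation and the newline (a control character) are removed.
        String s = "Gr\u00fc\u00dfe, \u4e16\u754c! 42\nok";
        System.out.println(keep(s));
        System.out.println(brokenVariant(s).isEmpty()); // prints true
    }
}
```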

Additional information about Unicode

Some Unicode characters can cause problems because they can be encoded in different ways: as a single code point or as a combination of code points (for example, é as U+00E9, or as e followed by the combining accent U+0301). Please refer to regular-expressions.info for more information.
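One common mitigation, not part of the answer itself, is to normalize the string to NFC with java.text.Normalizer before stripping, so that a base letter plus a combining accent becomes a single precomposed letter where Unicode defines one. A sketch, with a hypothetical class name:

```java
import java.text.Normalizer;

// Sketch: normalize to NFC first so that base letter + combining accent
// becomes one precomposed letter (where Unicode defines one).
class NormalizeBeforeStripping {

    static String strip(String input) {
        return input.replaceAll("[^\\p{L}\\p{Z}]", "");
    }

    static String normalizeThenStrip(String input) {
        return strip(Normalizer.normalize(input, Normalizer.Form.NFC));
    }

    public static void main(String[] args) {
        String precomposed = "Caf\u00e9"; // e-acute as one code point, U+00E9
        String decomposed = "Cafe\u0301"; // e followed by combining acute, U+0301
        // The combining accent is a mark (category M), not a letter, so it is removed:
        System.out.println(strip(decomposed).equals("Cafe"));                   // prints true
        System.out.println(normalizeThenStrip(decomposed).equals(precomposed)); // prints true
    }
}
```

NFC cannot help where no precomposed character exists; in that case adding \p{M} to the class ("[^\\p{L}\\p{M}\\p{Z}]") keeps combining marks as well.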
