How to remove special characters from a string?
Question
I want to remove special characters like:
- + ^ . : ,
from a String using Java.
Recommended answer
That depends on what you define as special characters, but try replaceAll(...):
String result = yourString.replaceAll("[-+.^:,]","");
Note that the ^ character must not be the first one in the list, since you'd then either have to escape it or it would mean "any but these characters".
Another note: the - character needs to be the first or last one in the list, otherwise you'd have to escape it or it would define a range (e.g. :-, would mean "all characters in the range : to ,"). Since : comes after , in Unicode order, that particular range is reversed, and Java rejects it with a PatternSyntaxException.
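The positional rules for ^ and - can be checked with a small sketch. The class and method names below are made up for illustration; only String.replaceAll and Pattern.compile come from the JDK.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical demo class; names are illustrative, not from the original answer.
class CharClassPositions {

    // '-' is first and '^' is not first, so both are literals:
    // removes the characters - + . ^ : ,
    static String removeListed(String s) {
        return s.replaceAll("[-+.^:,]", "");
    }

    // A leading '^' negates the class: removes everything EXCEPT - + . : ,
    static String keepOnlyListed(String s) {
        return s.replaceAll("[^-+.:,]", "");
    }

    // An unescaped '-' between two characters forms a range:
    // [+-:] covers '+' (U+002B) through ':' (U+003A), which includes the digits.
    static String removeRange(String s) {
        return s.replaceAll("[+-:]", "");
    }

    // Returns false when the regex is rejected, e.g. for a reversed range.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String input = "a-b+c^d.e:f,g 2024";
        System.out.println(removeListed(input));   // abcdefg 2024
        System.out.println(keepOnlyListed(input)); // -+.:,
        System.out.println(removeRange(input));    // digits are removed too
        System.out.println(compiles("[:-,]"));     // false: reversed range
    }
}
```

The third call shows why an accidental range is easy to miss: the pattern still compiles, it just removes more than intended.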
So, in order to keep consistency and not depend on character positioning, you might want to escape all those characters that have a special meaning in regular expressions (the following list is not complete, so be aware of other characters like (, {, $ etc.):
String result = yourString.replaceAll("[\\-\\+\\.\\^:,]","");
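If the list of characters to remove is not fixed, the same "escape everything" idea can be applied programmatically. The helper below is a hypothetical sketch, not part of the original answer. It assumes the characters are in the Basic Multilingual Plane and relies on the documented java.util.regex rule that a backslash may precede any non-alphabetic character; letters and digits are left unescaped because a backslash in front of them would change their meaning.

```java
// Hypothetical helper class; names are illustrative only.
class EscapedCharClass {

    // Builds a character class such as [\-\+\.] from the given characters,
    // escaping every character that is not a letter or digit.
    static String toCharClass(String chars) {
        StringBuilder sb = new StringBuilder("[");
        for (char c : chars.toCharArray()) {
            if (!Character.isLetterOrDigit(c)) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.append(']').toString();
    }

    // Removes every occurrence of the given characters from the input.
    static String removeChars(String input, String chars) {
        if (chars.isEmpty()) {
            return input; // "[]" would not be a valid pattern
        }
        return input.replaceAll(toCharClass(chars), "");
    }

    public static void main(String[] args) {
        System.out.println(toCharClass("-+.^:,"));               // [\-\+\.\^\:\,]
        System.out.println(removeChars("a^b-c]d[e", "^-]["));    // abcde
    }
}
```

With this approach the order of the characters no longer matters, so ^ and - can appear anywhere in the list.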
If you want to get rid of all punctuation and symbols, try this regex: [\p{P}\p{S}] (keep in mind that in Java strings you'd have to escape backslashes: "[\\p{P}\\p{S}]"). The square brackets matter: without them, \p{P}\p{S} would only match a punctuation character immediately followed by a symbol.
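As a quick illustration of the Unicode category approach, here is a minimal sketch. The class name and sample text are made up; \p{P} and \p{S} are the general categories for punctuation and symbols supported by java.util.regex.

```java
// Hypothetical demo class; names are illustrative only.
class PunctuationAndSymbols {

    // Removes every character in the Unicode categories P (punctuation)
    // and S (symbol); letters, digits and whitespace are kept.
    static String strip(String s) {
        return s.replaceAll("[\\p{P}\\p{S}]", "");
    }

    public static void main(String[] args) {
        System.out.println(strip("Hello, World! 1+1=2 (approx.) $5"));
        // Hello World 112 approx 5
    }
}
```

Note that the underscore counts as punctuation (connector punctuation) under this definition, so it is removed as well.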
A third way could be something like this, if you can exactly define what should be left in your string:
String result = yourString.replaceAll("[^\\w\\s]","");
This means: replace everything that is not a word character (a-z in any case, 0-9 or _) or whitespace.
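A small sketch of this "keep only what is allowed" approach follows; the class name and sample string are invented for illustration. It also shows a consequence worth knowing: by default \w in Java covers only ASCII letters, digits and the underscore, so accented letters are removed too.

```java
// Hypothetical demo class; names are illustrative only.
class KeepWordChars {

    // Removes everything that is neither a word character
    // (a-z, A-Z, 0-9, _) nor whitespace.
    static String strip(String s) {
        return s.replaceAll("[^\\w\\s]", "");
    }

    public static void main(String[] args) {
        // \u00E9 is the precomposed letter e with acute,
        // \u00EF is i with diaeresis
        System.out.println(strip("Caf\u00E9, na\u00EFve_123!"));
        // Caf nave_123
    }
}
```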
Edit: Please note that there are a couple of other patterns that might prove helpful. However, I can't explain them all, so have a look at the reference section of regular-expressions.info.
Here's a less restrictive alternative to the "define allowed characters" approach, as suggested by Ray:
String result = yourString.replaceAll("[^\\p{L}\\p{Z}]","");
The regex matches everything that is not a letter in any language and not a separator (whitespace, linebreak etc.). Note that you can't use [\P{L}\P{Z}] (upper case P means not having that property), since that would mean "everything that is not a letter or not whitespace", which matches almost everything, since letters are not whitespace and vice versa.
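The difference between the negated class and the upper-case \P form can be seen in a short sketch. The class and method names are made up for illustration. One detail beyond the explanation above: tab and line feed belong to the control category (Cc) rather than \p{Z}, so the pattern removes them as well.

```java
// Hypothetical demo class; names are illustrative only.
class KeepLettersAndSeparators {

    // Removes everything that is neither a letter (any script)
    // nor a Unicode separator such as the space character.
    static String strip(String s) {
        return s.replaceAll("[^\\p{L}\\p{Z}]", "");
    }

    // The tempting but wrong variant: a character matches if it is
    // "not a letter" OR "not a separator", which is true for every character.
    static String stripWrong(String s) {
        return s.replaceAll("[\\P{L}\\P{Z}]", "");
    }

    public static void main(String[] args) {
        String input = "Caf\u00E9, na\u00EFve_123!";
        System.out.println(strip(input));                 // accented letters and the space survive
        System.out.println(stripWrong(input).isEmpty());  // true
        System.out.println(strip("a\tb\nc"));             // abc (tab and line feed are removed)
    }
}
```

If tabs and line breaks should survive, adding \s to the negated class, as in [^\p{L}\p{Z}\s], is one option.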
Additional information on Unicode
Some Unicode characters seem to cause problems due to the different possible ways to encode them (as a single code point or as a combination of code points). Please refer to regular-expressions.info for more information.
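One way this problem shows up, and one possible mitigation, is sketched below. Normalizing with java.text.Normalizer is not part of the original answer; it is offered here as an assumption about what may help. The class and method names are invented for illustration.

```java
import java.text.Normalizer;

// Hypothetical demo class; names are illustrative only.
class UnicodeNormalization {

    // Same pattern as above: keep only letters and separators.
    static String lettersOnly(String s) {
        return s.replaceAll("[^\\p{L}\\p{Z}]", "");
    }

    // Composes base letter + combining mark into a single code point (NFC)
    // before filtering, so the accent is not stripped on its own.
    static String normalizedLettersOnly(String s) {
        return lettersOnly(Normalizer.normalize(s, Normalizer.Form.NFC));
    }

    public static void main(String[] args) {
        // "e" followed by U+0301 COMBINING ACUTE ACCENT (category Mn, not a letter)
        String decomposed = "Cafe\u0301";
        System.out.println(lettersOnly(decomposed));           // Cafe (accent lost)
        System.out.println(normalizedLettersOnly(decomposed)); // accent kept as U+00E9
    }
}
```

NFC only helps where a precomposed form exists; for other combinations, allowing \p{M} (marks) in the kept set may be a better fit.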