How to remove high-ASCII characters like ®, ©, ™ from a string in Java

Question

I want to detect and remove high-ASCII characters like ®, ©, ™ from a String in Java. Is there any open-source library that can do this?

Recommended Answer

If you need to remove all non-US-ASCII (i.e. outside 0x0-0x7F) characters, you can do something like this:

s = s.replaceAll("[^\\x00-\\x7f]", "");
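
Since the question also asks how to *detect* such characters, here is a minimal, self-contained sketch that checks for and removes anything outside 0x00-0x7F, using the same regex for removal. The class and method names (`NonAsciiDemo`, `containsNonAscii`, `stripNonAscii`) are made up for illustration, and the test characters are written as Unicode escapes so the source file itself stays ASCII-only.

```java
// Illustrative sketch: detect and strip characters outside US-ASCII (0x00-0x7F).
// NonAsciiDemo and its methods are hypothetical names, not part of any library.
final class NonAsciiDemo {

    // True if any UTF-16 char in s is above 0x7F (e.g. the registered, copyright, or trademark sign).
    static boolean containsNonAscii(String s) {
        return s.chars().anyMatch(c -> c > 0x7F);
    }

    // Same regex as in the answer: delete every char outside 0x00-0x7F.
    static String stripNonAscii(String s) {
        return s.replaceAll("[^\\x00-\\x7F]", "");
    }

    public static void main(String[] args) {
        // "Acme(R) Widget(TM) (C) 2024" written with Unicode escapes.
        String s = "Acme\u00AE Widget\u2122 \u00A9 2024";
        System.out.println(containsNonAscii(s)); // prints: true
        System.out.println(stripNonAscii(s));    // prints: Acme Widget  2024 (note the double space)
    }
}
```

Removing a symbol that sat between two spaces leaves a doubled space, as in the output above; collapse whitespace separately if that matters for your data.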

If you need to filter many strings, it is better to use a precompiled pattern, because String.replaceAll compiles the regex again on every call:

import java.util.regex.Pattern;

// Compiled once and reused for every string.
private static final Pattern NON_ASCII = Pattern.compile("[^\\x00-\\x7f]");
...
s = NON_ASCII.matcher(s).replaceAll("");
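
`String.replaceAll(regex, repl)` is documented as equivalent to `Pattern.compile(regex).matcher(str).replaceAll(repl)`, so the one-liner pays the compilation cost each time. A rough, runnable sketch of the precompiled variant applied to a batch of strings follows, assuming Java 9+ for `List.of`; `AsciiFilter`, `strip`, and `stripAll` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Sketch: compile the non-ASCII pattern once and reuse it for many strings.
// AsciiFilter, strip and stripAll are illustrative names only.
final class AsciiFilter {

    // Pattern instances are immutable and safe to share between threads.
    private static final Pattern NON_ASCII = Pattern.compile("[^\\x00-\\x7F]");

    // A new Matcher is created per call, since Matcher is not thread-safe.
    static String strip(String s) {
        return NON_ASCII.matcher(s).replaceAll("");
    }

    // Applies strip to every input, preserving order.
    static List<String> stripAll(List<String> inputs) {
        List<String> out = new ArrayList<>(inputs.size());
        for (String s : inputs) {
            out.add(strip(s));
        }
        return out;
    }

    public static void main(String[] args) {
        // Inputs written with Unicode escapes: e-acute + registered sign, trademark sign.
        List<String> cleaned = stripAll(List.of("Caf\u00E9\u00AE", "Brand\u2122", "plain"));
        System.out.println(cleaned); // prints: [Caf, Brand, plain]
    }
}
```

Note that accented letters such as é are removed too, not converted to their ASCII base letter. Java's regex syntax also accepts `\\P{ASCII}` as a shorter equivalent of `[^\\x00-\\x7F]`.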

And if it's really performance-critical, perhaps Alex Nikolaenkov's suggestion would be better.
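
Alex Nikolaenkov's answer is not reproduced on this page, so the following is not necessarily his suggestion. It is one common regex-free approach, sketched under the assumption that a single pass over the `char`s with a `StringBuilder` is cheaper than regex matching for your workload (measure before relying on that). `AsciiLoopFilter` is a hypothetical name.

```java
// Sketch: strip chars above 0x7F without using regex, in a single pass.
// Illustrative only; not necessarily the approach from the answer mentioned above.
final class AsciiLoopFilter {

    static String stripNonAscii(String s) {
        // Scan the leading run of ASCII chars.
        int i = 0;
        while (i < s.length() && s.charAt(i) <= 0x7F) {
            i++;
        }
        // Fast path: nothing to remove, so return the original instance.
        if (i == s.length()) {
            return s;
        }
        StringBuilder sb = new StringBuilder(s.length());
        sb.append(s, 0, i); // copy the ASCII prefix already scanned
        for (; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c <= 0x7F) {
                sb.append(c); // keep only US-ASCII chars
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // "x(R)y(TM)z" written with Unicode escapes.
        System.out.println(stripNonAscii("x\u00AEy\u2122z")); // prints: xyz
    }
}
```

Both halves of a surrogate pair are above 0x7F, so characters such as emoji are dropped entirely, which matches what the regex version does.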
