How do I detect unicode characters in a Java string?

Question

Suppose I have a string that contains Ü. How would I find all those unicode characters? Should I test for their code? How would I do that?

For example, given the string "AÜXÜ", I'd like to transform it to "AYXY". I'd like to do the same for other unicode characters, and I would hate to have to store them in a translation map of some sort.

Recommended answer

The definition of "unicode characters" is vague, but here it will be taken to mean characters not covered by the standard ISO 8859 charset. If that is true in your case, then loop through all characters in the String and test each code point to determine whether it is within the given character set.
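
The loop described above could look roughly like the following. This is a minimal sketch, not a definitive implementation: the class name CharsetScan and the method findOutside are made up for illustration, and the choice of charset is an assumption you would adapt. Note that Ü (U+00DC) is itself part of ISO 8859-1, so it is only reported when a narrower charset such as US-ASCII is used.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, named for illustration only.
final class CharsetScan {

    /** Returns every code point of text that the given charset cannot encode. */
    static List<String> findOutside(String text, Charset charset) {
        CharsetEncoder encoder = charset.newEncoder();
        List<String> outside = new ArrayList<>();
        // Walk the string by code point, not by char, so surrogate pairs stay together.
        for (int i = 0; i < text.length(); ) {
            int codePoint = text.codePointAt(i);
            String symbol = new String(Character.toChars(codePoint));
            if (!encoder.canEncode(symbol)) {
                outside.add(symbol);
            }
            i += Character.charCount(codePoint);
        }
        return outside;
    }

    public static void main(String[] args) {
        String sample = "A\u00DCX\u00DC"; // U+00DC is U with diaeresis
        System.out.println(findOutside(sample, StandardCharsets.US_ASCII));
        System.out.println(findOutside(sample, StandardCharsets.ISO_8859_1));
    }
}
```

With US-ASCII as the reference charset, both occurrences of Ü in the sample are reported. With ISO 8859-1 the result is empty, which is worth keeping in mind when deciding which charset actually matches what you mean by "unicode characters".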

Alternatively, use a Map<Character, Character> and replace each character in the string that matches a key in the map. For example:

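import java.util.HashMap;
import java.util.Map;

// A plain HashMap avoids the anonymous subclass created by double-brace initialization.
Map<Character, Character> charReplacementMap = new HashMap<>();
charReplacementMap.put('Ü', 'Y');
// Put more here.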

String originalString = "AÜAÜ";
StringBuilder builder = new StringBuilder();

for (char currentChar : originalString.toCharArray()) {
    Character replacementChar = charReplacementMap.get(currentChar);
    builder.append(replacementChar != null ? replacementChar : currentChar);
}

String newString = builder.toString();

Or, do you mean "all characters with diacritics"? If so, then use java.text.Normalizer to remove diacritical marks:

import java.text.Normalizer;
import java.text.Normalizer.Form;

/**
 * Remove any diacritical marks (accents like ç, ñ, é, etc) from
 * the given string (so that it returns plain c, n, e, etc).
 * @param string The string to remove diacritical marks from.
 * @return The string with removed diacritical marks, if any.
 */
public static String removeDiacriticalMarks(String string) {
    return Normalizer.normalize(string, Form.NFD)
        .replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
}

One pitfall: Ü would become U, not Y. Not sure if that's what you're after. If you want to replace each character by the way it is pronounced, you'll really need to create a mapping. Sure, it's tedious work, but it's done in less time than you needed to follow this topic.
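
If only a handful of characters need a pronunciation-based replacement, the two approaches can be combined so the map stays small. The following is a rough sketch under that assumption: the class name Transliterator, the method transliterate, and the single Ü to Y override are illustrative and not part of any library. Characters with an entry in the map are replaced first, and everything else falls back to diacritic removal through java.text.Normalizer.

```java
import java.text.Normalizer;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper class, named for illustration only.
final class Transliterator {

    // Only characters whose replacement differs from the plain base letter need an entry.
    private static final Map<Character, Character> OVERRIDES = new HashMap<>();
    static {
        OVERRIDES.put('\u00DC', 'Y'); // U+00DC (U with diaeresis) becomes Y, as in the question
    }

    static String transliterate(String input) {
        // Compose first so that "U" followed by a combining diaeresis also hits the override.
        String composed = Normalizer.normalize(input, Normalizer.Form.NFC);
        StringBuilder out = new StringBuilder(composed.length());
        for (char c : composed.toCharArray()) {
            Character override = OVERRIDES.get(c);
            out.append(override != null ? override : c);
        }
        // Fallback: decompose what is left and drop the combining marks.
        return Normalizer.normalize(out.toString(), Normalizer.Form.NFD)
                .replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
    }

    public static void main(String[] args) {
        System.out.println(transliterate("A\u00DCX\u00DC")); // prints AYXY
    }
}
```

The design choice here is that the map only holds the exceptions, while accented characters without an entry (for example é) are reduced to their base letter automatically.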
