Converting Symbols and Accented Letters to the English Alphabet
Problem description
The problem is that, as you know, there are thousands of characters in the Unicode charts, and I want to convert all the look-alike characters to letters of the English alphabet.
For instance, here are a few conversions:
ҥ->H
Ѷ->V
Ȳ->Y
Ǭ->O
Ƈ->C
tђє Ŧค๓เℓy --> the Family
...
I also noticed that there are more than 20 versions of the letter A/a, and I don't know how to classify them. They look like needles in a haystack.
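A minimal sketch of how such variants could be found mechanically rather than by scrolling, assuming that a "version of A" means a character whose canonical (NFD) decomposition starts with the plain letter A. The class `FindVariants`, the method `variantsOf`, and the scanned range U+00C0..U+024F (Latin-1 Supplement through Latin Extended-B) are illustrative choices, not anything from the question. Characters that have no canonical decomposition, such as Æ or Ⱥ, are not found this way.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.List;

public class FindVariants {
    // Assumption: a "variant" of a letter is a code point whose canonical (NFD)
    // decomposition starts with that letter followed by combining marks.
    static List<String> variantsOf(char base, int start, int end) {
        List<String> found = new ArrayList<>();
        for (int cp = start; cp <= end; cp++) {
            if (!Character.isDefined(cp)) {
                continue; // skip unassigned code points
            }
            String s = new String(Character.toChars(cp));
            String nfd = Normalizer.normalize(s, Normalizer.Form.NFD);
            // A longer NFD form means it split into a base letter plus marks; keep it if the base matches
            if (nfd.length() > s.length() && nfd.charAt(0) == base) {
                found.add(s);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        // U+00C0..U+024F spans Latin-1 Supplement, Latin Extended-A and Latin Extended-B
        List<String> upper = variantsOf('A', 0x00C0, 0x024F);
        List<String> lower = variantsOf('a', 0x00C0, 0x024F);
        System.out.println("A: " + String.join(" ", upper));
        System.out.println("a: " + String.join(" ", lower));
    }
}
```

Widening the range (for example to include Latin Extended Additional at U+1E00..U+1EFF) picks up more variants with the same logic.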
The complete list of Unicode characters is at http://www.ssec.wisc.edu/~tomw/java/unicode.html or http://unicode.org/charts/charindex.html. Just scroll down to see the variations of the letters.
How can I convert all of these with Java? Please help me :(
Recommended answer
This method works fine in Java (purely for removing diacritics, also known as accent marks).
It basically converts every accented character that has a canonical decomposition into its unaccented base letter followed by its combining diacritics. You can then use a regex to strip off the diacritics. Characters without such a decomposition, such as Ƈ or most of the look-alikes in tђє Ŧค๓เℓy, pass through unchanged.
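To make the decomposition step visible on its own, here is a small illustration that prints the code points of é before and after NFD normalization. The class `NfdDemo` and the helper `codePoints` are hypothetical names used only for this example.

```java
import java.text.Normalizer;

public class NfdDemo {
    // Formats each code point of a string as U+XXXX, separated by spaces
    static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> sb.append(String.format("U+%04X ", cp)));
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        String composed = "\u00E9"; // precomposed e with acute accent, a single code point
        String decomposed = Normalizer.normalize(composed, Normalizer.Form.NFD);
        System.out.println(codePoints(composed));   // U+00E9
        System.out.println(codePoints(decomposed)); // U+0065 U+0301 (e + combining acute accent)
    }
}
```

U+0301 lies in the Combining Diacritical Marks block (U+0300..U+036F), which is exactly what `\p{InCombiningDiacriticalMarks}` matches, so removing it leaves the plain `e`.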
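import java.text.Normalizer;
import java.util.regex.Pattern;
public class DeAccent {
    // Matches runs of characters from the Combining Diacritical Marks block (U+0300..U+036F)
    private static final Pattern DIACRITICS = Pattern.compile("\\p{InCombiningDiacriticalMarks}+");
    public static String deAccent(String str) {
        // NFD splits each precomposed character into its base letter plus combining marks
        String nfdNormalizedString = Normalizer.normalize(str, Normalizer.Form.NFD);
        // Remove the combining marks, leaving only the base letters
        return DIACRITICS.matcher(nfdNormalizedString).replaceAll("");
    }
}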
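The regex approach leaves characters such as Ƈ, ђ or ค untouched because Unicode gives them no decomposition to a Latin letter. The sketch below is one assumed way to extend it, not part of the original answer: it swaps NFD for NFKD, which additionally folds compatibility characters such as ℓ (U+2113) to a plain l, and then consults a lookup table. The class `ToAsciiLetters` and the `LOOK_ALIKES` table are hypothetical; the table holds only the question's own examples, and real coverage would require building such a table by hand, since normalization does not map these look-alikes to Latin letters.

```java
import java.text.Normalizer;
import java.util.Map;
import java.util.regex.Pattern;

public class ToAsciiLetters {
    // Combining marks left over after decomposition (block U+0300..U+036F)
    private static final Pattern DIACRITICS = Pattern.compile("\\p{InCombiningDiacriticalMarks}+");

    // Hypothetical, deliberately tiny table for look-alikes with no decomposition.
    // Entries mirror the question's examples only; this is not a standard mapping.
    private static final Map<Character, String> LOOK_ALIKES = Map.of(
            '\u04A5', "H", // Cyrillic small ligature en ghe
            '\u0474', "V", // Cyrillic capital izhitsa (what remains of U+0476 after stripping)
            '\u0187', "C", // Latin capital C with hook
            '\u0452', "h", // Cyrillic small dje
            '\u0454', "e", // Cyrillic small Ukrainian ie
            '\u0166', "F", // Latin capital T with stroke, treated as an F look-alike
            '\u0E04', "a", // Thai character kho khwai
            '\u0E53', "m", // Thai digit three
            '\u0E40', "i"  // Thai character sara e
    );

    static String toAsciiLetters(String str) {
        // NFKD = canonical decomposition plus compatibility folding (e.g. U+2113 becomes l)
        String decomposed = Normalizer.normalize(str, Normalizer.Form.NFKD);
        String stripped = DIACRITICS.matcher(decomposed).replaceAll("");
        StringBuilder out = new StringBuilder(stripped.length());
        // Iterating over chars is enough here because every table entry is in the BMP
        for (char c : stripped.toCharArray()) {
            // Unknown characters pass through unchanged
            out.append(LOOK_ALIKES.getOrDefault(c, String.valueOf(c)));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(toAsciiLetters("\u0232 \u01EC \u0187 \u0476 \u04A5")); // Y O C V H
        System.out.println(toAsciiLetters("t\u0452\u0454 \u0166\u0E04\u0E53\u0E40\u2113y")); // the Family
    }
}
```

Ȳ and Ǭ are handled by decomposition plus stripping alone; only the remaining characters depend on the hand-made table, so the table stays as small as possible.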