Converting Symbols, Accent Letters to English Alphabet

Problem description

The problem is that, as you know, there are thousands of characters in the Unicode charts, and I want to convert all the look-alike characters to the corresponding letters of the English alphabet.

For instance, here are a few conversions:

ҥ->H
Ѷ->V
Ȳ->Y
Ǭ->O
Ƈ->C
tђє Ŧค๓เℓy --> the Family
...

I also saw that there are more than 20 versions of the letter A/a, and I don't know how to classify them. They look like needles in a haystack.

The complete list of Unicode characters is at http://www.ssec.wisc.edu/~tomw/java/unicode.html or http://unicode.org/charts/charindex.html . Just try scrolling down to see the variations of each letter.

How can I convert all of these with Java? Please help me :(

Recommended answer

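How do I remove diacritics (accents) from a string in .NET?

This method works fine in Java (purely for the purpose of removing diacritical marks, also known as accents).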

It basically converts all accented characters into their unaccented counterparts followed by their combining diacritics (Unicode NFD normalization). A regex can then strip off the diacritics.

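import java.text.Normalizer;
import java.util.regex.Pattern;

public class DeAccent {
    // Matches runs of characters from the Combining Diacritical Marks block (U+0300..U+036F).
    private static final Pattern DIACRITICS =
            Pattern.compile("\\p{InCombiningDiacriticalMarks}+");

    public static String deAccent(String str) {
        // NFD splits each accented character into a base letter plus combining marks.
        String nfdNormalizedString = Normalizer.normalize(str, Normalizer.Form.NFD);
        // Drop the combining marks so that only the base letters remain.
        return DIACRITICS.matcher(nfdNormalizedString).replaceAll("");
    }
}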
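
As a rough illustration of how this approach behaves on some of the examples from the question, the sketch below repeats the same normalize-and-strip step and adds a small lookup table for characters that have no canonical decomposition (NFD leaves those unchanged). The class name DeAccentDemo, the LOOKALIKES table, and the toAscii helper are hypothetical names introduced here, not part of the original answer. The table covers only three characters and would need to be extended by hand. The sketch assumes Java 9 or later for Map.of and handles only characters in the Basic Multilingual Plane.

```java
import java.text.Normalizer;
import java.util.Map;
import java.util.regex.Pattern;

class DeAccentDemo {
    // Matches runs of combining diacritical marks (U+0300..U+036F).
    private static final Pattern DIACRITICS =
            Pattern.compile("\\p{InCombiningDiacriticalMarks}+");

    // Hypothetical, hand-written table for look-alikes that NFD cannot reduce to ASCII.
    private static final Map<Character, String> LOOKALIKES = Map.of(
            '\u0187', "C",  // LATIN CAPITAL LETTER C WITH HOOK (no decomposition)
            '\u04A5', "H",  // CYRILLIC SMALL LIGATURE EN GHE (no decomposition)
            '\u0474', "V"   // CYRILLIC CAPITAL LETTER IZHITSA (what U+0476 becomes after stripping)
    );

    // Same technique as the answer: decompose with NFD, then strip combining marks.
    static String deAccent(String str) {
        String nfd = Normalizer.normalize(str, Normalizer.Form.NFD);
        return DIACRITICS.matcher(nfd).replaceAll("");
    }

    // Strip accents first, then substitute any remaining character found in the table.
    static String toAscii(String str) {
        String stripped = deAccent(str);
        StringBuilder out = new StringBuilder(stripped.length());
        for (int i = 0; i < stripped.length(); i++) {
            char c = stripped.charAt(i);
            String replacement = LOOKALIKES.get(c);
            if (replacement != null) {
                out.append(replacement);
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // U+0232 (Y with macron) and U+01EC (O with ogonek and macron) decompose fully.
        System.out.println(deAccent("\u0232"));  // prints "Y"
        System.out.println(deAccent("\u01EC"));  // prints "O"
        // U+0187 has no decomposition, so only the table lookup maps it.
        System.out.println(toAscii("\u0187"));   // prints "C"
        // U+0476 decomposes to U+0474 plus a combining mark; the table then maps U+0474.
        System.out.println(toAscii("\u0476"));   // prints "V"
    }
}
```

Characters such as Ȳ and Ǭ are handled by normalization alone, while Ƈ, ҥ and Ѷ need the extra lookup, because Unicode defines no decomposition that maps them to Latin letters.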
