icu4j cyrillic to latin


Problem description

I'm trying to convert Cyrillic words to Latin so I can use them in URLs. I use the icu4j transliterator, but it still produces odd characters like this: Vilʹândimaa. It should be more like viljandimaa. When I copy that URL, these letters turn into %.. escapes, which are useless.

Does anybody know how to get Cyrillic to a-z with icu4j?

UPDATE

I can't answer my own question yet, but I found this question very helpful: Converting Symbols, Accent Letters to English Alphabet

Solution

Modify your identifier to do what you want. You can strip unwanted characters using a regular expression with the Remove transform.

For example, consider the string "'Eé математика":

"'E\u00E9 \u043c\u0430\u0442\u0435\u043c\u0430\u0442\u0438\u043a\u0430"

The identifier "Any-Latin; NFD; [^\\p{Alnum}] Remove" applies three steps in order: it transliterates to Latin (the result may still include accents), decomposes accented characters into a base letter plus diacritics, and removes anything that isn't alphanumeric. The resulting string is "Eematematika".

You can read more on the identifiers under General Transforms on the ICU website.


Example:

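import com.ibm.icu.text.Transliterator;

public class CyrillicToLatin {
    public static void main(String[] args) {
        // "'Eé математика": punctuation, an accented Latin letter, and Cyrillic text
        String cyrillic
               = "'E\u00E9 \u043c\u0430\u0442\u0435\u043c\u0430\u0442\u0438\u043a\u0430";
        // Transliterate to Latin, decompose accents (NFD), then drop non-alphanumerics
        String id = "Any-Latin; NFD; [^\\p{Alnum}] Remove";
        String latin = Transliterator.getInstance(id)
                                     .transform(cyrillic);
        System.out.println(latin); // Eematematika
    }
}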

Tested against ICU4J 49.1.
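
To see what the "NFD" and "Remove" stages contribute on their own, here is a minimal sketch that uses only the JDK (java.text.Normalizer plus a regular expression) instead of ICU4J. The class name AsciiFold and the helper stripToAlnum are hypothetical names made up for this illustration. Note the assumptions: this sketch does not transliterate at all, so Cyrillic letters would simply be deleted, and unlike ICU's \p{Alnum} it keeps only ASCII letters and digits. It is meant for text that is already Latin, such as the Vilʹândimaa output from the question.

```java
import java.text.Normalizer;

public class AsciiFold {
    // Hypothetical helper: mimics only the "NFD; Remove" part of the identifier,
    // restricted to ASCII. It performs no transliteration.
    static String stripToAlnum(String input) {
        // NFD splits e.g. U+00E2 (a with circumflex) into "a" + U+0302 (combining circumflex).
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        // Delete everything that is not an ASCII letter or digit:
        // combining marks, apostrophes, spaces, U+02B9 (modifier prime), and so on.
        return decomposed.replaceAll("[^A-Za-z0-9]", "");
    }

    public static void main(String[] args) {
        // "Vilʹândimaa", the transliterator output quoted in the question
        System.out.println(stripToAlnum("Vil\u02B9\u00E2ndimaa")); // Vilandimaa
        // "'Eé", the Latin part of the example string
        System.out.println(stripToAlnum("'E\u00E9"));               // Ee
    }
}
```

Running the transliteration inside ICU4J, as in the example above, remains the way to handle the Cyrillic part; this sketch only shows why the accents and the prime character disappear after decomposition.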
