Convert Chinese characters to Hanyu Pinyin


Question

How can I convert Chinese characters to Hanyu Pinyin?

For example:

你->Nǐ

马->Mǎ

More information:

Either the accented (tone-mark) form or the numerical (tone-digit) form of Hanyu Pinyin is acceptable; the numerical form is my preference.

A Java library is preferred, but a library in another language that can be put in a wrapper is also OK.

I would like anyone who has personally used such a library to recommend it or comment on its quality and reliability.

Recommended answer

The problem of converting hanzi to pinyin is a fairly difficult one. Many hanzi have multiple pinyin readings, and the correct one depends on context. Compare 长大 (pinyin: zhǎng dà) with 长城 (pinyin: cháng chéng): the same character 长 is read differently in each word. For this reason, single-character conversion is often useless in practice, unless your system outputs multiple possibilities. There is also the issue of word segmentation, which can affect the pinyin representation as well. Perhaps you already knew this, but I thought it was important to say.
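To make the 长大 / 长城 point concrete, here is a minimal sketch of why word-level lookup matters. It is not taken from any of the libraries discussed here: the class name, the two tiny dictionaries, and the greedy longest-match segmentation are all assumptions chosen for illustration, and a real system would need a full lexicon and a proper segmenter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class PolyphoneSketch {
    // Hypothetical toy data: all readings listed for a single character.
    static final Map<Character, List<String>> CHAR_READINGS = Map.of(
        '长', List.of("chang2", "zhang3"),
        '大', List.of("da4", "dai4"),
        '城', List.of("cheng2"));

    // Hypothetical toy data: readings that are fixed once the whole word is known.
    static final Map<String, String> WORD_READINGS = Map.of(
        "长大", "zhang3 da4",
        "长城", "chang2 cheng2");

    // Longest word in the toy dictionary above.
    static final int MAX_WORD_LEN = 2;

    /** Greedy longest match: use a known word if one starts here, else list every reading of the character. */
    static List<String> annotate(String text) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            boolean matched = false;
            for (int len = Math.min(MAX_WORD_LEN, text.length() - i); len >= 2; len--) {
                String reading = WORD_READINGS.get(text.substring(i, i + len));
                if (reading != null) {
                    out.add(reading);
                    i += len;
                    matched = true;
                    break;
                }
            }
            if (!matched) {
                char c = text.charAt(i);
                List<String> readings = CHAR_READINGS.get(c);
                // Unknown characters pass through; ambiguous ones keep all candidates.
                out.add(readings == null ? String.valueOf(c) : String.join("|", readings));
                i++;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(annotate("长大")); // word context resolves 长
        System.out.println(annotate("长城")); // a different reading of 长
        System.out.println(annotate("长"));   // no context: both candidates are returned
    }
}
```

With a lone 长 the sketch can only return both candidates, which is exactly the situation where a single-character converter has to guess.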

That said, the Adso Package contains both a segmenter and a probabilistic pinyin annotator, based on the excellent Adso library. It takes a while to get used to, though, and may be much larger than what you are looking for (I found in the past that it was a bit too bulky for my needs). Additionally, there doesn't appear to be a public API anywhere, and it's C++ ...

For a recent project, because I was working with place names, I simply used the Google Translate API (specifically, the unofficial Java port), which, for common nouns at least, usually does a good job of translating to pinyin. The problem is commonly used alternative transliteration systems, such as "HongKong" for what should be "XiangGang". Given all of this, Google Translate is pretty limited, but it offers a start.

I hadn't heard of pinyin4j before, but after playing with it just now, I have found that it is less than optimal: while it outputs a list of candidate pinyin romanizations, it makes no attempt to statistically determine their likelihood. There is a method to return a single representation, but it will soon be phased out, as it currently only returns the first romanization, not the most likely one. Where the program seems to do well is in conversion between romanizations and in general configurability.
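Since the question accepts either tone digits or tone marks, it may help to see that the two forms are mechanically interchangeable. The following is a minimal sketch, not pinyin4j's implementation: the class and method names are hypothetical, and it assumes one lowercase-able syllable per call, with `v` or `u:` standing in for ü and tone 5 meaning the neutral tone. The placement rule it uses is the usual one: mark `a` or `e` if present, else the `o` of `ou`, else the last vowel.

```java
public class ToneMarks {
    // Vowels that can carry a mark, and their tone 1-4 forms in the same order.
    private static final String PLAIN = "aeiouü";
    private static final String[] MARKED = {
        "āáǎà", "ēéěè", "īíǐì", "ōóǒò", "ūúǔù", "ǖǘǚǜ"
    };

    /** Converts one numbered syllable such as "ni3" to "nǐ"; output is lowercased. */
    static String toMarked(String syllable) {
        String s = syllable.toLowerCase().replace("u:", "ü").replace('v', 'ü');
        if (s.isEmpty()) return s;
        char last = s.charAt(s.length() - 1);
        if (last < '1' || last > '5') return s;      // no tone digit: leave as-is
        int tone = last - '0';
        String body = s.substring(0, s.length() - 1);
        if (tone == 5) return body;                  // neutral tone carries no mark
        int pos = markPosition(body);
        if (pos < 0) return body;                    // no vowel found: nothing to mark
        char marked = MARKED[PLAIN.indexOf(body.charAt(pos))].charAt(tone - 1);
        return body.substring(0, pos) + marked + body.substring(pos + 1);
    }

    /** Index of the vowel that takes the mark, or -1 if the syllable has no vowel. */
    static int markPosition(String body) {
        int a = body.indexOf('a');
        if (a >= 0) return a;
        int e = body.indexOf('e');
        if (e >= 0) return e;
        int ou = body.indexOf("ou");
        if (ou >= 0) return ou;                      // the 'o' in "ou"
        for (int i = body.length() - 1; i >= 0; i--) {
            if (PLAIN.indexOf(body.charAt(i)) >= 0) return i;
        }
        return -1;
    }

    public static void main(String[] args) {
        for (String s : new String[] {"ni3", "ma3", "chang2", "gui4", "lv4"}) {
            System.out.println(s + " -> " + toMarked(s));
        }
    }
}
```

Going from digits to marks is the easy direction; whichever library produces the readings, asking it for the numerical form keeps this kind of post-processing simple.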

In short, the answer may be any one of these, depending on what you need. Idiosyncratic proper nouns? Google Translate. In need of statistics? Adso. Willing to accept candidate lists without context information? pinyin4j.
