Convert numbered pinyin to pinyin with tone marks


Question

Are there any scripts, libraries, or programs using Python or Bash tools (e.g. awk, perl, sed) that can correctly convert numbered pinyin (e.g. dian4 nao3) to UTF-8 pinyin with tone marks (e.g. diàn nǎo)?

I have found the following examples, but they require PHP or C#:


  • PHP: Convert numbered to accentuated Pinyin?
  • C#: Any libraries to convert number pinyin to pinyin with tone markings? (http://stackoverflow.com/questions/6159239/any-libraries-to-convert-number-pinyin-to-pinyin-with-tone-markings)

I have also found various online tools, but they cannot handle a large number of conversions.

Recommended answer

I've got some Python 3 code that does this, and it's small enough to just put directly in the answer here.

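import re

# Row 0 lists the plain vowels ("v" and "\u00fc" both stand for u-umlaut);
# rows 1-4 list the same vowels, in the same order, carrying tone marks 1-4.
PinyinToneMark = {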
    0: "aoeiuv\u00fc",
    1: "\u0101\u014d\u0113\u012b\u016b\u01d6\u01d6",
    2: "\u00e1\u00f3\u00e9\u00ed\u00fa\u01d8\u01d8",
    3: "\u01ce\u01d2\u011b\u01d0\u01d4\u01da\u01da",
    4: "\u00e0\u00f2\u00e8\u00ec\u00f9\u01dc\u01dc",
}

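def decode_pinyin(s):
    s = s.lower()
    r = ""  # converted output so far
    t = ""  # syllable currently being collected
    for c in s:
        if 'a' <= c <= 'z' or c == "\u00fc":
            t += c
        elif c == ':' and t.endswith('u'):
            # "u:" is an ASCII spelling of u-umlaut
            t = t[:-1] + "\u00fc"
        else:
            if '0' <= c <= '5':
                tone = int(c) % 5  # 0 and 5 both mean the neutral tone
                if tone != 0:
                    m = re.search("[aoeiuv\u00fc]+", t)
                    if m is None:
                        t += c  # no vowel to mark: keep the digit
                    elif len(m.group(0)) == 1:
                        # a single vowel always takes the mark
                        t = t[:m.start(0)] + PinyinToneMark[tone][PinyinToneMark[0].index(m.group(0))] + t[m.end(0):]
                    else:
                        # vowel cluster: a, o, e take the mark in that priority;
                        # "ui" and "iu" are marked on their final vowel
                        if 'a' in t:
                            t = t.replace("a", PinyinToneMark[tone][0])
                        elif 'o' in t:
                            t = t.replace("o", PinyinToneMark[tone][1])
                        elif 'e' in t:
                            t = t.replace("e", PinyinToneMark[tone][2])
                        elif t.endswith("ui"):
                            t = t.replace("i", PinyinToneMark[tone][3])
                        elif t.endswith("iu"):
                            t = t.replace("u", PinyinToneMark[tone][4])
                        else:
                            t += "!"  # flag an unrecognized vowel combination
                # any "v" that did not take the tone mark is still a u-umlaut
                t = t.replace("v", "\u00fc")
            else:
                t += c  # keep spaces and punctuation in the output
            r += t
            t = ""
    r += t
    return r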

This handles ü, u:, and v, all of which I've encountered. Minor modifications will be needed for Python 2 compatibility.
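Since the question is about converting large amounts of text, here is a minimal sketch of the same tone-placement rule written around `re.sub`, so a whole line or file can be converted in one call. It is an alternative formulation, not the code from the answer: the names `TONE_MARKS`, `SYLLABLE`, `mark_syllable`, and `numbered_to_marked` are made up for illustration. It assumes the input contains only numbered pinyin, with tones written as digits 1 to 5 directly after each syllable, and it lowercases each matched syllable.

```python
import re

# Tone-marked forms of each vowel for tones 1-4 (hypothetical table name).
TONE_MARKS = {
    "a": "\u0101\u00e1\u01ce\u00e0",
    "o": "\u014d\u00f3\u01d2\u00f2",
    "e": "\u0113\u00e9\u011b\u00e8",
    "i": "\u012b\u00ed\u01d0\u00ec",
    "u": "\u016b\u00fa\u01d4\u00f9",
    "\u00fc": "\u01d6\u01d8\u01da\u01dc",
}

# One numbered syllable: letters (including u-umlaut and the "u:" spelling)
# followed directly by a tone digit 1-5.
SYLLABLE = re.compile("((?:u:|[a-z\u00fc])+)([1-5])", re.IGNORECASE)


def mark_syllable(match):
    """Replace one numbered syllable with its tone-marked form."""
    # Normalize the ASCII spellings "u:" and "v" to u-umlaut.
    syllable = match.group(1).lower().replace("u:", "\u00fc").replace("v", "\u00fc")
    tone = int(match.group(2))
    if tone == 5:
        return syllable  # neutral tone: drop the digit, add no mark
    # Placement rule: a, o, or e takes the mark (in that priority);
    # otherwise the mark goes on the last vowel, which covers "ui" and "iu".
    for vowel in "aoe":
        if vowel in syllable:
            target = syllable.index(vowel)
            break
    else:
        vowels = [i for i, ch in enumerate(syllable) if ch in TONE_MARKS]
        if not vowels:
            return match.group(0)  # no vowel at all: leave the text untouched
        target = vowels[-1]
    marked = TONE_MARKS[syllable[target]][tone - 1]
    return syllable[:target] + marked + syllable[target + 1:]


def numbered_to_marked(text):
    """Convert every numbered syllable in text; other characters pass through."""
    return SYLLABLE.sub(mark_syllable, text)


print(numbered_to_marked("dian4 nao3"))  # diàn nǎo
```

For bulk input, the function can be applied line by line to an open file or to standard input, writing each converted line straight back out, so memory use stays constant regardless of file size.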
