转换编号带声调的拼音拼音 [英] Convert numbered pinyin to pinyin with tone marks
问题描述
是否有任何脚本,库,或使用程序的Python
或 BASH
工具(例如 AWK
, perl的
, SED
),它可以正确地转换编号的拼音(如dian4 nao3)为UTF-8汉语拼音标记(如厂甸nǎo)?
Are there any scripts, libraries, or programs using Python
, or BASH
tools (e.g. awk
, perl
, sed
) which can correctly convert numbered pinyin (e.g. dian4 nao3) to UTF-8 pinyin with tone marks (e.g. diàn nǎo)?
我发现下面的例子,但它们需要 PHP
或 #C
:
I have found the following examples, but they require PHP
or #C
:
- 编号以突出拼音?
- #C <一个href=\"http://stackoverflow.com/questions/6159239/any-libraries-to-convert-number-pinyin-to-pinyin-with-tone-markings\">Any库转换数字汉语拼音标识拼音?
- PHP Convert numbered to accentuated Pinyin?
- #C Any libraries to convert number pinyin to pinyin with tone markings?
我还发现各种在线工具,但他们不能处理大量的转换。
I have also found various On-line tools, but they cannot handle a large number of conversions.
推荐答案
我有一些Python 3 code,这是否和它足够小,只是把直接在这里的答案。
I've got some Python 3 code that does this, and it's small enough to just put directly in the answer here.
PinyinToneMark = {
0: "aoeiuv\u00fc",
1: "\u0101\u014d\u0113\u012b\u016b\u01d6\u01d6",
2: "\u00e1\u00f3\u00e9\u00ed\u00fa\u01d8\u01d8",
3: "\u01ce\u01d2\u011b\u01d0\u01d4\u01da\u01da",
4: "\u00e0\u00f2\u00e8\u00ec\u00f9\u01dc\u01dc",
}
def decode_pinyin(s):
s = s.lower()
r = ""
t = ""
for c in s:
if c >= 'a' and c <= 'z':
t += c
elif c == ':':
assert t[-1] == 'u'
t = t[:-1] + "\u00fc"
else:
if c >= '0' and c <= '5':
tone = int(c) % 5
if tone != 0:
m = re.search("[aoeiuv\u00fc]+", t)
if m is None:
t += c
elif len(m.group(0)) == 1:
t = t[:m.start(0)] + PinyinToneMark[tone][PinyinToneMark[0].index(m.group(0))] + t[m.end(0):]
else:
if 'a' in t:
t = t.replace("a", PinyinToneMark[tone][0])
elif 'o' in t:
t = t.replace("o", PinyinToneMark[tone][1])
elif 'e' in t:
t = t.replace("e", PinyinToneMark[tone][2])
elif t.endswith("ui"):
t = t.replace("i", PinyinToneMark[tone][3])
elif t.endswith("iu"):
t = t.replace("u", PinyinToneMark[tone][4])
else:
t += "!"
r += t
t = ""
r += t
return r
该手柄 U
, U:
和 v
,所有这一切我已经遇到过。将需要的Python 2兼容性细微的修改。
This handles ü
, u:
, and v
, all of which I've encountered. Minor modifications will be needed for Python 2 compatibility.
这篇关于转换编号带声调的拼音拼音的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!