Convert JIS X 0208 code to UTF-8 in Python
Question
Say I have the kanji "亜", whose JIS X 0208 code is 0x3021 in hexadecimal. I want my Python program to convert that code into its UTF-8 form, E4BA9C, so that I can pass it (URL-encoded) into a URL like this:
http://jisho.org/api/v1/search/words?keyword=%E4%BA%9C
I'm using Python 2.7.12, but I'm open to a Python 3 solution as well.
Recommended answer
JIS X 0208 characters are accessible through the ISO 2022 codec (iso2022_jp):
>>> '亜'.encode('iso2022_jp')
b'\x1b$B0!\x1b(B'
If I saw those bytes without the surrounding escape sequences, I would need to know which version of JIS X 0208 was in use. That said, I am only pattern-matching against the Wikipedia article at this point.
>>> import urllib.parse
>>> b = b'\033$B' + bytes.fromhex('3021')
>>> c = b.decode('iso2022_jp')
>>> c
'亜'
>>> urllib.parse.quote(c)
'%E4%BA%9C'
(This is Python 3.)
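The steps above can be wrapped into a small helper. This is only a sketch for Python 3. The function names jis_to_url_utf8 and jis_to_url_utf8_via_euc are made up for illustration, and the code assumes the input is a two-byte JIS X 0208 code given as an integer. It also assumes the 1983 revision of the standard, which is the one the ESC $ B escape sequence selects. The second function is an alternative technique, not the one from the answer: EUC-JP encodes a JIS X 0208 character by setting the high bit of each byte, so no escape sequence is needed.

```python
import urllib.parse

# ISO-2022-JP escape sequence that switches to JIS X 0208-1983
# (assumption: the code being converted comes from that revision).
ESC_JIS_X_0208 = b"\x1b$B"


def jis_to_url_utf8(code):
    """Return the percent-encoded UTF-8 form of a two-byte JIS X 0208 code."""
    raw = code.to_bytes(2, "big")                       # 0x3021 -> b'\x30\x21'
    char = (ESC_JIS_X_0208 + raw).decode("iso2022_jp")  # -> '亜'
    return urllib.parse.quote(char)                     # quote() encodes as UTF-8 by default


def jis_to_url_utf8_via_euc(code):
    """Alternative: set the high bit of each byte and decode as EUC-JP."""
    raw = bytes(b | 0x80 for b in code.to_bytes(2, "big"))  # 0x3021 -> b'\xb0\xa1'
    return urllib.parse.quote(raw.decode("euc_jp"))


print(jis_to_url_utf8(0x3021))          # %E4%BA%9C
print(jis_to_url_utf8_via_euc(0x3021))  # %E4%BA%9C
```

On Python 2.7 the same idea should carry over with small changes: '\x1b$B\x30\x21'.decode('iso2022_jp') gives a unicode string, which then has to be encoded explicitly, as in urllib.quote(c.encode('utf-8')).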