Convert JIS X 0208 code to UTF-8 in Python


Question

Let's say I have the kanji "亜", which is represented in JIS X 0208 by the hex code 0x3021. I want my Python program to convert that code into its UTF-8 form, E4BA9C, so that I can pass the URL-encoded string into my URL like this:

http://jisho.org/api/v1/search/words?keyword=%E4%BA%9C

I'm using Python 2.7.12, but I'm open to a Python 3 solution as well.

Recommended answer

JIS X 0208 codes can be reached through Python's ISO 2022 codecs, for example iso2022_jp:

>>> '亜'.encode('iso2022_jp')
b'\x1b$B0!\x1b(B'

If I saw those two bytes without the escape sequence framing them, I would have to know which version of JIS X 0208 is being used. That said, I'm entirely pattern matching on Wikipedia at this point, so treat the choice of escape sequence below as a best guess:
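To make the framing concrete, here is a small sketch that takes the codec's output apart. The constant names are my own and purely illustrative. ESC $ B is the ISO-2022-JP sequence that switches to JIS X 0208-1983, and ESC ( B switches back to ASCII; the two bytes in between should be the 0x3021 code from the question.

```python
# Illustrative names only; these constants are not part of any library.
ESC_TO_JISX0208 = b'\x1b$B'   # ESC $ B: switch to JIS X 0208-1983
ESC_TO_ASCII = b'\x1b(B'      # ESC ( B: switch back to ASCII

encoded = '亜'.encode('iso2022_jp')

# The codec wraps the two-byte code in the two escape sequences.
assert encoded.startswith(ESC_TO_JISX0208)
assert encoded.endswith(ESC_TO_ASCII)

# Strip the framing to recover the raw JIS X 0208 code.
payload = encoded[len(ESC_TO_JISX0208):-len(ESC_TO_ASCII)]
print(payload.hex())
```

Going the other way, prefixing the raw code with the same escape sequence is what lets the decoder interpret it as JIS X 0208.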

>>> import urllib.parse
>>> b = b'\033$B' + bytes.fromhex('3021')
>>> c = b.decode('iso2022_jp')
>>> c
'亜'
>>> urllib.parse.quote(c)
'%E4%BA%9C'

(This is Python 3.)
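The same steps can be wrapped in a small helper. This is only a sketch: the function name jisx0208_to_utf8 is hypothetical, it assumes the input is a single two-byte JIS X 0208 code given as an integer, and it assumes the 1983 revision (ESC $ B), as in the session above.

```python
import urllib.parse


def jisx0208_to_utf8(code):
    """Convert one two-byte JIS X 0208 code (e.g. 0x3021) to UTF-8 forms.

    Returns (character, UTF-8 hex string, URL-encoded string).
    Hypothetical helper; assumes the JIS X 0208-1983 escape sequence.
    """
    raw = code.to_bytes(2, 'big')                  # 0x3021 -> b'\x30\x21'
    char = (b'\x1b$B' + raw).decode('iso2022_jp')  # prefix ESC $ B, then decode
    utf8_hex = char.encode('utf-8').hex().upper()  # UTF-8 bytes as hex digits
    quoted = urllib.parse.quote(char)              # percent-encoded UTF-8
    return char, utf8_hex, quoted


char, utf8_hex, quoted = jisx0208_to_utf8(0x3021)
url = 'http://jisho.org/api/v1/search/words?keyword=' + quoted
print(char, utf8_hex, url)
```

A code that is not assigned in JIS X 0208 should make decode raise UnicodeDecodeError, so callers may want to catch that.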
