UTF-8 coding in Python
I have a UTF-8 character whose bytes are written as hex digits separated by '_', e.g., '_ea_b4_80'. I'm trying to convert it into the actual UTF-8 character using the replace method, but I can't get the correct encoding.
This is a code example:
import sys
reload(sys)
sys.setdefaultencoding('utf8')
r = '_ea_b4_80'
r2 = '\xea\xb4\x80'
r = r.replace('_', '\\x')
print r
print r.encode("utf-8")
print r2
In this example, r is not the same as r2. This is the output:
\xea\xb4\x80
\xea\xb4\x80
관 <-- correctly shown
What might be wrong?
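\x is only meaningful in string literals, where the parser turns it into a single byte or character when the source is read. You can't use replace to add it at runtime; doing so just produces a string containing a literal backslash followed by x.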
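As a quick illustration of the difference (a minimal sketch; the variable names literal and built are only for this example), comparing the lengths of the two strings shows that replace builds twelve ordinary characters rather than three escaped ones:

```python
# '\xea\xb4\x80' is parsed as a literal: three escaped units.
literal = '\xea\xb4\x80'

# replace() runs at runtime and inserts a real backslash and a real 'x'.
built = '_ea_b4_80'.replace('_', '\\x')

print(len(literal))      # 3
print(len(built))        # 12
print(built == literal)  # False
```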
To get your desired result, convert to bytes, then decode:
import binascii
r = '_ea_b4_80'
rhexonly = r.replace('_', '') # Returns 'eab480'
rbytes = binascii.unhexlify(rhexonly) # Returns b'\xea\xb4\x80'
rtext = rbytes.decode('utf-8') # Returns '관' (unicode if Py2, str Py3)
print(rtext)
which should get you 관, as you desire.
If you're using modern Py3, you can avoid the import by using the bytes.fromhex class method in place of binascii.unhexlify. This assumes r is in fact a str: bytes.fromhex, unlike binascii.unhexlify, only takes str inputs, not bytes inputs.
rbytes = bytes.fromhex(rhexonly) # Returns b'\xea\xb4\x80'
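Putting the Py3 steps together, one possible way to wrap them is a small helper. This is only a sketch: the function name decode_underscore_hex and its encoding parameter are made up for this example and are not part of any library.

```python
def decode_underscore_hex(s, encoding='utf-8'):
    """Hypothetical helper: turn '_ea_b4_80' into the text it encodes."""
    hex_only = s.replace('_', '')       # 'eab480'
    raw = bytes.fromhex(hex_only)       # b'\xea\xb4\x80'
    return raw.decode(encoding)         # decode the bytes into text


result = decode_underscore_hex('_ea_b4_80')
# ascii() keeps the output printable on any console; U+AD00 is 관.
print(ascii(result))  # '\uad00'
```

bytes.fromhex raises ValueError on input that is not valid hex, and decode raises UnicodeDecodeError if the bytes are not valid in the chosen encoding, so malformed input fails loudly rather than producing garbage.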