UTF-8 coding in Python

Posted: 2017/8/17 1:38:59 | Tags: python, unicode, encoding, utf-8


Problem description

I have a UTF-8 character whose bytes are written as hex with '_' in between, e.g., '_ea_b4_80'. I'm trying to convert it into the actual UTF-8 character using the replace method, but I can't get the correct encoding.

This is a code example:

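# Python 2 code
import sys
reload(sys)                     # brings back setdefaultencoding(), which site.py deletes at startup
sys.setdefaultencoding('utf8')

r = '_ea_b4_80'
r2 = '\xea\xb4\x80'             # the same bytes written as \x escapes in a literal

r = r.replace('_', '\\x')       # replace each '_' with a backslash followed by 'x'
print r
print r.encode("utf-8")
print r2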

In this example, r is not the same as r2. This is the output:

\xea\xb4\x80
\xea\xb4\x80
관  <-- correctly shown

What might be wrong?

Solution

\x is only meaningful in string literals (it is interpreted when the source code is parsed), so you can't use replace to add it.
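
A minimal Python 3 sketch of that difference. Since the question's code is Python 2, this assumes a bytes literal as the Python 3 stand-in for the Py2 byte string r2. replace() inserts two ordinary characters (a backslash and an 'x'), while each \xNN in a literal becomes a single byte.

```python
r = '_ea_b4_80'.replace('_', '\\x')  # inserts a backslash and an 'x'; no escape is interpreted
r2 = b'\xea\xb4\x80'                 # the parser turns each \xNN in the literal into one byte

print(len(r), len(r2))     # 12 3
print(r)                   # \xea\xb4\x80  (12 ordinary characters)
print(r2.decode('utf-8'))  # 관
```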

To get your desired result, convert to bytes, then decode:

import binascii

r = '_ea_b4_80'

rhexonly = r.replace('_', '')          # Returns 'eab480'
rbytes = binascii.unhexlify(rhexonly)  # Returns b'\xea\xb4\x80'
rtext = rbytes.decode('utf-8')         # Returns '관' (unicode on Py2, str on Py3)
print(rtext)

This should give you 관, as desired.
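
The same three steps can be wrapped in a small function that also handles a longer run of escaped bytes, such as two characters in a row. This is a Python 3 sketch, not part of the original answer: the name underscore_hex_to_text is hypothetical, and it assumes every byte is written as '_' followed by exactly two hex digits.

```python
import binascii


def underscore_hex_to_text(s):
    """Decode text like '_ea_b4_80' (UTF-8 bytes written as _XX) into a str.

    Hypothetical helper: assumes each byte is written as '_' followed by
    two hex digits. Raises ValueError (binascii.Error is a subclass) if the
    hex digits are malformed, and UnicodeDecodeError (also a ValueError
    subclass) if the bytes are not valid UTF-8.
    """
    hex_digits = s.replace('_', '')       # '_ea_b4_80' -> 'eab480'
    raw = binascii.unhexlify(hex_digits)  # 'eab480' -> b'\xea\xb4\x80'
    return raw.decode('utf-8')            # b'\xea\xb4\x80' -> '관'


print(underscore_hex_to_text('_ea_b4_80'))           # 관
print(underscore_hex_to_text('_ed_95_9c_ea_b8_80'))  # 한글 (two characters, six bytes)
```

Decoding all the bytes at once matters: a multi-byte character such as 관 cannot be decoded one byte at a time.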

If you're using modern Py3, you can avoid the import by using the bytes.fromhex class method in place of binascii.unhexlify (assuming r is in fact a str; unlike binascii.unhexlify, bytes.fromhex takes only str input, not bytes input):

rbytes = bytes.fromhex(rhexonly)  # Returns b'\xea\xb4\x80'
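
A quick Python 3 check of the input-type difference between the two functions: binascii.unhexlify accepts hex digits as either bytes or an ASCII-only str, while bytes.fromhex accepts only str and raises TypeError when given bytes. The variable names are illustrative.

```python
import binascii

hex_str = 'eab480'     # hex digits as str (what r.replace('_', '') returns on Py3)
hex_bytes = b'eab480'  # the same digits as bytes

# binascii.unhexlify accepts both str (ASCII only) and bytes.
print(binascii.unhexlify(hex_str))    # b'\xea\xb4\x80'
print(binascii.unhexlify(hex_bytes))  # b'\xea\xb4\x80'

# bytes.fromhex accepts str ...
print(bytes.fromhex(hex_str))         # b'\xea\xb4\x80'

# ... but raises TypeError for bytes, so decode a bytes input to str first.
try:
    bytes.fromhex(hex_bytes)
except TypeError as exc:
    print('TypeError:', exc)
print(bytes.fromhex(hex_bytes.decode('ascii')))  # b'\xea\xb4\x80'
```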
