Python 3 - Encode/Decode vs Bytes/Str


Problem description

I am new to Python 3, coming from Python 2, and I am a bit confused by Unicode fundamentals. I've read some good posts that made it all much clearer; however, I see there are two methods in Python 3 that handle encoding and decoding, and I'm not sure which one to use.

So the idea in Python 3 is that every string is Unicode and can be encoded and stored as bytes, or decoded back into a Unicode string again.
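
A minimal sketch of that round trip, assuming UTF-8 as the encoding and reusing the start of the question's string: the str counts characters, while the encoded bytes are longer because UTF-8 needs three bytes for a character such as 岁.

```python
# A str is a sequence of Unicode code points.
original = '27岁'

encoded = original.encode('utf-8')   # str -> bytes
decoded = encoded.decode('utf-8')    # bytes -> str

print(type(original), len(original))  # <class 'str'> 3
print(type(encoded), len(encoded))    # <class 'bytes'> 5
print(encoded)                        # b'27\xe5\xb2\x81'
print(decoded == original)            # True
```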

But there are 2 ways to do it:
u'something'.encode('utf-8') will generate b'something', but so does bytes(u'something', 'utf-8').
And b'bytes'.decode('utf-8') seems to do the same thing as str(b'bytes', 'utf-8').
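
A quick check that the two spellings are interchangeable, using the question's sample values. It also shows that both forms accept the same optional errors argument; 'replace' is used here only as one example of the standard error handlers.

```python
# Sample values mirroring the question.
text = 'something'
data = b'bytes'

# Encoding: str method vs. bytes constructor give equal results.
encoded_method = text.encode('utf-8')
encoded_ctor = bytes(text, 'utf-8')
assert encoded_method == encoded_ctor == b'something'

# Decoding: bytes method vs. str constructor give equal results.
decoded_method = data.decode('utf-8')
decoded_ctor = str(data, 'utf-8')
assert decoded_method == decoded_ctor == 'bytes'

# Both spellings also take the same optional error handler.
word = 'café'
assert word.encode('ascii', 'replace') == bytes(word, 'ascii', 'replace') == b'caf?'

broken = b'caf\xe9'  # \xe9 starts a multi-byte UTF-8 sequence that never finishes
assert broken.decode('utf-8', 'replace') == str(broken, 'utf-8', 'replace') == 'caf\ufffd'

print('method and constructor forms agree')
```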

Now my question is: why are there two methods that seem to do the same thing, and is either one better than the other (and why)? I've been trying to find an answer to this on Google, but with no luck.

>>> original = '27岁少妇生孩子后变老'
>>> type(original)
<class 'str'>
>>> encoded = original.encode('utf-8')
>>> print(encoded)
b'27\xe5\xb2\x81\xe5\xb0\x91\xe5\xa6\x87\xe7\x94\x9f\xe5\xad\xa9\xe5\xad\x90\xe5\x90\x8e\xe5\x8f\x98\xe8\x80\x81'
>>> type(encoded)
<class 'bytes'>
>>> encoded2 = bytes(original, 'utf-8')
>>> print(encoded2)
b'27\xe5\xb2\x81\xe5\xb0\x91\xe5\xa6\x87\xe7\x94\x9f\xe5\xad\xa9\xe5\xad\x90\xe5\x90\x8e\xe5\x8f\x98\xe8\x80\x81'
>>> type(encoded2)
<class 'bytes'>
>>> print(encoded+encoded2)
b'27\xe5\xb2\x81\xe5\xb0\x91\xe5\xa6\x87\xe7\x94\x9f\xe5\xad\xa9\xe5\xad\x90\xe5\x90\x8e\xe5\x8f\x98\xe8\x80\x8127\xe5\xb2\x81\xe5\xb0\x91\xe5\xa6\x87\xe7\x94\x9f\xe5\xad\xa9\xe5\xad\x90\xe5\x90\x8e\xe5\x8f\x98\xe8\x80\x81'
>>> decoded = encoded.decode('utf-8')
>>> print(decoded)
27岁少妇生孩子后变老
>>> decoded2 = str(encoded2, 'utf-8')
>>> print(decoded2)
27岁少妇生孩子后变老
>>> type(decoded)
<class 'str'>
>>> type(decoded2)
<class 'str'>
>>> print(str(b'27\xe5\xb2\x81\xe5\xb0\x91\xe5\xa6\x87\xe7\x94\x9f\xe5\xad\xa9\xe5\xad\x90\xe5\x90\x8e\xe5\x8f\x98\xe8\x80\x81', 'utf-8'))
27岁少妇生孩子后变老
>>> print(b'27\xe5\xb2\x81\xe5\xb0\x91\xe5\xa6\x87\xe7\x94\x9f\xe5\xad\xa9\xe5\xad\x90\xe5\x90\x8e\xe5\x8f\x98\xe8\x80\x81'.decode('utf-8'))
27岁少妇生孩子后变老

Solution

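Neither is better than the other; they do exactly the same thing. However, using .encode() and .decode() is the more common way to do it. It is also compatible with Python 2, where bytes is just an alias for str and str() accepts only one argument, so the two-argument bytes(...) and str(...) calls raise a TypeError there.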
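
Once an encoding is passed, the two forms really are equivalent, but they behave differently when the encoding is left out. A small sketch of those Python 3 edge cases, reusing part of the question's string:

```python
data = '27岁'.encode('utf-8')

# The methods default to UTF-8 when no encoding is given.
assert data.decode() == '27岁'
assert '27岁'.encode() == data

# str() without an encoding does not decode; it returns the repr of the bytes.
as_repr = str(data)
print(as_repr)  # b'27\xe5\xb2\x81'

# bytes() given a str but no encoding raises TypeError instead of guessing.
try:
    bytes('27岁')
    raised = False
except TypeError as exc:
    raised = True
    print('TypeError:', exc)
```

So if you do use the str() constructor to decode, always pass the encoding explicitly; otherwise you silently get the text "b'...'" instead of the decoded string.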
