Base64 encoding issue in Python

Question

I need to save a params file in Python. The file contains some parameters that I don't want to leave in plain text, so I encode the entire file as Base64 (I know this isn't the most secure encoding in the world, but it works for the kind of data I need to store).

Encoding works fine. I encode the content of my file (a simple text file with an appropriate extension) and save the result. The problem comes with decoding. I print the encoded text before saving the file and the encoded text read back from the saved file, and they are exactly the same. But for a reason I don't understand, decoding the text read from the saved file raises this error: UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8d in position 1: invalid start byte, while decoding the text before saving the file works fine.

Any idea how to resolve this issue?

This is my code. I have tried converting everything to bytes, to str, and so on...

import base64

params = open('params.bpr','r').read()

paramsencoded = base64.b64encode(bytes(params,'utf-8'))

print(paramsencoded)

paramsdecoded = str(base64.b64decode(str(paramsencoded,'utf-8')),'utf-8')

newparams = open('paramsencoded.bpr','w+',encoding='utf-8')
newparams.write(str(paramsencoded))
newparams.close()

params2 = open('paramsencoded.bpr',encoding='utf-8').read()
print(params2)

paramsdecoded = str(base64.b64decode(str(paramsencoded,'utf-8')),'utf-8')

paramsdecoded = base64.b64decode(str(params2))

print(str(paramsdecoded,'utf-8'))

Recommended answer

Your error lies in your handling of the bytes object returned by base64.b64encode(). You called str() on the object:

newparams.write(str(paramsencoded))

This does not decode the bytes object:

>>> bytesvalue = b'abc='
>>> str(bytesvalue)
"b'abc='"

Note the b'...' notation. You produced the representation of the bytes object, which is a string containing Python syntax that can reproduce the value for debugging purposes (you can copy that string value and paste it into Python to re-create the same bytes value).
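To make the difference concrete, here is a small sketch that contrasts str() with an explicit decode. The variable names are illustrative only and do not come from the question.

```python
import ast

bytes_value = b'abc='

as_repr = str(bytes_value)             # the representation, including b'...'
as_text = bytes_value.decode('ascii')  # the actual characters

print(as_repr)   # b'abc='
print(as_text)   # abc=

# The representation is valid Python syntax, so it evaluates back to bytes
restored = ast.literal_eval(as_repr)
print(restored == bytes_value)  # True
```

The str() result is three characters longer than the decoded text: the leading b and the two quotes.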

This may not be easy to notice at first, because base64.b64encode() otherwise only produces output consisting of printable ASCII bytes.

But your decoding problem originates there, because the value read back from the file includes the b' characters at the start. Those first two characters are interpreted as Base64 data too; the b is a valid Base64 character, and the ' is ignored by the parser:

>>> bytesvalue = b'hello world'
>>> base64.b64encode(bytesvalue)
b'aGVsbG8gd29ybGQ='
>>> str(base64.b64encode(bytesvalue))
"b'aGVsbG8gd29ybGQ='"
>>> base64.b64decode(str(base64.b64encode(bytesvalue)))  # with str()
b'm\xa1\x95\xb1\xb1\xbc\x81\xdd\xbd\xc9\xb1\x90'
>>> base64.b64decode(base64.b64encode(bytesvalue))       # without str()
b'hello world'

Note how the output is completely different: the Base64 decoding now starts from the wrong place, because b supplies the first 6 bits of the first byte (making the first decoded byte 0x6C, 0x6D, 0x6E or 0x6F, so ASCII l, m, n or o).
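The bit arithmetic behind that first corrupted byte can be sketched as follows. The ALPHABET constant is built here for illustration, assuming the standard Base64 alphabet (A-Z, a-z, 0-9, +, /).

```python
import base64
import string

# Standard Base64 alphabet: each character stands for a 6-bit value
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + '+/'

index_b = ALPHABET.index('b')   # 6-bit value contributed by the stray 'b'
top_bits = index_b << 2         # these become the top 6 bits of the first byte

# The next Base64 character supplies the remaining 2 bits
possible_first_bytes = bytes(top_bits | low for low in range(4))
print(possible_first_bytes)     # b'lmno'

corrupted = base64.b64decode(str(base64.b64encode(b'hello world')))
print(corrupted[:1])            # b'm'
```

Every later byte is shifted by the same 6 bits, which is why none of the output resembles the original text.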

You could properly decode the value (using paramsencoded.decode('ascii') or str(paramsencoded, 'ascii')), but you shouldn't treat any of this data as text.
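For completeness, a minimal sketch of that text-mode variant might look like this. The sample content and the temporary directory are assumptions made for illustration; they are not part of the question.

```python
import base64
import os
import tempfile

original = 'user=alice\ntoken=secret\n'  # hypothetical params content

# Decode the Base64 bytes to real text instead of calling str() on them
encoded_text = base64.b64encode(original.encode('utf-8')).decode('ascii')

path = os.path.join(tempfile.mkdtemp(), 'paramsencoded.bpr')
with open(path, 'w', encoding='ascii') as f:
    f.write(encoded_text)   # plain Base64 characters, no b'...' wrapper

with open(path, encoding='ascii') as f:
    read_back = f.read()

decoded = base64.b64decode(read_back).decode('utf-8')
print(decoded == original)  # True
```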

Instead, open your files in binary mode. Reading and writing then operate on bytes objects, and the base64.b64encode() and base64.b64decode() functions also operate on bytes, making for a perfect match:

import base64

with open('params.bpr', 'rb') as params_source:
    params = params_source.read()  # bytes object

params_encoded = base64.b64encode(params)
print(params_encoded.decode('ascii'))   # base64 data is always ASCII data

params_decoded = base64.b64decode(params_encoded)

with open('paramsencoded.bpr', 'wb') as new_params:
    new_params.write(params_encoded)  # write binary data

with open('paramsencoded.bpr', 'rb') as new_params:
    params_written = new_params.read()

print(params_written.decode('ascii'))  # still Base64 data, so decode as ASCII

params_decoded = base64.b64decode(params_written)  # decode the bytes value

print(params_decoded.decode('utf8'))  # assuming the original source was UTF-8

I explicitly use bytes.decode(codec) rather than str(..., codec) to avoid accidental str(...) calls.
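Because the decoder silently skips characters outside the Base64 alphabet by default, a mistake like this can go unnoticed until the decoded bytes are used. One possible safeguard, sketched here under the assumption that the stored data should never contain stray characters, is to pass validate=True to base64.b64decode(). The strict_decode helper name is hypothetical.

```python
import base64
import binascii

good = base64.b64encode(b'hello world')  # proper Base64 bytes
bad = str(good)                          # the representation, with b'...'

def strict_decode(data):
    """Hypothetical helper: reject any character outside the alphabet."""
    return base64.b64decode(data, validate=True)

try:
    strict_decode(bad)
    rejected = False
except binascii.Error:
    rejected = True   # the quote characters are not valid Base64

print(rejected)
print(strict_decode(good))
```

With validation enabled, the accidental b'...' wrapper fails at decode time instead of producing garbage bytes.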
