base64 encode, decode to, from files in chunks with python 2.7

Question

I've read the base64 python docs and seen examples here on SO and elsewhere, but I'm still having a problem decoding base64 back to the original binary representation.

I'm not getting any exceptions, so I don't think there's a padding or character set issue. I just get a resulting binary file that's smaller than the original binary.

I'm including both the base64 encoding and decoding steps in case there's an issue with either or both steps.

The code must run with python 2.7.

Below are the scripts that reproduce the problem.


b64_encode.py

#!/usr/bin/env python2.7

#
# b64_encode.py - must run with python 2.7
#               - must process data in chunks to limit memory consumption
#               - base64 data must be JSON compatible, i.e.
#                 use base64 "modern" interface,
#                 not base64.encodestring() which contains linefeeds
#

import sys, base64

def write_base64_file_from_file(src_fname, b64_fname, chunk_size=8192):
    with open(src_fname, 'rb') as fin, open(b64_fname, 'w') as fout:
        while True:
            bin_data = fin.read(chunk_size)
            if not bin_data:
                break
            print 'bin %s data len: %d' % (type(bin_data), len(bin_data))
            b64_data = base64.b64encode(bin_data)
            print 'b64 %s data len: %d' % (type(b64_data), len(b64_data))
            fout.write(b64_data)

if len(sys.argv) != 2:
    print 'usage: %s <bin_fname>' % sys.argv[0]
    sys.exit()

bin_fname = sys.argv[1]
b64_fname = bin_fname + '.b64'

write_base64_file_from_file(bin_fname, b64_fname)


b64_decode.py

#!/usr/bin/env python2.7

#
# b64_decode.py - must run with python 2.7
#               - must process data in chunks to limit memory consumption
#

import os, sys, base64

def write_file_from_base64_file(b64_fname, dst_fname, chunk_size=8192):
    with open(b64_fname, 'r') as fin, open(dst_fname, 'wb') as fout:
        while True:
            b64_data = fin.read(chunk_size)
            if not b64_data:
                break
            print 'b64 %s data len: %d' % (type(b64_data), len(b64_data))
            bin_data = base64.b64decode(b64_data)
            print 'bin %s data len: %d' % (type(bin_data), len(bin_data))
            fout.write(bin_data)

if len(sys.argv) != 2:
    print 'usage: %s <b64_fname>' % sys.argv[0]
    sys.exit()

b64_fname = sys.argv[1]
bin_ext = os.path.splitext(os.path.splitext(b64_fname)[0])[1]
bin_fname = os.path.splitext(b64_fname)[0] + bin_ext

write_file_from_base64_file(b64_fname, bin_fname)


For example, my output for a 19k image file is:

$ ./b64_encode.py img.jpg
bin <type 'str'> data len: 8192
b64 <type 'str'> data len: 10924
bin <type 'str'> data len: 8192
b64 <type 'str'> data len: 10924
bin <type 'str'> data len: 2842
b64 <type 'str'> data len: 3792

$ ./b64_decode.py img.jpg.b64 
b64 <type 'str'> data len: 8192
bin <type 'str'> data len: 6144
b64 <type 'str'> data len: 8192
bin <type 'str'> data len: 2048
b64 <type 'str'> data len: 8192
bin <type 'str'> data len: 4097
b64 <type 'str'> data len: 1064
bin <type 'str'> data len: 796

$ ll
19226 Feb  5 14:24 img.jpg
25640 Mar 29 12:12 img.jpg.b64
13085 Mar 29 12:14 img.jpg.jpg

Recommended answer

You are running into padding issues:

>>> open('pianoavatar.jpg').read(8192).encode('base64')[-5:]
'IIE=\n'

Base64 decoding stops when it encounters the = padding marker. Your second read finds such a marker at the 10924th character of the file, where the first encoded chunk ends, so the rest of that read is discarded.
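
To see where that marker comes from, here is a small check. It is only a sketch: it uses a dummy all-zero chunk as a stand-in for the image data, and it is written to run on both python 2.7 and python 3. An 8192-byte chunk is not a multiple of 3 bytes, so its encoding ends in padding, and writing such chunks back to back leaves the = in the middle of the stream.

```python
import base64

chunk = b'\x00' * 8192            # stand-in for one 8192-byte read
encoded = base64.b64encode(chunk)

# 8192 = 3 * 2730 + 2, so the last group is padded with a single '='
print(len(encoded))               # 10924
print(encoded[-4:])

# Two encoded chunks written back to back: the first chunk's padding
# ends up mid-stream, at index 10923 (the 10924th character)
stream = encoded + encoded
print(stream.index(b'='))         # 10923
```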

You need to adjust your chunk size to be divisible by 3 to avoid padding in the middle of your output file. Use a chunk size of 8190, for example.
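
A quick way to convince yourself of this is sketched below: with a chunk size divisible by 3, the concatenated chunk encodings are identical to encoding the whole input at once, while with 8192 they are not. The helper `encode_in_chunks` and the sample data are made up for illustration and work in memory rather than on files.

```python
import base64

def encode_in_chunks(data, chunk_size):
    """Encode data chunk by chunk and join the pieces (hypothetical helper)."""
    pieces = []
    for start in range(0, len(data), chunk_size):
        pieces.append(base64.b64encode(data[start:start + chunk_size]))
    return b''.join(pieces)

data = bytes(bytearray(range(256))) * 80   # 20480 bytes of sample data
whole = base64.b64encode(data)

aligned = encode_in_chunks(data, 8190)     # 8190 is divisible by 3
unaligned = encode_in_chunks(data, 8192)   # 8192 is not

print(aligned == whole)      # True
print(unaligned == whole)    # False
```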

When reading, you need to use a buffer size that's a multiple of 4 to avoid running into alignment issues as well. 8192 would do fine there, but you must ensure this restriction is met in your functions. You'd be better off defaulting to the base64 expanded chunk size for the input chunks: 10920 for an encoding chunk size of 8190 (4 base64 characters for every 3 bytes encoded).
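
The 4-for-3 arithmetic can be written down as a one-line helper. `b64_encoded_len` is a hypothetical name, not part of the base64 module; it gives the padded output length for a given number of input bytes.

```python
def b64_encoded_len(n_bytes):
    """Length of the padded base64 encoding of n_bytes bytes."""
    return 4 * ((n_bytes + 2) // 3)   # every started group of 3 bytes -> 4 chars

print(b64_encoded_len(8190))   # 10920
print(b64_encoded_len(8192))   # 10924
print(b64_encoded_len(2842))   # 3792
```

These values match the lengths printed by the encoder runs in this thread. Whenever the input chunk size is divisible by 3, the encoded chunk size comes out divisible by 4, which is why 10920 is a natural default for the decoder.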

Demo:

>>> write_base64_file_from_file('pianoavatar.jpg', 'test.b64', 8190)
bin <type 'str'> data len: 8190
b64 <type 'str'> data len: 10920
bin <type 'str'> data len: 8190
b64 <type 'str'> data len: 10920
bin <type 'str'> data len: 1976
b64 <type 'str'> data len: 2636

Reading now works just fine, even at your original chunk size of 8192:

>>> write_file_from_base64_file('test.b64', 'test.jpg', 8192)
b64 <type 'str'> data len: 8192
bin <type 'str'> data len: 6144
b64 <type 'str'> data len: 8192
bin <type 'str'> data len: 6144
b64 <type 'str'> data len: 8092
bin <type 'str'> data len: 6068

You can force the buffer size to be aligned in your functions with a simple modulus:

def write_base64_file_from_file(src_fname, b64_fname, chunk_size=8190):
    chunk_size -= chunk_size % 3  # align to multiples of 3
    # ...

def write_file_from_base64_file(b64_fname, dst_fname, chunk_size=10920):
    chunk_size -= chunk_size % 4  # align to multiples of 4
    # ...
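
Filled in, the two functions might look like the sketch below. It is one possible completion, not the only one: the debug prints are dropped, and the base64 file is opened in binary mode ('wb' / 'rb') so that the same code also runs on python 3, which is a change from the 'w' / 'r' modes in the question. `round_trip_ok` is a hypothetical self-check that uses random data in a temporary directory.

```python
import base64
import os
import shutil
import tempfile

def write_base64_file_from_file(src_fname, b64_fname, chunk_size=8190):
    chunk_size -= chunk_size % 3  # align to multiples of 3
    with open(src_fname, 'rb') as fin, open(b64_fname, 'wb') as fout:
        while True:
            bin_data = fin.read(chunk_size)
            if not bin_data:
                break
            fout.write(base64.b64encode(bin_data))

def write_file_from_base64_file(b64_fname, dst_fname, chunk_size=10920):
    chunk_size -= chunk_size % 4  # align to multiples of 4
    with open(b64_fname, 'rb') as fin, open(dst_fname, 'wb') as fout:
        while True:
            b64_data = fin.read(chunk_size)
            if not b64_data:
                break
            fout.write(base64.b64decode(b64_data))

def round_trip_ok(n_bytes=20000):
    """Hypothetical self-check: encode then decode a temp file and compare."""
    tmp_dir = tempfile.mkdtemp()
    try:
        src = os.path.join(tmp_dir, 'sample.bin')
        dst = os.path.join(tmp_dir, 'sample.out')
        original = os.urandom(n_bytes)
        with open(src, 'wb') as f:
            f.write(original)
        write_base64_file_from_file(src, src + '.b64', 8192)  # becomes 8190
        write_file_from_base64_file(src + '.b64', dst, 8192)
        with open(dst, 'rb') as f:
            return f.read() == original
    finally:
        shutil.rmtree(tmp_dir)

print(round_trip_ok())
```

Note that the alignment rounds down, so a chunk_size below 3 (or below 4 when decoding) becomes 0 and nothing is copied; callers should pass something sensibly large.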
