How to decode a base64 file into binary in Python?

Question

I'm building a system which handles PDF file data (for which I use the PyPDF2 lib). I now obtain a base64-encoded PDF, which I can decode and store correctly using the following:

import base64
# base64FileData  <= the base64 file data
fileData = base64.urlsafe_b64decode(base64FileData.encode('UTF-8'))
with open('thefilename.pdf', 'w') as theFile:
    theFile.write(fileData)

I now want to use this fileData as a binary file to split it up, but when I do type(fileData), the fileData turns out to be a <type 'str'>. How can I convert this fileData to be a binary (or at least not a string)?

All tips are welcome!

If I do open(fileData, 'rb'), I get an error saying:


TypeError: file() argument 1 must be encoded string without NULL bytes, not str

To remove the null bytes I tried fileData.rstrip(' \t\r\n\0'), fileData.rstrip('\0') and fileData.partition(b'\0')[0], but nothing seems to work. Any ideas?

The thing is that I pass this string to the PyPDF2 PdfFileReader class, which on lines 909 to 912 does the following (in which stream is the fileData I provide):

if type(stream) in (string_type, str):
    fileobj = open(stream, 'rb')
    stream = BytesIO(b_(fileobj.read()))
    fileobj.close()

So because it's a string, PdfFileReader assumes it is a filename and tries to open the file, which then fails with the TypeError. Before feeding fileData to PdfFileReader I therefore need to somehow convert it to something other than str, so that it doesn't try to open it but treats fileData as the file content itself. Any ideas?

Recommended answer

You have to open the file in binary mode, i.e. use 'wb'; otherwise the data basically gets saved as "text".

import base64
# base64FileData  <= the base64 file data
fileData = base64.urlsafe_b64decode(base64FileData.encode('UTF-8'))
with open('thefilename.pdf', 'wb') as theFile:
    theFile.write(fileData)
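
For the second half of the question (handing the decoded data to PdfFileReader without it being mistaken for a filename), one option is to wrap the bytes in an in-memory file object. The following is only a minimal sketch: the sample payload and the temporary path are made up for illustration, and the PyPDF2 call is left as a comment because it assumes PyPDF2 is installed and that base64FileData holds a complete PDF.

```python
import base64
import io
import os
import tempfile

# Hypothetical stand-in for the real payload (not a valid PDF, just bytes
# that include a NULL byte); in the question this arrives already encoded.
original_bytes = b"%PDF-1.4\n\x00\x01\x02 sample binary payload\n%%EOF\n"
base64FileData = base64.urlsafe_b64encode(original_bytes).decode("ascii")

# Decoding yields the raw file contents: bytes on Python 3, str on Python 2.
fileData = base64.urlsafe_b64decode(base64FileData.encode("UTF-8"))

# Binary mode ('wb' / 'rb') writes and reads the bytes unchanged.
path = os.path.join(tempfile.mkdtemp(), "thefilename.pdf")
with open(path, "wb") as theFile:
    theFile.write(fileData)
with open(path, "rb") as theFile:
    round_tripped = theFile.read()

# An in-memory, file-like view of the same bytes; nothing is opened by name.
stream = io.BytesIO(fileData)
header = stream.read(5)  # peek at the first bytes
stream.seek(0)           # rewind before handing the stream to a parser

# Assumption: with PyPDF2 installed and a real PDF payload, the stream
# could be passed in place of a filename:
# from PyPDF2 import PdfFileReader
# reader = PdfFileReader(stream)
```

Because the stream is not a str, the type check quoted in the question should not treat it as a filename, so no open() call is attempted on the data.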
