Open an Azure StorageStreamDownloader without saving it as a file


Question

I need to download a PDF from a blob container in Azure as a download stream (StorageStreamDownloader) and open it in both PDFPlumber and PDFMiner. Everything already works when I load the PDFs from files on disk, but I can't manage to take a download stream (StorageStreamDownloader) and open it successfully. I was opening the PDFs like this:

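import pdfplumber
from pdfminer.pdfparser import PDFParser
from pdfminer.pdfdocument import PDFDocument

pdf = pdfplumber.open(pdfpath)  # for pdfplumber
fp = open('Pdf/' + fileGlob, 'rb')  # for pdfminer
parser = PDFParser(fp)
document = PDFDocument(parser)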

However, I need to be able to work with a download stream instead. This is the code snippet that downloads the PDF as a file:

blob_client = container.get_blob_client(remote_file)
# The with block closes the file on exit, so no explicit close() is needed.
with open(local_file_path, "wb") as local_file:
    download_stream = blob_client.download_blob()
    local_file.write(download_stream.readall())

I tried several options, including a temporary file, with no luck. Any ideas?

Recommended answer

download_blob() returns a StorageStreamDownloader object. That class has a download_to_stream method, which writes the blob's contents into a stream that you supply, such as an in-memory BytesIO.

from io import BytesIO

import PyPDF2
from azure.storage.blob import BlobServiceClient

filename = "test.pdf"
container_name = "test"

blob_service_client = BlobServiceClient.from_connection_string("connection string")
container_client = blob_service_client.get_container_client(container_name)
blob_client = container_client.get_blob_client(filename)
streamdownloader = blob_client.download_blob()

# Copy the blob into an in-memory buffer instead of a file on disk.
# Newer azure-storage-blob releases deprecate download_to_stream in favor of readinto.
stream = BytesIO()
streamdownloader.download_to_stream(stream)
stream.seek(0)  # rewind so readers start at the beginning of the PDF

# PyPDF2 3.x renames these to PdfReader and len(reader.pages).
fileReader = PyPDF2.PdfFileReader(stream)
print(fileReader.numPages)

This is my result: it prints the number of pages in the PDF.
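
The question asks about pdfplumber and pdfminer rather than PyPDF2, so the same idea can be sketched for those two libraries. This is a minimal sketch and not part of the original answer: the helper names to_seekable_stream, open_with_pdfplumber, and open_with_pdfminer are hypothetical, and it assumes that pdfplumber.open accepts a file-like object and that the pdfminer.six import paths shown in the question apply.

```python
from io import BytesIO


def to_seekable_stream(downloader):
    """Copy a finished download into an in-memory, seekable stream.

    `downloader` only needs a readall() method that returns bytes,
    which is what StorageStreamDownloader provides.
    """
    # BytesIO(initial_bytes) starts positioned at offset 0.
    return BytesIO(downloader.readall())


def open_with_pdfplumber(stream):
    # Imported lazily so the helper above has no third-party dependencies.
    import pdfplumber

    stream.seek(0)
    return pdfplumber.open(stream)


def open_with_pdfminer(stream):
    from pdfminer.pdfdocument import PDFDocument
    from pdfminer.pdfparser import PDFParser

    stream.seek(0)
    return PDFDocument(PDFParser(stream))
```

With the blob_client from the answer, this might be used as stream = to_seekable_stream(blob_client.download_blob()), followed by open_with_pdfplumber(stream) or open_with_pdfminer(stream). Because the whole blob is held in memory, this suits PDFs of moderate size.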
