How to transfer a file to Azure Blob Storage in chunks without writing to a file using Python
Problem Description
I need to transfer files from Google Cloud Storage to Azure Blob Storage.
Google gives a code snippet for downloading a file into a bytes variable, like so:
import io

from googleapiclient.http import MediaIoBaseDownload

# Get Payload Data
req = client.objects().get_media(
    bucket=bucket_name,
    object=object_name,
    generation=generation)  # optional

# The BytesIO object may be replaced with any io.Base instance.
fh = io.BytesIO()
downloader = MediaIoBaseDownload(fh, req, chunksize=1024*1024)

done = False
while not done:
    status, done = downloader.next_chunk()
    if status:
        print 'Download %d%%.' % int(status.progress() * 100)
print 'Download Complete!'
print fh.getvalue()
I was able to modify this to store to a file by changing the fh object type, like so:
fh = open(object_name, 'wb')
Then I can upload to Azure Blob Storage using blob_service.put_block_blob_from_path.
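For reference, that file-based route might look roughly like the sketch below; this is only an illustration assuming the legacy azure.storage.blob.BlobService client, where account_name, account_key, container_name and blob_name are placeholders and object_name is the local file written by the download loop above.

from azure.storage.blob import BlobService

blob_service = BlobService(account_name, account_key)

# Upload the file that the download loop wrote to local disk
blob_service.put_block_blob_from_path(container_name, blob_name, object_name)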
I want to avoid writing to a local file on the machine doing the transfer.
I gather Google's snippet loads the data into the io.BytesIO() object a chunk at a time. I reckon I should probably use this to write to blob storage a chunk at a time.
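One way to do that, sketched below under assumptions rather than as a verified implementation, is to stage each downloaded chunk as an Azure block with BlobService.put_block and commit the blocks with put_block_list at the end. Both methods exist on the legacy BlobService client, though block-id handling can differ between SDK versions; bucket_name, object_name and client come from the snippet above, and account_name, account_key, container_name and blob_name are placeholders.

import io

from azure.storage.blob import BlobService
from googleapiclient.http import MediaIoBaseDownload

blob_service = BlobService(account_name, account_key)

req = client.objects().get_media(bucket=bucket_name, object=object_name)
fh = io.BytesIO()
downloader = MediaIoBaseDownload(fh, req, chunksize=1024 * 1024)

block_ids = []
done = False
while not done:
    status, done = downloader.next_chunk()
    # Stage whatever has been buffered so far as one Azure block,
    # then empty the buffer so the whole blob is never held in memory.
    block_id = '{0:08d}'.format(len(block_ids))
    blob_service.put_block(container_name, blob_name, fh.getvalue(), block_id)
    block_ids.append(block_id)
    fh.seek(0)
    fh.truncate()

# Commit the staged blocks in order to form the final blob
blob_service.put_block_list(container_name, blob_name, block_ids)

Because the BytesIO buffer is truncated after every put_block, only about one chunk (1 MB here) sits in memory at a time instead of the whole ~600MB object.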
I experimented with reading the whole thing into memory and then uploading it with put_block_blob_from_bytes, but I got a memory error (the file is probably too big, ~600MB).
Any suggestions?
Recommended Answer
According to the source code of blobservice.py for Azure Storage and of BlobReader for Google Cloud Storage, you can try using the Azure function BlobService.put_block_blob_from_file to write the stream from GCS, because the BlobReader class exposes a file-like read method that put_block_blob_from_file can consume. Please see below.
So, referring to the code from https://cloud.google.com/appengine/docs/python/blobstore/#Python_Using_BlobReader, you can try something like the following.
from google.appengine.ext import blobstore
from azure.storage.blob import BlobService

blob_key = ...
blob_reader = blobstore.BlobReader(blob_key)

blob_service = BlobService(account_name, account_key)
container_name = ...
blob_name = ...

# BlobReader behaves like a read-only file object, so it can be passed
# directly as the stream argument of put_block_blob_from_file.
blob_service.put_block_blob_from_file(container_name, blob_name, blob_reader)