Copy Azure Blob as BlockBlob from Remote URL


Problem Description

I'm using the azure-sdk-for-python BlobClient start_copy_from_url to copy a remote file to my local storage.

However, the file always ends up as an AppendBlob instead of a BlockBlob. I can't see how to force the destination blob type to be BlockBlob.

from azure.storage.blob import BlobServiceClient

connection_string = "connection string to my dest blob storage account"
container_name = "myContainerName"
dest_file_name = "myDestFile.csv"
remote_blob_url = "http://path/to/remote/blobfile.csv"

client = BlobServiceClient.from_connection_string(connection_string)
dest_blob = client.get_blob_client(container_name, dest_file_name)
dest_blob.start_copy_from_url(remote_blob_url)

Recommended Answer

Here is what you want to do using the latest version (v12). According to the documentation:

The source blob for a copy operation may be a block blob, an append blob, or a page blob. If the destination blob already exists, it must be of the same blob type as the source blob.

Right now, you cannot use start_copy_from_url to specify a blob type. However, you can use the synchronous copy APIs to do so in some cases.

For example, for a block-to-page-blob copy, create the destination page blob first and invoke upload_pages_from_url on the destination with each 4 MB chunk from the source.
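For illustration, a minimal sketch of that page-blob path (source_size, the source blob's total size in bytes, and source_url are assumptions here; page blob sizes and ranges must be multiples of 512 bytes):

from azure.storage.blob import ContainerClient
import os

conn_str = os.getenv("AZURE_STORAGE_CONNECTION_STRING")
container_client = ContainerClient.from_connection_string(conn_str, "testcontainer")

# create the destination page blob at the full source size up front;
# source_size is assumed known (e.g. from a HEAD request on the source)
page_blob = container_client.get_blob_client("mypageblob")
page_blob.create_page_blob(size=source_size)

# copy the source across in 4 MB pages
CHUNK = 4 * 1024 * 1024
for offset in range(0, source_size, CHUNK):
    length = min(CHUNK, source_size - offset)
    page_blob.upload_pages_from_url(source_url, offset=offset, length=length,
                                    source_offset=offset)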

Similarly, in your case, create an empty block blob first and then use the stage_block_from_url method.

from azure.storage.blob import ContainerClient, BlobBlock
import os

conn_str = os.getenv("AZURE_STORAGE_CONNECTION_STRING")
dest_blob_name = "mynewblob"
source_url = "http://www.gutenberg.org/files/59466/59466-0.txt"

container_client = ContainerClient.from_connection_string(conn_str, "testcontainer")

blob_client = container_client.get_blob_client(dest_blob_name)
# upload an empty block blob so the destination exists as a BlockBlob
blob_client.upload_blob(b'')

# this only stages the block; it is not part of the blob yet
blob_client.stage_block_from_url(block_id='1', source_url=source_url)
# now commit it
blob_client.commit_block_list([BlobBlock(block_id='1')])

# if you want to verify it's committed now
committed, uncommitted = blob_client.get_block_list('all')
assert len(committed) == 1
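
If you also want to confirm the destination's blob type (a small sanity check, assuming the standard get_blob_properties call):

from azure.storage.blob import BlobType

props = blob_client.get_blob_properties()
# with start_copy_from_url from an append-blob source this would be AppendBlob
assert props.blob_type == BlobType.BlockBlob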

Let me know if this doesn't work.

You can leverage the source_offset and source_length params to upload blocks in chunks. For example,

stage_block_from_url(block_id, source_url, source_offset=0, source_length=10)

will upload the first 10 bytes, i.e. bytes 0 through 9. So, you can use a counter to keep incrementing the block_id and track your offset and length until you exhaust all your chunks.

CHUNK = 4 * 1024 * 1024
block_ids = []
# source_size (the source blob's size in bytes) is assumed known
for i, offset in enumerate(range(0, source_size, CHUNK)):
    block_id = '{:05d}'.format(i)  # block IDs must be equal-length strings
    block_ids.append(block_id)
    blob_client.stage_block_from_url(block_id, source_url, source_offset=offset,
                                     source_length=min(CHUNK, source_size - offset))
    # do not commit inside the loop
# outside the for loop, commit all staged blocks in order
blob_client.commit_block_list([BlobBlock(block_id=b) for b in block_ids])

