Download file with DataLakeFileClient and progress bar

Problem description

I need to download a large file from Azure with DataLakeFileClient and show a progress bar like tqdm during the download. Below is the code that I was trying with a smaller test file.

from azure.storage.filedatalake import DataLakeFileClient
from tqdm import tqdm

# Download a file
test_file = DataLakeFileClient.from_connection_string(my_conn_str, file_system_name=fs_name, file_path="161263.tmp")

download = test_file.download_file()
blocks = download.chunks()
print(f"File Size = {download.size}, Number of blocks = {len(blocks)}")

with open("./newfile.tmp", "wb") as my_file:
    for block in tqdm(blocks):
        my_file.write(block)

When run in a Jupyter notebook, the output reports a number of blocks equal to the file size in bytes.

How can I make the number of blocks correct and the progress bar work?

Recommended answer

When using chunks(), note that by default the download is split into several blocks only if the file is larger than 32MB (33554432 bytes). In that case the first block is 32MB, and the remainder (total file size - 32MB) is split into blocks of 4MB each.

For example, a 39MB file is split into 3 blocks: the first block is 32MB, the second is 4MB, and the third is 3MB (39MB - 32MB - 4MB).
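
The splitting rule described above can be sketched as a small helper. This is only an illustration of the arithmetic, not part of the Azure SDK: the function name expected_block_sizes and the two constants are made up for this example, and the 32MB and 4MB values are taken from the explanation above.

```python
FIRST_BLOCK_SIZE = 32 * 1024 * 1024  # 33554432 bytes (assumed from the text above)
NEXT_BLOCK_SIZE = 4 * 1024 * 1024    # size of every later block (assumed from the text above)


def expected_block_sizes(file_size):
    """Return the list of block sizes, in bytes, for a file of file_size bytes."""
    # Files up to 32MB arrive as a single block.
    if file_size <= FIRST_BLOCK_SIZE:
        return [file_size]
    # Otherwise the first block is 32MB and the rest is cut into 4MB pieces.
    sizes = [FIRST_BLOCK_SIZE]
    remaining = file_size - FIRST_BLOCK_SIZE
    while remaining > 0:
        size = min(NEXT_BLOCK_SIZE, remaining)
        sizes.append(size)
        remaining -= size
    return sizes


MB = 1024 * 1024
sizes = expected_block_sizes(39 * MB)
print([s // MB for s in sizes])  # [32, 4, 3]
```

The length of the returned list is the number of blocks, and the sizes add up to the file size, which is why the progress bar is better driven by bytes than by block count.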

Here is an example that works on my side:

from tqdm import tqdm
from azure.storage.filedatalake import DataLakeFileClient
import math

conn_str = "xxxxxxxx"
file_system_name = "xxxx"
file_name = "ccc.txt"

test_file = DataLakeFileClient.from_connection_string(conn_str, file_system_name=file_system_name, file_path=file_name)

download = test_file.download_file()

blocks = download.chunks()

number_of_blocks = 0

# if the file is larger than 32MB, the first block is 32MB and the rest is split into 4MB blocks
if download.size > 33554432:
    number_of_blocks = math.ceil((download.size - 33554432) / 1024 / 1024 / 4) + 1
else:
    number_of_blocks = 1
    
print(f"File Size = {download.size}, Number of blocks = {number_of_blocks}")

#initialize a tqdm instance
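# total is in bytes; unit_divisor=1024 makes the scaled "iB" units (KiB, MiB) accurate
progress_bar = tqdm(total=download.size, unit='iB', unit_scale=True, unit_divisor=1024)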

with open("D:\\a11\\ccc.txt","wb") as my_file:
    for block in blocks:
        #update the progress bar
        progress_bar.update(len(block))

        my_file.write(block)

progress_bar.close()

print("**completed**")
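
The write-and-update loop above can also be pulled into a small reusable function. The sketch below is only one possible arrangement: write_chunks and its parameters are hypothetical names, and a list of byte strings stands in for the real chunk iterator so the snippet runs without an Azure account.

```python
import io


def write_chunks(chunks, out, on_progress):
    """Write each chunk to out and report its length to on_progress."""
    total = 0
    for chunk in chunks:
        out.write(chunk)
        # Report the number of bytes just written, not a chunk count.
        on_progress(len(chunk))
        total += len(chunk)
    return total


# Stand-in data instead of a real download (assumption for this example).
fake_chunks = [b"a" * 5, b"b" * 3]
seen = []
buffer = io.BytesIO()
written = write_chunks(fake_chunks, buffer, seen.append)
print(written, seen)  # 8 [5, 3]
```

With the real download, the same function would be called with download.chunks() as chunks, the opened file as out, and progress_bar.update as on_progress.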
