Track download progress of S3 file using boto3 and callbacks

Question

I am trying to download a text file from S3 using boto3.

This is what I have written:

import os
import sys
import threading

class ProgressPercentage(object):
    def __init__(self, filename):
        self._filename = filename
        self._size = float(os.path.getsize(filename))
        self._seen_so_far = 0
        self._lock = threading.Lock()

    def __call__(self, bytes_amount):
        # To simplify we'll assume this is hooked up
        # to a single filename.
        with self._lock:
            self._seen_so_far += bytes_amount
            percentage = round((self._seen_so_far / self._size) * 100, 2)
            # LoggingFile is the asker's own logging helper (not shown here).
            LoggingFile(
                '{} is the file name. {} out of {} done. The percentage completed is {} %'.format(
                    self._filename, self._seen_so_far, self._size, percentage))
            sys.stdout.flush()

I am calling it with:

transfer.download_file(BUCKET_NAME,FILE_NAME,'{}{}'.format(LOCAL_PATH_TEMP , FILE_NAME),callback = ProgressPercentage(LOCAL_PATH_TEMP + FILE_NAME))

This gives me an error saying that the file is not present in the folder. It works when a file with the same name already exists in that folder, but it fails when I download a fresh file.

What correction do I need to make?

Recommended answer

callback = ProgressPercentage(LOCAL_PATH_TEMP + FILE_NAME) creates a ProgressPercentage object, runs its __init__ method, and passes the object as callback to the download_file method. This means the __init__ method runs before download_file begins.

In the __init__ method you attempt to read the size of the local file being downloaded to. That raises an exception because the download has not started yet, so the file does not exist. If you have already downloaded the file, there is no problem, since a local copy exists and its size can be read.
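The failure can be reproduced without S3 at all. The sketch below uses a hypothetical temporary path as a stand-in for the download target (the helper name try_getsize is made up for illustration). It shows that os.path.getsize raises FileNotFoundError for a path that has not been written yet, and succeeds once the file exists.

```python
import os
import tempfile


def try_getsize(path):
    """Return the file size in bytes, or None if the file does not exist yet."""
    try:
        return os.path.getsize(path)
    except FileNotFoundError:
        return None


# Hypothetical download target inside a fresh temporary directory.
with tempfile.TemporaryDirectory() as tmp_dir:
    target = os.path.join(tmp_dir, "not_downloaded_yet.txt")

    size_before = try_getsize(target)  # the file is missing at this point

    with open(target, "wb") as fh:     # simulate a finished download
        fh.write(b"hello")

    size_after = try_getsize(target)   # the file exists now
```

This matches the behaviour described in the question: constructing the callback works only when an earlier download left a file at the same path.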

Of course, this is merely the cause of the exception you are seeing. There is a deeper problem: you use the _size attribute as the maximum value of the download progress, but you take it from the size of the local file. Until the file is completely downloaded, the local file system does not know how large the file will be; it only knows how much space the file takes up right now. As the download proceeds, the local file gradually grows until it reaches its full size. It therefore makes no sense to treat the size of the local file as the total size of the download. It may work when you have already downloaded the file, but that is not very useful.
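To illustrate why the local size is a moving target, here is a small simulation. It is only a sketch: the chunks are invented data standing in for downloaded parts, and how boto3 actually writes the file on disk is not modeled.

```python
import os
import tempfile

# Invented data standing in for three parts of a download.
chunks = [b"a" * 10, b"b" * 10, b"c" * 10]
total_size = sum(len(chunk) for chunk in chunks)  # what the remote side would report
observed_sizes = []

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "partial.bin")
    with open(path, "wb") as fh:
        for chunk in chunks:
            fh.write(chunk)
            fh.flush()  # push buffered bytes to the OS so getsize can see them
            observed_sizes.append(os.path.getsize(path))
```

The local size only equals the total after the last chunk is written, so it measures how much has arrived, not how much is expected.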

The solution is to check the size of the file you are going to download, instead of the size of the local copy. This ensures that you get the actual size of whatever you are downloading, and that the file exists (you could not download it otherwise). You can get the size of the remote file with head_object, as follows:

class ProgressPercentage(object):
    def __init__(self, client, bucket, filename):
        # ... everything else the same
        # head_object returns a dict; ContentLength is the object's size in bytes.
        self._size = float(client.head_object(Bucket=bucket, Key=filename)['ContentLength'])

    # ...

# If you still have the client object, you could pass that directly
# instead of transfer._manager._client (a private attribute).
progress = ProgressPercentage(transfer._manager._client, BUCKET_NAME, FILE_NAME)
transfer.download_file(..., callback=progress)
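For reference, a fuller version of the same idea might look like the sketch below. It is not the code from the question: the class name DownloadProgress, the helper download_with_progress, and the out parameter are made up for illustration, and it assumes you create the client yourself with boto3.client("s3") and use the client's own download_file method rather than an S3Transfer object. The total size is passed in, so the callback never touches the local file.

```python
import sys
import threading


class DownloadProgress:
    """Progress callback that is told the total size up front."""

    def __init__(self, filename, total_bytes, out=sys.stdout):
        self._filename = filename
        self._size = float(total_bytes)
        self._seen_so_far = 0
        self._lock = threading.Lock()
        self._out = out

    def percentage(self):
        # An empty object has nothing left to transfer, so report it as complete.
        if self._size == 0:
            return 100.0
        return round(self._seen_so_far / self._size * 100, 2)

    def __call__(self, bytes_amount):
        # The transfer may invoke the callback from several threads, hence the lock.
        with self._lock:
            self._seen_so_far += bytes_amount
            self._out.write("{}: {} of {} bytes ({} %)\n".format(
                self._filename, self._seen_so_far, int(self._size),
                self.percentage()))
            self._out.flush()


def download_with_progress(bucket, key, local_path):
    """Hypothetical helper: look up the remote size, then download with progress."""
    import boto3  # imported here so the class above is usable without boto3

    client = boto3.client("s3")
    total = client.head_object(Bucket=bucket, Key=key)["ContentLength"]
    client.download_file(bucket, key, local_path,
                         Callback=DownloadProgress(key, total))
```

With this layout the callback can be exercised without any network access, for example by calling DownloadProgress("a.txt", 200)(50) by hand, while download_with_progress(BUCKET_NAME, FILE_NAME, LOCAL_PATH_TEMP + FILE_NAME) would correspond to the call in the question.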

As a final note, although you took the code from the Boto3 documentation, it did not work because it was intended for file uploads. In that case the local file is the source, so its existence is guaranteed.
