Track download progress of S3 file using boto3 and callbacks


Question

I am trying to download a text file from S3 using boto3.

Here is what I have written.

import os
import sys
import threading

class ProgressPercentage(object):
    def __init__(self, filename):
        self._filename = filename
        self._size = float(os.path.getsize(filename))
        self._seen_so_far = 0
        self._lock = threading.Lock()

    def __call__(self, bytes_amount):
        # To simplify we'll assume this is hooked up
        # to a single filename.
        with self._lock:
            self._seen_so_far += bytes_amount
            percentage = round((self._seen_so_far / self._size) * 100,2)
            LoggingFile('{} is the file name. {} out of {} done. The percentage completed is {} %'.format(str(self._filename), str(self._seen_so_far), str(self._size),str(percentage)))
            sys.stdout.flush()

And I am calling it like this:

transfer.download_file(BUCKET_NAME,FILE_NAME,'{}{}'.format(LOCAL_PATH_TEMP , FILE_NAME),callback = ProgressPercentage(LOCAL_PATH_TEMP + FILE_NAME))

This gives me an error saying that the file is not present in the folder. When a file with this name already exists in the same folder it works, but when I download a fresh file it errors out.

What correction do I need to make?

Recommended answer

callback = ProgressPercentage(LOCAL_PATH_TEMP + FILE_NAME) creates a ProgressPercentage object, runs its __init__ method, and passes the object as callback to the download_file method. This means the __init__ method runs before download_file begins.

In the __init__ method you are attempting to read the size of the local file being downloaded to. This raises an exception because the file does not exist yet: the download has not started. If you've already downloaded the file, there's no problem, since a local copy exists and its size can be read.
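The failure can be reproduced without S3 at all. The following is a minimal sketch, assuming Python 3; the path is a hypothetical placeholder for a download target that has not been written yet.

```python
import os
import tempfile

# Hypothetical target path: an empty temporary directory stands in for
# LOCAL_PATH_TEMP, and nothing has been downloaded into it yet.
missing_path = os.path.join(tempfile.mkdtemp(), "not_downloaded_yet.txt")

try:
    # This is the same call ProgressPercentage.__init__ makes.
    os.path.getsize(missing_path)
    outcome = "size read"
except FileNotFoundError:
    # Raised because the file does not exist before the download starts.
    outcome = "FileNotFoundError"

print(outcome)
```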

Of course, this is merely the cause of the exception you're seeing. There is a deeper problem: you're using the _size attribute as the maximum value of the download progress, but you're taking it from the local file. Until the file is completely downloaded, the local file system does not know how large the file will be; it only knows how much space the file takes up right now. As the download proceeds, the local file gradually gets bigger until it reaches its full size. As such, it doesn't make sense to treat the size of the local file as the maximum size of the download. It may work in the case where you've already downloaded the file, but that isn't very useful.
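To illustrate why the local size is the wrong denominator, here is a small sketch that writes a file in two steps and checks its size in between. It uses a temporary file rather than a real download, and the 1024-byte chunk size is an arbitrary choice for the example.

```python
import os
import tempfile

with tempfile.NamedTemporaryFile(delete=False) as f:
    path = f.name
    f.write(b"x" * 1024)
    f.flush()  # push the bytes to the OS so the reported size is up to date
    size_midway = os.path.getsize(path)  # only what has been written so far
    f.write(b"x" * 1024)
    f.flush()
    size_done = os.path.getsize(path)  # the full size, known only at the end

os.remove(path)
print(size_midway, size_done)
```

Midway through, the file system reports only the bytes written so far, so a percentage computed against it says nothing about how much of the download remains.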

The solution to your problem is to check the size of the file you're going to download, instead of the size of the local copy. This ensures you're getting the actual size of whatever you're downloading, and that the file exists (you couldn't download it if it didn't). You can do this by getting the size of the remote object with head_object, as follows:

class ProgressPercentage(object):
    def __init__(self, client, bucket, filename):
        # ... everything else the same
        # head_object returns a dict, so ContentLength is read by key
        self._size = client.head_object(Bucket=bucket, Key=filename)['ContentLength']

    # ...

# If you still have the client object you could pass that directly 
# instead of transfer._manager._client
progress = ProgressPercentage(transfer._manager._client, BUCKET_NAME, FILE_NAME)
transfer.download_file(..., callback=progress)
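For reference, the pieces above could be assembled into a complete callback along these lines. This is a sketch rather than the original poster's final code: the names DownloadProgress and download_with_progress are made up for this example, the output goes to stdout instead of the question's LoggingFile helper, and it uses the client's own download_file method (with its Callback argument) instead of an S3Transfer object.

```python
import sys
import threading


class DownloadProgress:
    """Progress callback measured against the remote object's size."""

    def __init__(self, client, bucket, key):
        self._key = key
        # Ask S3 for the object's size; the local file does not exist yet.
        self._size = client.head_object(Bucket=bucket, Key=key)["ContentLength"]
        self._seen_so_far = 0
        self._lock = threading.Lock()

    @property
    def percentage(self):
        if self._size == 0:
            return 100.0  # an empty object is trivially complete
        return round(self._seen_so_far / self._size * 100, 2)

    def __call__(self, bytes_amount):
        # The transfer may call this from several threads, hence the lock.
        with self._lock:
            self._seen_so_far += bytes_amount
            sys.stdout.write("\r{}: {} of {} bytes ({:.2f}%)".format(
                self._key, self._seen_so_far, self._size, self.percentage))
            sys.stdout.flush()


def download_with_progress(bucket, key, local_path):
    # Imported here so the class above can be exercised without boto3.
    import boto3

    client = boto3.client("s3")
    progress = DownloadProgress(client, bucket, key)
    client.download_file(bucket, key, local_path, Callback=progress)
```

Because the constructor only needs an object with a head_object method, the class can be tried out with a stand-in client before pointing it at a real bucket.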

As a final note, although you got the code from the Boto3 documentation, it didn't work because it was intended for file uploads. In that case the local file is the source, and its existence is guaranteed.
