Track download progress of S3 file using boto3 and callbacks
Question
I am trying to download a text file from S3 using boto3.
Here is what I have written.
import os
import sys
import threading

class ProgressPercentage(object):
    def __init__(self, filename):
        self._filename = filename
        # Reads the size of the LOCAL file, which must already exist
        self._size = float(os.path.getsize(filename))
        self._seen_so_far = 0
        self._lock = threading.Lock()

    def __call__(self, bytes_amount):
        # To simplify we'll assume this is hooked up
        # to a single filename.
        with self._lock:
            self._seen_so_far += bytes_amount
            percentage = round((self._seen_so_far / self._size) * 100, 2)
            # LoggingFile is my own logging helper (not shown here)
            LoggingFile('{} is the file name. {} out of {} done. The percentage completed is {} %'.format(str(self._filename), str(self._seen_so_far), str(self._size), str(percentage)))
            sys.stdout.flush()
And I am using it like this:
transfer.download_file(
    BUCKET_NAME,
    FILE_NAME,
    '{}{}'.format(LOCAL_PATH_TEMP, FILE_NAME),
    callback=ProgressPercentage(LOCAL_PATH_TEMP + FILE_NAME),
)
This gives me an error saying that the file is not present in the folder. Apparently it works when a file with this name already exists in that folder, but it errors out when I download a fresh file.
What corrections do I need to make?
Recommended answer
callback = ProgressPercentage(LOCAL_PATH_TEMP + FILE_NAME) creates a ProgressPercentage object, runs its __init__ method, and passes the object as callback to the download_file method. This means the __init__ method runs before download_file begins.
In the __init__ method you are attempting to read the size of the local file being downloaded to. That throws an exception because the download has not started yet, so the file does not exist. If you've already downloaded the file, there's no problem, since a local copy exists and its size can be read.
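The ordering described above can be reproduced without S3 at all. The following is only a minimal sketch: LocalSizeProgress and fake_download are hypothetical stand-ins (not part of boto3) for the question's callback class and for download_file. It shows that the constructor fails before the download function is ever entered.

```python
import os
import tempfile


class LocalSizeProgress:
    """Hypothetical stand-in for the question's callback class."""

    def __init__(self, filename):
        # Same idea as the question's code: read the local file size up front.
        self._size = float(os.path.getsize(filename))


def fake_download(destination, callback):
    """Hypothetical stand-in for download_file; only records that it ran."""
    fake_download.called = True


fake_download.called = False
error = None

with tempfile.TemporaryDirectory() as tmp_dir:
    missing_path = os.path.join(tmp_dir, "not_downloaded_yet.txt")
    try:
        # Python evaluates the arguments first, so __init__ runs (and fails)
        # before fake_download is called.
        fake_download(missing_path, callback=LocalSizeProgress(missing_path))
    except FileNotFoundError as exc:
        error = exc

print("download function reached:", fake_download.called)
print("error type:", type(error).__name__)
```

The exception comes from os.path.getsize inside __init__, not from the download call itself.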
Of course, this is merely the cause of the exception you're seeing. You're using the _size property as the maximum value of the download progress, but you're taking it from the size of the local file. Until the file is completely downloaded, the local file system does not know how large the file will be; it only knows how much space the file takes up right now. As the download proceeds, the file gradually gets bigger until it reaches its full size. As such, it doesn't really make sense to treat the size of the local file as the maximum size of the download. It may work in the case where you've already downloaded the file, but that isn't very useful.
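To see why the local size is a moving target, here is a small sketch that simulates a download by writing chunks to a temporary file and checking its size after each one. The helper name sizes_while_writing and the chunk sizes are made up for illustration.

```python
import os
import tempfile


def sizes_while_writing(chunks):
    """Write chunks to a temporary file and record its size after each one."""
    observed = []
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = os.path.join(tmp_dir, "partial.bin")
        with open(path, "wb") as fh:
            for chunk in chunks:
                fh.write(chunk)
                fh.flush()  # hand buffered bytes to the OS so getsize sees them
                observed.append(os.path.getsize(path))
    return observed


sizes = sizes_while_writing([b"a" * 10, b"b" * 20, b"c" * 30])
print(sizes)
```

The reported size only equals the full size after the last chunk, so it cannot serve as the denominator of a progress percentage while the transfer is still running.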
The solution to your problem is to check the size of the file you're going to download, instead of the size of the local copy. This ensures you're getting the actual size of whatever it is you're downloading, and that the file exists (you couldn't be downloading it if it didn't). You can do this by getting the size of the remote file with head_object, as follows:
class ProgressPercentage(object):
    def __init__(self, client, bucket, filename):
        # ... everything else the same
        # head_object returns a dict, so ContentLength is read by key
        self._size = client.head_object(Bucket=bucket, Key=filename)['ContentLength']
        # ...

# If you still have the client object you could pass that directly
# instead of transfer._manager._client
progress = ProgressPercentage(transfer._manager._client, BUCKET_NAME, FILE_NAME)
transfer.download_file(..., callback=progress)
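Putting the pieces together, a complete version might look like the sketch below. It is not the asker's exact setup. The names DownloadProgress and download_with_progress are hypothetical, progress is written to stdout rather than through the asker's LoggingFile helper, and the download goes through the S3 client's own download_file method (which takes the callback as Callback) instead of an S3Transfer object. The guard for zero-byte objects is an addition as well.

```python
import sys
import threading


class DownloadProgress:
    """Progress callback whose total comes from the remote object's size."""

    def __init__(self, client, bucket, key):
        self._key = key
        # head_object returns a dict; ContentLength is the object size in bytes.
        self._size = client.head_object(Bucket=bucket, Key=key)["ContentLength"]
        self._seen_so_far = 0
        self._lock = threading.Lock()

    def percentage(self):
        # Treat an empty object as complete to avoid dividing by zero.
        if self._size == 0:
            return 100.0
        return round(self._seen_so_far / self._size * 100, 2)

    def __call__(self, bytes_amount):
        # The transfer may call this from several threads, hence the lock.
        with self._lock:
            self._seen_so_far += bytes_amount
            sys.stdout.write(
                "{}: {} of {} bytes ({} %)\n".format(
                    self._key, self._seen_so_far, self._size, self.percentage()
                )
            )
            sys.stdout.flush()


def download_with_progress(bucket, key, destination):
    """Download one object, reporting progress (requires boto3 and credentials)."""
    import boto3  # imported here so the class above is usable without boto3

    client = boto3.client("s3")
    progress = DownloadProgress(client, bucket, key)
    client.download_file(bucket, key, destination, Callback=progress)
```

Because the constructor only touches the remote object, it no longer matters whether the destination file exists when the callback is created.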
As a final note, although you got the code from the Boto3 documentation, it didn't work because it was intended for file uploads. In that case the local file is the source, so its existence is guaranteed.