Track download progress of an S3 file using boto3 and callbacks
Problem description
I am trying to download a text file from S3 using boto3.
This is what I wrote:
```python
import os
import sys
import threading

class ProgressPercentage(object):
    def __init__(self, filename):
        self._filename = filename
        self._size = float(os.path.getsize(filename))
        self._seen_so_far = 0
        self._lock = threading.Lock()

    def __call__(self, bytes_amount):
        # To simplify we'll assume this is hooked up
        # to a single filename.
        with self._lock:
            self._seen_so_far += bytes_amount
            percentage = round((self._seen_so_far / self._size) * 100, 2)
            # LoggingFile: a logging helper defined elsewhere (not shown)
            LoggingFile('{} is the file name. {} out of {} done. The percentage completed is {} %'.format(
                str(self._filename), str(self._seen_so_far), str(self._size), str(percentage)))
            sys.stdout.flush()
```
and I call it with:
```python
transfer.download_file(BUCKET_NAME, FILE_NAME,
                       '{}{}'.format(LOCAL_PATH_TEMP, FILE_NAME),
                       callback=ProgressPercentage(LOCAL_PATH_TEMP + FILE_NAME))
```
This gives me an error saying the file is not present in the folder. When a file with this name already exists in the same folder it works, but when I download a fresh file, it errors out.
What corrections do I need to make?
Recommended answer
The expression

```python
callback=ProgressPercentage(LOCAL_PATH_TEMP + FILE_NAME)
```

creates a ProgressPercentage object, runs its __init__ method, and passes the object as callback to the download_file method. This means the __init__ method runs before download_file begins.
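The ordering can be seen with a toy example. Python evaluates every call argument, including a constructor call, before entering the called function. The names Tracker and fake_download below are hypothetical stand-ins for ProgressPercentage and download_file, not boto3 code.

```python
events = []

class Tracker:
    """Hypothetical stand-in for ProgressPercentage."""

    def __init__(self):
        events.append("Tracker.__init__")

    def __call__(self, bytes_amount):
        events.append(f"callback({bytes_amount})")

def fake_download(callback):
    """Hypothetical stand-in for download_file: reports one chunk."""
    events.append("download started")
    callback(1024)

# Tracker() is constructed before fake_download's body runs.
fake_download(callback=Tracker())
print(events)  # ['Tracker.__init__', 'download started', 'callback(1024)']
```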
In the __init__ method you are attempting to read the size of the local file being downloaded to. This throws an exception because that file does not exist yet: the download has not started. If you've already downloaded the file once, there's no problem, since a local copy exists and its size can be read.
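Both cases can be reproduced without S3. os.path.getsize raises FileNotFoundError (a subclass of OSError) for a path that does not exist, and succeeds once the file is there. The temporary directory and file name are only for illustration.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "not-downloaded-yet.txt")

    # Before the download: the file does not exist, so getsize raises.
    try:
        os.path.getsize(target)
    except FileNotFoundError as exc:
        error = exc

    # After a previous download: the file exists and getsize succeeds.
    with open(target, "wb") as f:
        f.write(b"hello")
    size_after = os.path.getsize(target)

print(type(error).__name__, size_after)  # FileNotFoundError 5
```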
Of course, this is only the immediate cause of the exception you're seeing. The deeper problem is that you're using the _size attribute as the maximum value of the download progress, but you're taking it from the size of the local file. Until the file is completely downloaded, the local file system does not know how large it will be; it only knows how much space it takes up right now. As the download proceeds, the file gradually grows until it reaches its full size. So it doesn't make sense to treat the local file's size as the total size of the download. It may appear to work when you've already downloaded the file once, but that isn't very useful.
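The counting logic works once the total is known up front and handed to the tracker. In the minimal sketch below, ByteProgress, its history list and the chunk sizes are hypothetical. Percentages are computed against a fixed total, so they climb to 100 as chunks are reported. The lock mirrors the original, because boto3 may invoke the callback from several worker threads.

```python
import threading

class ByteProgress:
    """Thread-safe progress tracker measured against a fixed total size."""

    def __init__(self, label, total_bytes):
        self._label = label
        self._total = float(total_bytes)
        self._seen_so_far = 0
        self._lock = threading.Lock()
        self.history = []  # percentages reported so far, for inspection

    def __call__(self, bytes_amount):
        with self._lock:
            self._seen_so_far += bytes_amount
            if self._total:
                percentage = round(self._seen_so_far / self._total * 100, 2)
            else:
                percentage = 100.0  # an empty object is complete immediately
            self.history.append(percentage)

# Simulate a 1000-byte transfer arriving in four chunks.
progress = ByteProgress("example.txt", 1000)
for chunk in (250, 250, 300, 200):
    progress(chunk)
print(progress.history)  # [25.0, 50.0, 80.0, 100.0]
```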
The solution is to check the size of the file you're going to download, not the size of the local copy. This ensures you get the actual size of whatever you're downloading, and that the file exists (you couldn't be downloading it if it didn't). You can get the size of the remote object with head_object, as follows:
```python
class ProgressPercentage(object):
    def __init__(self, client, bucket, filename):
        # ... everything else the same
        # head_object returns a dict, so read ContentLength by key
        self._size = float(client.head_object(Bucket=bucket, Key=filename)['ContentLength'])
        # ...

# If you still have the client object you could pass that directly
# instead of transfer._manager._client (a private attribute)
progress = ProgressPercentage(transfer._manager._client, BUCKET_NAME, FILE_NAME)
transfer.download_file(..., callback=progress)
```
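Here is a fuller, self-contained sketch along the same lines, under stated assumptions. It uses a boto3 S3 client directly rather than the private transfer._manager._client attribute. The function name download_with_progress is hypothetical. It relies on S3.Client.head_object returning a dict whose ContentLength entry is the object size in bytes. It also relies on S3.Client.download_file accepting a Callback that receives the number of bytes transferred per chunk.

```python
import sys
import threading


class ProgressPercentage:
    """Reports download progress against the size of the remote S3 object."""

    def __init__(self, client, bucket, key):
        self._key = key
        # Ask S3 for the object's size; the local file may not exist yet.
        size = client.head_object(Bucket=bucket, Key=key)["ContentLength"]
        self._size = float(size)
        self._seen_so_far = 0
        self._lock = threading.Lock()

    def __call__(self, bytes_amount):
        # The callback may be called from several threads, hence the lock.
        with self._lock:
            self._seen_so_far += bytes_amount
            if self._size:
                percentage = self._seen_so_far / self._size * 100
            else:
                percentage = 100.0
            sys.stdout.write(
                f"\r{self._key}: {self._seen_so_far} / {int(self._size)} bytes "
                f"({percentage:.2f}%)"
            )
            sys.stdout.flush()


def download_with_progress(client, bucket, key, local_path):
    """Hypothetical helper: download s3://bucket/key to local_path with progress."""
    progress = ProgressPercentage(client, bucket, key)
    client.download_file(bucket, key, local_path, Callback=progress)
    sys.stdout.write("\n")
    return progress
```

With boto3 installed and credentials configured, a call might look like download_with_progress(boto3.client("s3"), "my-bucket", "path/to/file.txt", "/tmp/file.txt"), where the bucket, key and path are placeholders. The client is passed in rather than created inside the helper, which keeps the progress logic independent of how the client is configured.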
As a final note: although you got this code from the Boto3 documentation, it didn't work because that example was written for file uploads. In that case the local file is the source, so its existence is guaranteed.