Downloading a large archive from AWS Glacier using Boto


Problem description

I am trying to download a large archive (~1 TB) from Glacier using the Python package Boto. The current method that I am using looks like this:

import time

import boto.glacier
import boto

ACCESS_KEY_ID = 'XXXXX'
SECRET_ACCESS_KEY = 'XXXXX'
VAULT_NAME = 'XXXXX'
ARCHIVE_ID = 'XXXXX'
OUTPUT = 'XXXXX'

# Layer2 is boto's high-level Glacier interface.
layer2 = boto.connect_glacier(aws_access_key_id=ACCESS_KEY_ID,
                              aws_secret_access_key=SECRET_ACCESS_KEY)

gv = layer2.get_vault(VAULT_NAME)

# Initiate the archive-retrieval job; Glacier takes hours to prepare it.
job = gv.retrieve_archive(ARCHIVE_ID)
job_id = job.id

# Poll until the retrieval job is ready for download.
while not job.completed:
    time.sleep(10)
    job = gv.get_job(job_id)

if job.completed:
    print "Downloading archive"
    job.download_to_file(OUTPUT)

The problem is that the job ID expires after 24 hours, which is not enough time to retrieve the entire archive. I will need to break the download into at least 4 pieces. How can I do this and write the output to a single file?

Recommended answer

It seems that you can simply specify the chunk_size parameter when calling job.download_to_file, like so:

if job.completed:
    print "Downloading archive"
    job.download_to_file(OUTPUT, chunk_size=1024*1024)

However, if you can't download all the chunks within the 24 hours, I don't think layer2 lets you download only the ones you missed.

Using layer1 you can simply use the method get_job_output (http://docs.pythonboto.org/en/latest/ref/glacier.html#boto.glacier.layer1.Layer1.get_job_output) and specify the byte range you want to download.

It would look something like this:

import os
import boto.glacier.layer1

# Layer1 is boto's low-level client, which exposes get_job_output().
layer1 = boto.glacier.layer1.Layer1(aws_access_key_id=ACCESS_KEY_ID,
                                    aws_secret_access_key=SECRET_ACCESS_KEY)

# Resume from however many bytes are already on disk.
file_size = os.path.getsize(OUTPUT) if os.path.exists(OUTPUT) else 0

if job.completed:
    print "Downloading archive"
    with open(OUTPUT, 'ab') as output_file:  # append rather than truncate
        i = 0
        while True:
            start = file_size + 1024 * 1024 * i
            end = start + 1024 * 1024 - 1  # byte ranges are inclusive
            response = layer1.get_job_output(VAULT_NAME, job_id,
                                             byte_range=(start, end))
            data = response.read()
            output_file.write(data)
            if len(data) < 1024 * 1024:
                break
            i += 1
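
Note that Glacier byte ranges are inclusive on both ends, which is why each range ends at start + 1024 * 1024 - 1; requesting one extra byte would make consecutive chunks overlap.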

With this script you should be able to rerun it after a failure and continue downloading your archive from where you left off.
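
If the 24-hour window still closes before the archive is fully downloaded, one option (a sketch building on the question's own code, not part of the original answer) is to initiate a fresh retrieval job for the same ARCHIVE_ID and then rerun the resume loop above with the new job:

import os
import time
import boto

# Sketch only: assumes the same credentials and constants as above.
layer2 = boto.connect_glacier(aws_access_key_id=ACCESS_KEY_ID,
                              aws_secret_access_key=SECRET_ACCESS_KEY)
gv = layer2.get_vault(VAULT_NAME)

# Start a new retrieval job; Glacier typically needs a few hours
# before the archive becomes downloadable again.
job = gv.retrieve_archive(ARCHIVE_ID)
job_id = job.id
while not job.completed:
    time.sleep(600)  # poll every 10 minutes
    job = gv.get_job(job_id)

# The resume loop picks up from the bytes already written to OUTPUT,
# so nothing downloaded so far is lost.
file_size = os.path.getsize(OUTPUT) if os.path.exists(OUTPUT) else 0

The archive itself does not expire, only the retrieval job's output does, so this can be repeated as many times as needed.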

By digging into the boto code I found a "private" method in the Job class that you might also use: _download_byte_range. With this method you can still use layer2.

import os

# Resume from however many bytes are already on disk.
file_size = os.path.getsize(OUTPUT) if os.path.exists(OUTPUT) else 0

if job.completed:
    print "Downloading archive"
    with open(OUTPUT, 'ab') as output_file:  # append rather than truncate
        i = 0
        while True:
            start = file_size + 1024 * 1024 * i
            end = start + 1024 * 1024 - 1  # byte ranges are inclusive
            response = job._download_byte_range(start, end)
            output_file.write(response)
            if len(response) < 1024 * 1024:
                break
            i += 1
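
Since _download_byte_range has a leading underscore, it is not part of boto's public API and may change or disappear in a future release; the layer1 get_job_output approach above relies only on documented calls.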
