The fastest method to move tens of thousands of small files to an Azure Storage container


Problem description

What's the fastest way to move tens of thousands of small image files from my local machine to a container within Azure Cloud Storage?

I am trying the highly recommended CloudBerry Explorer for Azure, and its estimated time to completion is roughly 4 hours right now (around ~30K files in total, 5 KB average file size). This is unacceptable to me - I want to drastically cut down that time.

Can you suggest any other options? I think non-GUI ones will be faster. I'll provide an example (below) of one Linux-based solution I tried, which didn't work for me. Perhaps an expert can point out something similar, but with a correct usage example. The solution below isn't particularly well-documented when it comes to exhaustive examples. Thanks in advance, and feel free to ask me for more information in case you need it.


The Linux-based solution I tried is called blobxfer - it's like AzCopy, but for Linux. The command I used was blobxfer mystorageaccount pictures /home/myuser/s3 --upload --storageaccountkey=<primary access key from portal.azure.com> --no-container. But I keep getting an arcane error: Unknown error (The value for one of the HTTP headers is not in the correct format.)

Full traceback:

<?xml version="1.0" encoding="utf-8"?><Error><Code>InvalidHeaderValue</Code><Message>The value for one of the HTTP headers is not in the correct format.
RequestId:61a1486c-0101-00d6-13b5-408578134000
    Time:2015-12-27T12:56:03.5390180Z</Message><HeaderName>x-ms-blob-content-length</HeaderName><HeaderValue>0</HeaderValue></Error>
Exception in thread Thread-49 (most likely raised during interpreter shutdown):
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 810, in __bootstrap_inner
  File "/home/myuser/.virtualenvs/redditpk/local/lib/python2.7/site-packages/blobxfer.py", line 506, in run
  File "/home/myuser/.virtualenvs/redditpk/local/lib/python2.7/site-packages/blobxfer.py", line 597, in putblobdata
  File "/home/myuser/.virtualenvs/redditpk/local/lib/python2.7/site-packages/blobxfer.py", line 652, in azure_request
<type 'exceptions.AttributeError'>: 'NoneType' object has no attribute 'Timeout'

Solution

Please try upgrading your blobxfer to 0.9.9.6. There were a few bugs with zero-byte files that were recently fixed.
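The upgrade itself is a one-line pip install; a minimal sketch, assuming blobxfer was installed with pip into the virtualenv shown in the traceback (the fallback echo is just defensive scaffolding, not part of blobxfer's documentation):

```shell
# Upgrade blobxfer to the release that fixes the zero-byte-file bugs.
# Prints a notice instead of failing if the package index is unreachable.
pip install --upgrade 'blobxfer==0.9.9.6' \
  || echo 'install failed; check network/PyPI access'

# Confirm what is installed now (prints nothing if blobxfer is absent).
pip show blobxfer 2>/dev/null | grep '^Version:' || true
```

Run this inside the same virtualenv as the failed upload (the traceback shows it lives under /home/myuser/.virtualenvs/redditpk), otherwise you will upgrade a different copy of the tool.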

Regarding your problem with blobxfer, you should open issues directly on the GitHub page rather than on Stack Overflow. The maintainers of the code will have an easier time looking at your issue and replying and/or fixing it for that specific tool. If you are still encountering issues with blobxfer after upgrading to 0.9.9.6, then post an issue directly on the GitHub project page.

In general, as shellter has pointed out, for thousands of small files you should archive them first then upload the archive to achieve greater throughput.
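A hedged sketch of that archive-first approach: bundle the images into a single tarball locally, then upload that one blob. The directory names, archive name, and the commented-out blobxfer call are illustrative placeholders, not taken from the original post:

```shell
# Stand-in for the real picture directory: create a few sample 5KB-ish files.
srcdir=$(mktemp -d)
for i in 1 2 3; do printf 'fake image data %s\n' "$i" > "$srcdir/img$i.png"; done

# One archive means one large PUT instead of tens of thousands of small
# requests - the per-request overhead is where the 4 hours goes.
tar -czf pictures.tar.gz -C "$srcdir" .

# The single-file upload would then look something like this (not run here;
# account, container, and key are placeholders):
#   blobxfer mystorageaccount pictures pictures.tar.gz --upload \
#       --storageaccountkey=<primary access key>
ls -l pictures.tar.gz
```

The trade-off is that the files land in the container as one blob, so this fits best when the consumer can unpack the archive (e.g. on a VM) rather than needing each image addressable as its own blob.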
