Uploading 10,000,000 files to Azure blob storage from Linux


Question


I have some experience with S3, and in the past have used s3-parallel-put to put many (millions of) small files there. Compared to Azure, S3 has an expensive PUT price, so I'm thinking of switching to Azure.


I can't, however, seem to figure out how to sync a local directory to a remote container using the azure CLI. In particular, I have the following questions:


1- The aws client provides a sync option. Is there such an option for azure?


2- Can I concurrently upload multiple files to Azure storage using the CLI? I noticed that there is a --concurrenttaskcount flag for azure storage blob upload, so I assume it must be possible in principle.
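As an aside on question 2: the Azure Storage Python SDK of the same era exposes a similar per-blob concurrency knob, the max_connections parameter. A minimal sketch, assuming the azure-storage package is installed; the account name, key, and paths below are placeholders:

```python
from azure.storage.blob import BlockBlobService  # pip install azure-storage

service = BlockBlobService(account_name='myaccount', account_key='<key>')

# max_connections parallelizes the block (chunk) uploads of this one blob;
# it does not by itself fan out across many files.
service.create_blob_from_path('mycontainer', 'big.bin', '/data/big.bin',
                              max_connections=8)
```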

Answer


If you prefer the command line and have a recent Python interpreter, the Azure Batch and HPC team has released a code sample in Python with some AzCopy-like functionality, called blobxfer. It allows full recursive directory ingress into Azure Storage as well as full container copy back out to local storage. [full disclosure: I'm a contributor to this code]
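To illustrate what "full recursive directory ingress" involves, here is a minimal sketch of the underlying pattern using the azure-storage Python package rather than blobxfer itself; the account name, key, container, and local_root below are placeholders:

```python
import os

from azure.storage.blob import BlockBlobService  # pip install azure-storage

service = BlockBlobService(account_name='myaccount', account_key='<key>')
container = 'mycontainer'
local_root = '/data/files'

# Walk the local tree and mirror it into the container, using each file's
# path relative to local_root (with '/' separators) as its blob name.
for dirpath, _, filenames in os.walk(local_root):
    for name in filenames:
        path = os.path.join(dirpath, name)
        blob_name = os.path.relpath(path, local_root).replace(os.sep, '/')
        service.create_blob_from_path(container, blob_name, path)
```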

To answer your questions:

  1. blobxfer supports rsync-like operations by comparing MD5 checksums on both ingress and egress
  2. blobxfer performs concurrent operations, both within a single file and across multiple files. You may, however, want to split your input across multiple directories and containers, which will not only help reduce memory usage in the script but also partition your load better (see the sketch after this list)
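To make both points concrete, here is a hedged sketch that combines them: rsync-like skipping via MD5 comparison, plus a thread pool for concurrency across files. It uses the azure-storage Python package and is an illustration of the idea, not blobxfer's implementation; upload_if_changed, the credentials, and the sample paths are hypothetical:

```python
import base64
import hashlib
from concurrent.futures import ThreadPoolExecutor

from azure.common import AzureMissingResourceHttpError
from azure.storage.blob import BlockBlobService, ContentSettings

service = BlockBlobService(account_name='myaccount', account_key='<key>')
container = 'mycontainer'

def local_md5(path):
    """Base64-encoded MD5 digest, the encoding Azure uses for Content-MD5."""
    h = hashlib.md5()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(1 << 20), b''):
            h.update(chunk)
    return base64.b64encode(h.digest()).decode()

def upload_if_changed(blob_name, path):
    digest = local_md5(path)
    try:
        props = service.get_blob_properties(container, blob_name)
        if props.properties.content_settings.content_md5 == digest:
            return  # blob already matches the local file: skip, rsync-style
    except AzureMissingResourceHttpError:
        pass  # blob does not exist yet, so upload it
    service.create_blob_from_path(
        container, blob_name, path,
        content_settings=ContentSettings(content_md5=digest))

# Concurrency across files: a bounded thread pool; list() forces completion
# so that any exception raised in a worker is re-raised here.
pairs = [('a/1.bin', '/data/a/1.bin'), ('a/2.bin', '/data/a/2.bin')]
with ThreadPoolExecutor(max_workers=32) as pool:
    list(pool.map(lambda p: upload_if_changed(*p), pairs))
```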

