Uploading 10,000,000 files to Azure blob storage from Linux
Problem description
I have some experience with S3, and in the past have used s3-parallel-put to put many (millions of) small files there. Compared to Azure, S3 has an expensive PUT price, so I'm thinking of switching to Azure.
However, I can't seem to figure out how to sync a local directory to a remote container using the azure cli. In particular, I have the following questions:
1- The aws client provides a sync option. Is there such an option for azure?
2- Can I concurrently upload multiple files to Azure storage using the cli? I noticed that there is a -concurrenttaskcount flag for azure storage blob upload, so I assume it must be possible in principle.
Recommended answer
If you prefer the command line and have a recent Python interpreter, the Azure Batch and HPC team has released a code sample with some AzCopy-like functionality in Python called blobxfer. This allows full recursive directory ingress into Azure Storage as well as full container copy back out to local storage. [full disclosure: I'm a contributor to this code]
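The upload side of such a tool essentially walks the directory tree and fans per-file uploads out over a pool of workers. A minimal sketch of that pattern, where `upload_fn` is a hypothetical stand-in for whatever SDK or REST call actually performs a single blob upload (it is not blobxfer's real API):

```python
import os
from concurrent.futures import ThreadPoolExecutor

def upload_directory(local_dir, upload_fn, max_workers=8):
    """Recursively walk local_dir and upload every file concurrently.

    upload_fn(blob_name, file_path) performs one upload; here it is a
    stand-in for the real storage client call.
    """
    tasks = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for root, _dirs, files in os.walk(local_dir):
            for name in files:
                path = os.path.join(root, name)
                # Blob names mirror the path relative to local_dir,
                # using '/' as the virtual-directory separator.
                blob_name = os.path.relpath(path, local_dir).replace(os.sep, "/")
                tasks.append(pool.submit(upload_fn, blob_name, path))
    # The executor has already joined all workers; surface any errors.
    return [t.result() for t in tasks]
```

With millions of files, the worker count and the cost of listing the tree matter far more than per-call overhead, which is why batching and partitioning the input (as the answer below suggests) pays off.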
To answer your questions:

- blobxfer supports rsync-like operations, for both ingress and egress, by comparing MD5 checksums
- blobxfer performs concurrent operations, both within a single file and across multiple files. However, you may want to split your input across multiple directories and containers, which will not only help reduce memory usage in the script but also partition your load better
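The rsync-like comparison in the first bullet works because Azure Blob Storage keeps a base64-encoded Content-MD5 property on each blob, so a local digest can be checked against the remote one before deciding to upload. A minimal sketch of that skip-if-unchanged check (the function names are illustrative, not blobxfer's actual API):

```python
import base64
import hashlib

def local_blob_md5(path, chunk_size=4 * 1024 * 1024):
    """Compute the base64-encoded MD5 of a local file.

    Azure Blob Storage reports a blob's Content-MD5 in this base64
    form, so the two digests compare directly.
    """
    md5 = hashlib.md5()
    with open(path, "rb") as f:
        # Read in chunks so multi-gigabyte files don't load into memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            md5.update(chunk)
    return base64.b64encode(md5.digest()).decode("ascii")

def needs_upload(path, remote_md5):
    # remote_md5 is the Content-MD5 reported by the service,
    # or None if the blob does not exist yet.
    return remote_md5 is None or local_blob_md5(path) != remote_md5
```

Hashing 10,000,000 files is itself I/O-heavy, but it is still far cheaper than re-issuing 10,000,000 PUTs on every sync run.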