Batch Uploading Huge Sets of Images to Azure Blob Storage


Question

I have about 110,000 images in various formats (JPG, PNG, and GIF) and sizes (2-40 KB) stored locally on my hard drive. I need to upload them to Azure Blob Storage. While doing so, I need to set some metadata and each blob's ContentType, but otherwise it's a straightforward bulk upload.

I'm currently using the following method to upload one image at a time, run in parallel across 5-10 concurrent Tasks.

static void UploadPhoto(Image pic, string filename, ImageFormat format)
{
    //convert image to bytes
    using(MemoryStream ms = new MemoryStream())
    {
        pic.Save(ms, format);
        ms.Position = 0;

        //create the blob, set metadata and properties
        var blob = container.GetBlobReference(filename);
        blob.Metadata["Filename"] = filename;
        blob.Properties.ContentType = MimeHandler.GetContentType(Path.GetExtension(filename));

        //upload!
        blob.UploadFromStream(ms);
        blob.SetMetadata();
        blob.SetProperties();
    }
}

Is there another technique I could use to make the upload as fast as possible? This project involves importing a lot of data from one system into another, and for customer reasons it needs to happen as quickly as possible.

Recommended answer

Okay, here's what I did. I experimented with running BeginUploadFromStream(), then BeginSetMetadata(), then BeginSetProperties() in an asynchronous chain, parallelized over 5-10 threads (a combination of ElvisLive's and knightpfhor's suggestions). This worked, but with anything over 5 threads performance was terrible: each thread took upwards of 20 seconds to complete a page of ten images.

So, to sum up the performance differences:


  • Asynchronous: 5 threads, each running an async chain and working on ten images at a time (paged for statistical reasons): ~15.8 seconds per page (per thread).
  • Synchronous: 1 thread, ten images at a time (paged for statistical reasons): ~3.4 seconds per page.

Okay, that's pretty interesting. One instance uploading blobs synchronously finished a page of ten images roughly 5x faster than each thread in the async approach (~3.4 seconds versus ~15.8 seconds). So even the best async balance, 5 threads, nets essentially the same overall throughput as a single synchronous uploader: about 50 images per 15.8 seconds versus 10 images per 3.4 seconds, or roughly 3 images per second either way.

So, I tweaked my image file import to split the images into folders of 10,000 images each. Then I used Process.Start() to launch one instance of my blob uploader per folder. This batch has 170,000 images, so that means 17 uploader instances. With all of them running on my laptop, performance across the instances leveled out at ~4.3 seconds per set of ten.
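The answer does not show how the folders were created or how the instances were launched, so the following is only a minimal sketch of that process-per-folder setup. The names UploadLauncher, SplitIntoFolders, RunUploaders, and the "batch-000" folder naming are hypothetical, and the uploader executable is assumed to be a console app that takes a folder path as its only argument.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;

static class UploadLauncher
{
    // Moves the files directly inside sourceDir into numbered subfolders
    // ("batch-000", "batch-001", ...) of at most batchSize files each,
    // and returns the paths of the folders it filled.
    public static List<string> SplitIntoFolders(string sourceDir, int batchSize)
    {
        if (batchSize <= 0) throw new ArgumentOutOfRangeException(nameof(batchSize));

        // Take a sorted snapshot first so moving files cannot disturb the listing.
        string[] files = Directory.GetFiles(sourceDir);
        Array.Sort(files, StringComparer.Ordinal);

        var folders = new List<string>();
        for (int i = 0; i < files.Length; i += batchSize)
        {
            string folder = Path.Combine(sourceDir, "batch-" + (i / batchSize).ToString("D3"));
            Directory.CreateDirectory(folder);

            int end = Math.Min(i + batchSize, files.Length);
            for (int j = i; j < end; j++)
                File.Move(files[j], Path.Combine(folder, Path.GetFileName(files[j])));

            folders.Add(folder);
        }
        return folders;
    }

    // Starts one uploader process per folder, then waits for all of them to exit.
    // uploaderExe is an assumed console app that accepts the folder path as its argument.
    public static void RunUploaders(string uploaderExe, IEnumerable<string> folders)
    {
        var running = new List<Process>();
        foreach (string folder in folders)
        {
            var info = new ProcessStartInfo(uploaderExe, "\"" + folder + "\"")
            {
                UseShellExecute = false
            };
            running.Add(Process.Start(info));
        }

        foreach (Process p in running)
        {
            p.WaitForExit();
            p.Dispose();
        }
    }
}
```

Calling SplitIntoFolders(imageDir, 10000) on a folder of 170,000 images would yield 17 folders, and passing those to RunUploaders would start 17 uploader processes at once, which matches the setup described above.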

Long story short: instead of trying to get threading to work optimally, I just run one blob uploader instance per 10,000 images, all on one machine at the same time. The total performance boost?


  • Async attempts: 14-16 hours, projected from the average execution time over a run of an hour or two.
  • Synchronous with 17 separate instances: ~1 hour, 5 minutes.
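As a rough cross-check, these totals can be reproduced from the per-page timings quoted earlier. The sketch below uses a hypothetical helper, BatchEstimate.Hours, and assumes the images are split evenly across uploaders and that the time per page of ten stays constant for the whole run.

```csharp
using System;

static class BatchEstimate
{
    // Projected hours to upload totalImages when `workers` uploaders run
    // concurrently and each needs secondsPerPage for a page of pageSize images.
    // Assumes an even split across workers and a constant time per page.
    public static double Hours(int totalImages, int workers, int pageSize, double secondsPerPage)
    {
        double pagesPerWorker = Math.Ceiling((double)totalImages / workers / pageSize);
        return pagesPerWorker * secondsPerPage / 3600.0;
    }
}
```

With the figures from this answer, BatchEstimate.Hours(170000, 5, 10, 15.8) comes to about 14.9 hours, inside the 14-16 hour projection for the async attempts, and BatchEstimate.Hours(170000, 17, 10, 4.3) comes to about 1.2 hours, close to the measured 1 hour, 5 minutes.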
