How to improve performance of downloading a large Azure blob file over a stream?


Question


I have a JSON blob file of around 212 MB.
Locally, while debugging, it takes around 15 minutes to download.
When I deploy the code to an Azure App Service, it runs for 10 minutes and then fails with the following error (locally it fails intermittently with the same error):

Server failed to authenticate the request. Make sure the value of Authorization header is formed correctly including the signature

Code Attempt 1:

// Create a SAS token for referencing the file, valid for 15 minutes
SharedAccessBlobPolicy sasConstraints = new SharedAccessBlobPolicy
{
    SharedAccessExpiryTime = DateTime.UtcNow.AddMinutes(15),
    Permissions = SharedAccessBlobPermissions.Read
};

var blob = cloudBlobContainer.GetBlockBlobReference(blobFilePath);
string sasContainerToken = blob.GetSharedAccessSignature(sasConstraints);

var cloudBlockBlob = new CloudBlockBlob(new Uri(blob.Uri + sasContainerToken));

using (var stream = new MemoryStream())
{
    await cloudBlockBlob.DownloadToStreamAsync(stream);

    // reset the stream's position to 0 before reading
    stream.Position = 0;
    var serializer = new JsonSerializer();

    using (var sr = new StreamReader(stream))
    {
        using (var jsonTextReader = new JsonTextReader(sr))
        {
            jsonTextReader.SupportMultipleContent = true;
            result = new List<T>();
            while (jsonTextReader.Read())
            {
                result.Add(serializer.Deserialize<T>(jsonTextReader));
            }
        }
    }
}
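Buffering the whole blob into a `MemoryStream` before parsing holds the full 212 MB (plus the deserialized objects) in memory at once. Since `JsonTextReader` already reads incrementally, one option is to deserialize directly from the blob's read stream (e.g. `cloudBlockBlob.OpenReadAsync()`) instead of downloading first. A minimal sketch of the reader side, assuming Newtonsoft.Json as in the snippet above; `Item` is a placeholder type, and the `MemoryStream` below is just an in-memory stand-in for the blob stream:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;
using Newtonsoft.Json;

public class Item
{
    public int Id { get; set; }
}

public static class StreamingJson
{
    // Deserialize a sequence of concatenated JSON documents from any
    // readable stream without buffering the whole payload first.
    public static List<Item> ReadAll(Stream source)
    {
        var serializer = new JsonSerializer();
        var result = new List<Item>();
        using (var sr = new StreamReader(source))
        using (var jsonTextReader = new JsonTextReader(sr))
        {
            jsonTextReader.SupportMultipleContent = true;
            while (jsonTextReader.Read())
            {
                result.Add(serializer.Deserialize<Item>(jsonTextReader));
            }
        }
        return result;
    }

    public static void Main()
    {
        // Stand-in for the blob stream: two concatenated JSON documents.
        var payload = Encoding.UTF8.GetBytes("{\"Id\":1}{\"Id\":2}");
        var items = ReadAll(new MemoryStream(payload));
        Console.WriteLine(items.Count); // 2
    }
}
```

With the real blob, `using (var blobStream = await cloudBlockBlob.OpenReadAsync()) { ... }` would replace the `MemoryStream`; `OpenReadAsync` is part of the same storage client the snippets above already use.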

Code Attempt 2: I tried using DownloadRangeToStreamAsync to download the blob in chunks, but nothing changed:

int bufferLength = 1 * 1024 * 1024; // 1 MB chunk
long blobRemainingLength = blob.Properties.Length;
long offset = 0;
do
{
    long chunkLength = Math.Min(bufferLength, blobRemainingLength);

    using (var ms = new MemoryStream())
    {
        await blob.DownloadRangeToStreamAsync(ms, offset, chunkLength);
        ms.Position = 0;
        lock (outPutStream)
        {
            outPutStream.Position = offset;
            var bytes = ms.ToArray();
            outPutStream.Write(bytes, 0, bytes.Length);
        }
    }

    // advance the offset only after the range has been downloaded and written
    offset += chunkLength;
    blobRemainingLength -= chunkLength;
}
while (blobRemainingLength > 0);
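One subtle point when chunking by ranges: the offset must be advanced only after it has been used for both the download and the write, and the final chunk is usually shorter than the buffer. The range bookkeeping can be checked in isolation from any storage calls; a small sketch in plain C# (no Azure dependency, names are illustrative):

```csharp
using System;
using System.Collections.Generic;

public static class BlobRanges
{
    // Enumerate (offset, length) pairs that tile a blob of the given
    // total length with fixed-size chunks; the final chunk may be shorter.
    public static List<(long Offset, long Length)> Split(long blobLength, long chunkSize)
    {
        var ranges = new List<(long, long)>();
        long offset = 0;
        while (offset < blobLength)
        {
            long length = Math.Min(chunkSize, blobLength - offset);
            ranges.Add((offset, length));
            offset += length; // advance only after the range is recorded
        }
        return ranges;
    }

    public static void Main()
    {
        // A 2.5 MB blob with 1 MB chunks -> three ranges, last one 0.5 MB.
        foreach (var r in Split(2_621_440, 1_048_576))
        {
            Console.WriteLine($"{r.Offset} {r.Length}");
        }
        // 0 1048576
        // 1048576 1048576
        // 2097152 524288
    }
}
```

Advancing the offset before the download (as in the loop above, prior to the fix) would skip the first chunk entirely and read one chunk past the end of the blob on the last iteration.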

I don't think 212 MB is that large for a JSON file. Can you please suggest a solution?

Solution

I suggest you give the Azure Storage Data Movement Library a try.

I tested with a slightly larger file of 220 MB; it takes about 5 minutes to download it into memory.

The sample code:

SharedAccessBlobPolicy sasConstraints = new SharedAccessBlobPolicy
{
    SharedAccessExpiryTime = DateTime.UtcNow.AddMinutes(15),
    Permissions = SharedAccessBlobPermissions.Read
};

CloudBlockBlob blob = blobContainer.GetBlockBlobReference("t100.txt");
string sasContainerToken = blob.GetSharedAccessSignature(sasConstraints);
var cloudBlockBlob = new CloudBlockBlob(new Uri(blob.Uri + sasContainerToken));

var stream = new MemoryStream();

// set this value as per your need
TransferManager.Configurations.ParallelOperations = 5;

Console.WriteLine("begin to download...");

// use a Stopwatch to measure the elapsed time
Stopwatch stopwatch = new Stopwatch();
stopwatch.Start();

DownloadOptions options = new DownloadOptions();
options.DisableContentMD5Validation = true;

// these lines only report download progress; you can remove them in your code
SingleTransferContext context = new SingleTransferContext();
context.ProgressHandler = new Progress<TransferStatus>((progress) =>
{
    Console.WriteLine("Bytes downloaded: {0}", progress.BytesTransferred);
});

var task = TransferManager.DownloadAsync(cloudBlockBlob, stream, options, context);
task.Wait();

stopwatch.Stop();
Console.WriteLine("the length of the stream is: " + stream.Length);
Console.WriteLine("the time taken (ms): " + stopwatch.ElapsedMilliseconds);

The test result: the download completed in about 5 minutes.
