Reading line by line from blob Storage in Windows Azure

Question

Is there any way to read line by line from a text file in blob storage in Windows Azure?

Thanks

Answer

Yes, you can do this with streams, and it doesn't necessarily require that you pull the entire file, though please read to the end (of the answer... not the file in question) because you may want to pull the whole file anyway.

Here's the code:

using System;
using System.IO;
using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.StorageClient;

// Authenticate with the account name and key, then get the container.
StorageCredentialsAccountAndKey credentials = new StorageCredentialsAccountAndKey(
    "YourStorageAccountName",
    "YourStorageAccountKey"
);
CloudStorageAccount account = new CloudStorageAccount(credentials, true);
CloudBlobClient client = new CloudBlobClient(account.BlobEndpoint.AbsoluteUri, account.Credentials);
CloudBlobContainer container = client.GetContainerReference("test");

CloudBlob blob = container.GetBlobReference("CloudBlob.txt");

// OpenRead() returns a BlobStream that fetches the blob one block at a
// time as it is consumed, rather than downloading the whole file up front.
using (var stream = blob.OpenRead())
{
    using (StreamReader reader = new StreamReader(stream))
    {
        while (!reader.EndOfStream)
        {
            Console.WriteLine(reader.ReadLine());
        }
    }
}

I uploaded a text file called CloudBlob.txt to a container called test. The file was about 1.37 MB in size (I actually used the CloudBlob.cs file from GitHub copied into the same file six or seven times). I tried this out with a BlockBlob, which is likely what you'll be dealing with since you are talking about a text file.
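
For reference, uploading a test file like that with the same 1.x StorageClient library could look roughly like the sketch below. The local path is purely hypothetical, and client is the CloudBlobClient from the code above:

// Make sure the container exists, then push the local file up as a block blob.
CloudBlobContainer container = client.GetContainerReference("test");
container.CreateIfNotExist(); // 1.x StorageClient naming (no trailing "s")

CloudBlockBlob blockBlob = container.GetBlockBlobReference("CloudBlob.txt");
blockBlob.UploadFile(@"C:\temp\CloudBlob.txt"); // hypothetical local path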

This gets a reference to the BLOB as usual, then I call the OpenRead() method off the CloudBlob object, which returns a BlobStream that you can then wrap in a StreamReader to get the ReadLine method. I ran Fiddler against this and noticed that it ended up making three additional calls to fetch more blocks to complete the file. It looks like the BlobStream has a few properties you can use to tweak how much read-ahead it does, but I didn't try adjusting them. According to one reference I found, the retry policy also works at the last-read level, so it won't attempt to re-read the whole thing, just the last request that failed. Quoted here:

Lastly, the DownloadToFile/ByteArray/Stream/Text() methods perform their entire download in a single streaming get. If you use the CloudBlob.OpenRead() method it will utilize the BlobReadStream abstraction, which will download the blob one block at a time as it is consumed. If a connection error occurs, then only that one block will need to be re-downloaded (according to the configured RetryPolicy). Also, this will potentially help improve performance as the client may not need to cache a large amount of data locally. For large blobs this can help significantly; however, be aware that you will be performing a higher number of overall transactions against the service. -- Joe Giardino

I think it is important to note the caution that Joe points out: this will lead to an overall larger number of transactions against your storage account. However, depending on your requirements, this may still be the option you are looking for.

If these are massive files and you are doing a lot of this, then it could mean many, many transactions (though you could see if you can tweak the properties on the BlobStream to increase the number of blocks retrieved at a time, etc.). It may still make sense to do a DownloadToStream on the CloudBlob (which will pull the entire contents down), then read from that stream the same way I did above.
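
A rough sketch of that full-download variant, reusing the container reference from the earlier code and buffering into a MemoryStream (which assumes the file comfortably fits in memory):

CloudBlob blob = container.GetBlobReference("CloudBlob.txt");
using (var memoryStream = new MemoryStream())
{
    // Single streaming GET that pulls the entire blob down at once.
    blob.DownloadToStream(memoryStream);
    memoryStream.Position = 0; // rewind before reading

    using (StreamReader reader = new StreamReader(memoryStream))
    {
        while (!reader.EndOfStream)
        {
            Console.WriteLine(reader.ReadLine());
        }
    }
}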

The only real difference is that one pulls smaller chunks at a time and the other pulls the full file immediately. There are pros and cons for each, and it will depend heavily on how large these files are and on whether you plan on stopping at some point in the middle of reading the file (such as "yeah, I found the string I was searching for!") or plan on reading the entire file anyway. If you plan on pulling the whole file no matter what (because you are processing the entire file, for example), then just use DownloadToStream and wrap that in a StreamReader.
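
For the "stop in the middle" case, the OpenRead() version has the advantage that blocks after the match are never requested. A minimal sketch, with a purely hypothetical search term:

CloudBlob blob = container.GetBlobReference("CloudBlob.txt");
using (var stream = blob.OpenRead())
using (var reader = new StreamReader(stream))
{
    string line;
    while ((line = reader.ReadLine()) != null)
    {
        if (line.Contains("SomeSearchTerm")) // hypothetical string to find
        {
            Console.WriteLine("Found it: " + line);
            break; // stop here; the remaining blocks are never downloaded
        }
    }
}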

Note: I tried this with the 1.7 SDK. I'm not sure in which SDK version these options were introduced.
