Getting the latest modified file from Azure Blob
Problem description
Say I am generating a couple of JSON files each day in my blob storage. What I want to do is get the most recently modified file in any of my directories. So I'd have something like this in my blob container:
2016/01/02/test.json
2016/01/02/test2.json
2016/02/03/test.json
I want to get 2016/02/03/test.json. One way is to take the full path of each file and use a regex to find the most recently created directory, but this doesn't work if I have more than one JSON file in each directory. Is there anything like File.GetLastWriteTime to get the latest modified file?
I am using this code to get all the files, by the way:
    using System.Collections.Generic;
    using System.Linq;
    using Microsoft.WindowsAzure.Storage;
    using Microsoft.WindowsAzure.Storage.Auth;
    using Microsoft.WindowsAzure.Storage.Blob;

    public static CloudBlobContainer GetBlobContainer(string accountName, string accountKey, string containerName)
    {
        CloudStorageAccount storageAccount = new CloudStorageAccount(new StorageCredentials(accountName, accountKey), true);
        // blob client
        CloudBlobClient blobClient = storageAccount.CreateCloudBlobClient();
        // container
        CloudBlobContainer blobContainer = blobClient.GetContainerReference(containerName);
        return blobContainer;
    }

    public static IEnumerable<IListBlobItem> GetBlobItems(CloudBlobContainer container)
    {
        // flat listing returns every blob, ignoring the virtual directory hierarchy
        IEnumerable<IListBlobItem> items = container.ListBlobs(useFlatBlobListing: true);
        return items;
    }

    public static List<string> GetAllBlobFiles(IEnumerable<IListBlobItem> blobs)
    {
        var listOfFileNames = new List<string>();
        foreach (var blob in blobs)
        {
            // last URI segment only, e.g. "test.json" (the directory part is dropped)
            var blobFileName = blob.Uri.Segments.Last();
            listOfFileNames.Add(blobFileName);
        }
        return listOfFileNames;
    }
Each IListBlobItem is going to be a CloudBlockBlob, a CloudPageBlob, or a CloudBlobDirectory.
After casting to block or page blob, or their shared base class CloudBlob (preferably by using the as keyword and checking for null), you can access the modified date via blockBlob.Properties.LastModified.
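As a minimal sketch of the cast-and-check approach, assuming the same legacy Microsoft.WindowsAzure.Storage SDK that the question's code uses: the class name BlobQueries and the method name GetLatestModifiedBlob are made up for illustration, and OfType<CloudBlob>() stands in for the as cast plus null check.

```csharp
using System.Linq;
using Microsoft.WindowsAzure.Storage.Blob;

public static class BlobQueries
{
    // Hypothetical helper (not part of the SDK or the original post):
    // returns the most recently modified blob in the container,
    // or null when the container holds no blobs.
    public static CloudBlob GetLatestModifiedBlob(CloudBlobContainer container)
    {
        return container
            .ListBlobs(useFlatBlobListing: true)  // every blob, across all virtual directories
            .OfType<CloudBlob>()                  // keep blobs, skip anything that is not one
            .OrderByDescending(b => b.Properties.LastModified)
            .FirstOrDefault();
    }
}
```

The returned blob's Name property holds the full path inside the container (2016/02/03/test.json in the question's example, if that file was written last), so there is no need to parse the directory names.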
Note that your implementation will do an O(n) scan over all blobs in the container, which can take a while if there are hundreds of thousands of files. There's currently no way of doing a more efficient query of blob storage, though, unless you abuse the file naming and encode the date in such a way that newer dates come first alphabetically. Realistically, if you need better query performance, I'd recommend keeping a database table that has one row per file, with an indexed DateModified column to search by and a column holding the blob path for easy access to the file.
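To illustrate the naming trick, here is one possible scheme, offered as an assumption and not something the SDK provides: prefix each blob name with DateTime.MaxValue.Ticks minus the timestamp's ticks, zero-padded to a fixed width. Blob listings come back in alphabetical order by name, so the newest blob would then be the first item listed. BlobNaming and NewestFirstName are hypothetical names.

```csharp
using System;

public static class BlobNaming
{
    // Hypothetical naming scheme: newer timestamps produce smaller
    // prefixes, so they sort first in an alphabetical listing.
    public static string NewestFirstName(DateTime utcTimestamp, string fileName)
    {
        long inverted = DateTime.MaxValue.Ticks - utcTimestamp.Ticks;
        // D19 pads to 19 digits (the width of DateTime.MaxValue.Ticks),
        // which keeps string order identical to numeric order.
        return inverted.ToString("D19") + "/" + fileName;
    }
}
```

The trade-off is that the names are no longer human-readable dates, and the ordering reflects the timestamp chosen at upload time, not later modifications to the blob.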