How to unzip large size zip files between blob containers without using physical filepaths
Question
We have a requirement to extract large .zip files (around 3-4 GB in size) from one Blob Container into another Blob Container; the extracted files are JSON files (around 35-50 GB in size).
For the implementation we referred to the code from this link: https://msdevzone.wordpress.com/2017/07/07/extract-a-zip-file-stored-in-azure-blob/ and were able to extract smaller files (a 40 MB zip unzipping to 400 MB) in a few minutes, but the process gets stuck for more than an hour with 2 GB zip files extracting to 30 GB JSON files.
Could anyone suggest a better solution they have come across for this scenario that does not use file operations?
Please see the code we worked with below:
// Requires: using System.IO; using System.IO.Compression;
// using Microsoft.WindowsAzure.Storage.Blob;
CloudBlockBlob blockBlob = container.GetBlockBlobReference(filename);
BlobRequestOptions options = new BlobRequestOptions();
options.ServerTimeout = new TimeSpan(0, 20, 0);

// Save blob (zip file) contents to a MemoryStream.
using (MemoryStream zipBlobFileStream = new MemoryStream())
{
    blockBlob.DownloadToStream(zipBlobFileStream, null, options);
    zipBlobFileStream.Flush();
    zipBlobFileStream.Position = 0;

    // Use ZipArchive from System.IO.Compression to extract all the files from the zip.
    using (ZipArchive zip = new ZipArchive(zipBlobFileStream, ZipArchiveMode.Read, true))
    {
        // Each entry here represents an individual file or a folder.
        foreach (var entry in zip.Entries)
        {
            // Create an empty block blob with the same name as the file.
            var blob = extractcontainer.GetBlockBlobReference(entry.FullName);
            using (var stream = entry.Open())
            {
                // Only upload actual files (folder entries have Length == 0).
                if (entry.Length > 0)
                    blob.UploadFromStream(stream);
            }
        }
    }
}
Answer
Using an Azure Storage File Share was the only way it worked for me without loading the entire ZIP into memory. I tested with a 3 GB ZIP file (containing thousands of files, or a single big file) and memory/CPU stayed low and stable. Maybe you can adapt this to block blobs. I hope it helps!
// Requires: using System.IO.Compression; using System.Linq;
// using Microsoft.WindowsAzure.Storage.File;
var zipFiles = _directory.ListFilesAndDirectories()
    .OfType<CloudFile>()
    .Where(x => x.Name.ToLower().Contains(".zip"))
    .ToList();

foreach (var zipFile in zipFiles)
{
    // OpenRead streams the file share content, so the ZIP is never fully in memory.
    using (var zipArchive = new ZipArchive(zipFile.OpenRead()))
    {
        foreach (var entry in zipArchive.Entries)
        {
            // Skip folder entries (they have Length == 0).
            if (entry.Length > 0)
            {
                CloudFile extractedFile = _directory.GetFileReference(entry.Name);
                using (var entryStream = entry.Open())
                {
                    byte[] buffer = new byte[16 * 1024];
                    // Create the target file with its final size, then copy in 16 KB chunks.
                    using (var ms = extractedFile.OpenWrite(entry.Length))
                    {
                        int read;
                        while ((read = entryStream.Read(buffer, 0, buffer.Length)) > 0)
                        {
                            ms.Write(buffer, 0, read);
                        }
                    }
                }
            }
        }
    }
}
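To adapt the same streaming idea back to blob containers, a minimal sketch (not tested against Azure, and assuming the same classic Microsoft.WindowsAzure.Storage SDK as the question; `sourceContainer`, `targetContainer`, and `zipBlobName` are placeholder names) could use `CloudBlockBlob.OpenRead`/`OpenWrite` so neither the archive nor the extracted files are buffered in memory:

```csharp
// Hypothetical blob-to-blob adaptation. OpenRead returns a seekable stream
// backed by ranged reads, so ZipArchive can locate the central directory
// without downloading the whole blob first.
CloudBlockBlob zipBlob = sourceContainer.GetBlockBlobReference(zipBlobName);

using (Stream zipStream = zipBlob.OpenRead())
using (var archive = new ZipArchive(zipStream, ZipArchiveMode.Read))
{
    foreach (var entry in archive.Entries)
    {
        if (entry.Length == 0) continue; // skip folder entries

        CloudBlockBlob targetBlob = targetContainer.GetBlockBlobReference(entry.FullName);
        using (Stream entryStream = entry.Open())
        using (CloudBlobStream writeStream = targetBlob.OpenWrite())
        {
            // CopyTo moves data in chunks; memory use stays bounded by the
            // buffer size rather than the 30+ GB extracted payload.
            entryStream.CopyTo(writeStream, 16 * 1024);
        }
    }
}
```

Whether this performs well for 30+ GB entries would still need to be measured; the decompression itself is sequential, but the chunked upload avoids the `MemoryStream` that makes the question's version stall on large archives.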