Parallel reads from a ZipArchive
Problem description
So it's quite possible that what I am trying to do just isn't possible, but I was playing around to see if I could read files out of a ZipArchive in parallel. I threw together a simple test harness and it works... kind of, but it still hits race conditions on reads from what I assume is the underlying stream. Wrapping that stream with Stream.Synchronized improved matters, but I still hit race conditions. I am scratching my head trying to find what I must be missing here. Any ideas?
Sample code (.NET Framework 4.7.1):
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Net;
using System.Threading;
using System.Threading.Tasks;

namespace AsyncZipStream
{
    class Program
    {
        static void Main(string[] args)
        {
            var prog = new Program();
            prog.Run();
        }

        private void Run()
        {
            var url = new Uri(
                string.Format("file:///{0}/Data/precos_2016-12-07T110802Z.zip",
                    Directory.GetCurrentDirectory()));
            var request = WebRequest.Create(url);
            using (var response = request.GetResponse())
            {
                // One archive over one synchronized stream, shared by every worker.
                var archive = new ZipArchive(
                    Stream.Synchronized(response.GetResponseStream()), ZipArchiveMode.Read, true);

                // Queue every entry up front, then let the workers drain the queue.
                var workQueue = new BlockingCollection<ZipArchiveEntry>();
                foreach (var entry in archive.Entries)
                {
                    workQueue.Add(entry);
                }
                workQueue.CompleteAdding();

                var workers = new List<Task>();
                var maxWorkers = 6;
                for (var i = 0; i < maxWorkers; i++)
                {
                    workers.Add(Task.Factory.StartNew(() => { DoWork(workQueue); }));
                }
                foreach (var worker in workers) worker.Wait();
            }
            Console.WriteLine("\nPress any key to exit.");
            Console.ReadKey();
        }

        private void DoWork(BlockingCollection<ZipArchiveEntry> workQueue)
        {
            while (workQueue.TryTake(out var entry)) DoEntry(entry);
        }

        private void DoEntry(ZipArchiveEntry entry)
        {
            try
            {
                // The entry stream reads from the archive's shared underlying stream;
                // disposing the reader also closes the entry stream.
                using (var reader = new StreamReader(entry.Open()))
                {
                    var chrCount = 0;
                    int chr;
                    while ((chr = reader.Read()) >= 0)
                    {
                        chrCount++;
                    }
                    Console.WriteLine("Thread {0} read {1} chars from {2}",
                        Thread.CurrentThread.ManagedThreadId, chrCount, entry.FullName);
                }
            }
            catch (Exception ex)
            {
                Console.WriteLine("Thread {0} Error reading {1} exception: {2}",
                    Thread.CurrentThread.ManagedThreadId, entry.FullName, ex.Message);
            }
        }
    }
}
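The most likely cause, hedged because it rests on how ZipArchive is implemented rather than on anything the documentation states: ZipArchive instance members are not guaranteed to be thread-safe, and each stream returned by entry.Open() reads from the archive's single underlying stream by first seeking to its own offset and then reading. Stream.Synchronized locks each Seek and Read call separately, so another thread can still move the position between one entry's seek and its read. (When the source stream is not seekable, ZipArchive in Read mode also copies it into an internal buffer, so the synchronized wrapper is not even the stream being shared.) One workaround is to stop sharing a stream: buffer the archive's bytes once and give every worker its own read-only MemoryStream and ZipArchive over them. A minimal sketch that keeps the question's BlockingCollection design; BufferedZipReader and CountChars are hypothetical names:

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical helper: counts the characters in every entry of an in-memory zip,
// using several workers that never share a Stream or a ZipArchive.
public static class BufferedZipReader
{
    public static IDictionary<string, int> CountChars(byte[] zipBytes, int maxWorkers)
    {
        var results = new ConcurrentDictionary<string, int>();
        using (var workQueue = new BlockingCollection<string>())
        {
            // Enumerate entry names once, on this thread only.
            using (var archive = new ZipArchive(new MemoryStream(zipBytes, writable: false), ZipArchiveMode.Read))
            {
                foreach (var entry in archive.Entries) workQueue.Add(entry.FullName);
            }
            workQueue.CompleteAdding();

            var workers = Enumerable.Range(0, maxWorkers).Select(_ => Task.Run(() =>
            {
                // Each worker gets a private MemoryStream (its own Position) and a
                // private ZipArchive over the shared, read-only byte array.
                using (var archive = new ZipArchive(new MemoryStream(zipBytes, writable: false), ZipArchiveMode.Read))
                {
                    while (workQueue.TryTake(out var name))
                    {
                        using (var reader = new StreamReader(archive.GetEntry(name).Open()))
                        {
                            results[name] = reader.ReadToEnd().Length;
                        }
                    }
                }
            })).ToArray();

            Task.WaitAll(workers);
        }
        return results;
    }
}
```

To feed it from the question's WebRequest, one option is to copy the response stream into a MemoryStream and pass its ToArray(); for a local file, File.ReadAllBytes works. The trade-off is that the whole archive is held in memory once, shared read-only by all workers.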
Recommended answer
Hi JMarpuiss,
Thank you for posting here.
Regarding your question, I am confused by the following lines in your code:
var request = WebRequest.Create(url);
using (var response = request.GetResponse())
According to your description, you want to read data from the files in a zip archive. I made a simple example for your reference: a zip file containing four .txt files, N1, N2, N3 and N4. I am not sure whether this matches your actual zip file. I read the data directly using Parallel.ForEach.
The text in each file is simply the file's name: N1, N2, N3 or N4.
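// Needs: System, System.IO, System.IO.Compression, System.Threading, System.Threading.Tasks
// (on .NET Framework, ZipFile also needs a reference to System.IO.Compression.FileSystem).
public static void readFromArchive()
{
    // Test.zip holds N1.txt .. N4.txt; open it once for reading.
    using (ZipArchive zipArchive = ZipFile.Open(@"Test.zip", ZipArchiveMode.Read))
    {
        // Read each entry on a thread-pool thread and print its text, thread ID and start time.
        Parallel.ForEach(zipArchive.Entries, (entry) =>
        {
            using (StreamReader stream = new StreamReader(entry.Open()))
            {
                Console.WriteLine(stream.ReadToEnd() + "\t\t" + "Thread ID:" + Thread.CurrentThread.ManagedThreadId + "\t\t" + "Start Time:" + DateTime.Now);
            }
        });
    }
}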
In the screenshot, you can see that the files are not read in order.
Best regards,
Wendy
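A caveat on the answer's example: it calls entry.Open() from several threads on one shared ZipArchive, which the documentation does not promise is safe. With four tiny files it is likely to work, but larger entries may hit the same race as in the question. A sketch of a variant that shares nothing, assuming the archive is a local file that several readers may open at the same time: the localInit/localFinally overload of Parallel.ForEach gives each worker thread its own ZipArchive (and FileStream) and disposes it when that worker finishes. PerThreadZipReader and ReadAll are hypothetical names.

```csharp
using System;
using System.Collections.Concurrent;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical helper: reads every entry's text in parallel, one ZipArchive per worker thread.
public static class PerThreadZipReader
{
    public static ConcurrentDictionary<string, string> ReadAll(string zipPath)
    {
        // Collect entry names up front with a short-lived archive.
        string[] names;
        using (var archive = ZipFile.OpenRead(zipPath))
        {
            names = archive.Entries.Select(e => e.FullName).ToArray();
        }

        var texts = new ConcurrentDictionary<string, string>();
        Parallel.ForEach(
            names,
            // localInit: each worker opens its own archive over its own read-only FileStream.
            () => ZipFile.OpenRead(zipPath),
            (name, loopState, archive) =>
            {
                using (var reader = new StreamReader(archive.GetEntry(name).Open()))
                {
                    texts[name] = reader.ReadToEnd();
                }
                // Hand the same archive to this worker's next iteration.
                return archive;
            },
            // localFinally: close each worker's archive when it is done.
            archive => archive.Dispose());
        return texts;
    }
}
```

Keying the results by entry name also sidesteps the ordering seen in the screenshot: the files still finish in arbitrary order, but each text can be looked up by its name afterwards.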