Parallel reads from a ZipArchive

Question

So it's quite possible that what I am trying to do just isn't possible, but I was playing around to see if I could read files out of a ZipArchive in parallel. I threw together a simple test harness and it works... kind of, but it still hits race conditions on reads from what I assume is the underlying stream. Wrapping that stream with Stream.Synchronized improved matters, but I still hit race conditions. I am scratching my head trying to find what I must be missing here. Any ideas?

The sample code (.NET Framework 4.7.1):

using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Net;
using System.Threading;
using System.Threading.Tasks;

namespace AsyncZipStream
{
    class Program
    {
        static void Main(string[] args)
        {
            var prog = new Program();
            prog.Run();
        }

        private void Run()
        {
            var url = new Uri(
                string.Format("file:///{0}/Data/precos_2016-12-07T110802Z.zip",
                    Directory.GetCurrentDirectory()));
            var request = WebRequest.Create(url);
            using (var response = request.GetResponse())
            {
                var archive = new ZipArchive(
                    Stream.Synchronized(response.GetResponseStream()), ZipArchiveMode.Read, true);

                var workQueue = new BlockingCollection<ZipArchiveEntry>();

                foreach (var entry in archive.Entries)
                {
                    workQueue.Add(entry);
                }
                workQueue.CompleteAdding();

                var workers = new List<Task>();
                var maxWorkers = 6;
                for (var i = 0; i < maxWorkers; i++)
                {
                    workers.Add(Task.Factory.StartNew(() => { DoWork(workQueue); }));
                }

                foreach (var worker in workers) worker.Wait();
            }

            Console.WriteLine("\nPress any key to exit.");
            Console.ReadKey();
        }

        private void DoWork(BlockingCollection<ZipArchiveEntry> workQueue)
        {
            while (workQueue.TryTake(out var entry)) DoEntry(entry);
        }

        private void DoEntry(ZipArchiveEntry entry)
        {
            try
            {
                var reader = new StreamReader(entry.Open());
                var chrCount = 0;
                int chr;
                while ((chr = reader.Read()) >= 0)
                {
                    chrCount++;
                }
                Console.WriteLine("Thread {0} read {1} chars from {2}",
                        Thread.CurrentThread.ManagedThreadId, chrCount, entry.FullName);
            }
            catch (Exception ex)
            {
                Console.WriteLine("Thread {0} Error reading {1} exception: {2}",
                        Thread.CurrentThread.ManagedThreadId, entry.FullName, ex.Message);
            }
        }
    }
}

Recommended answer

Hi JMarpuiss,

Thank you for posting here.

Regarding your question, I am confused by the following code in your post:

var request = WebRequest.Create(url);
using (var response = request.GetResponse())

From your description, you want to read data from the files inside a zip archive. I made a simple example for your reference: I created a zip file containing four .txt files, N1, N2, N3 and N4. I am not sure whether this matches your zip file exactly. I read the data directly using Parallel.ForEach.

The text in each file is the same as its file name: N1, N2, N3, N4.

public static void readFromArchive()
{
    using (ZipArchive zipArchive = ZipFile.Open(@"Test.zip", ZipArchiveMode.Read))
    {
        Parallel.ForEach(zipArchive.Entries, (entry) =>
        {
            using (StreamReader stream = new StreamReader(entry.Open()))
            {
                Console.WriteLine(stream.ReadToEnd() + "\t\t" + "Thread ID:" + Thread.CurrentThread.ManagedThreadId + "\t\t" + "Start Time:" + DateTime.Now);
            }
        });
    }
}

In the screenshot, you can see that the files are not read in order.

Best regards,

Wendy
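
A note on the thread above: as far as I know, ZipArchive is not documented as thread-safe, and the streams returned by entry.Open() in read mode all read from the archive's single underlying stream. That would be consistent with the race conditions described in the question, and it means that sharing one archive across threads may fail on larger files even when it appears to work on tiny ones. Below is a minimal sketch of one possible workaround, assuming the archive is a local file that can be opened several times: each worker opens its own ZipArchive, so no stream is shared between threads. The names ParallelZipReader and CountChars are hypothetical and chosen only for illustration, and the sketch assumes entry names are unique within the archive.

```csharp
using System;
using System.Collections.Concurrent;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical helper, not part of the original thread.
public static class ParallelZipReader
{
    // Counts the characters in every file entry of the archive at zipPath.
    // Each worker opens its OWN ZipArchive (and so its own FileStream),
    // which avoids sharing a stream between threads.
    public static ConcurrentDictionary<string, int> CountChars(string zipPath, int maxWorkers)
    {
        // Collect the entry names once, on a single thread.
        var names = new ConcurrentQueue<string>();
        using (var archive = ZipFile.OpenRead(zipPath))
        {
            foreach (var entry in archive.Entries)
            {
                // Directory entries have an empty Name; skip them.
                if (entry.Name.Length > 0) names.Enqueue(entry.FullName);
            }
        }

        var counts = new ConcurrentDictionary<string, int>();
        var workers = Enumerable.Range(0, maxWorkers).Select(_ => Task.Run(() =>
        {
            // A private archive instance per worker.
            using (var archive = ZipFile.OpenRead(zipPath))
            {
                while (names.TryDequeue(out var name))
                {
                    // Assumes names are unique; GetEntry returns the first match.
                    var entry = archive.GetEntry(name);
                    using (var reader = new StreamReader(entry.Open()))
                    {
                        counts[name] = reader.ReadToEnd().Length;
                    }
                }
            }
        })).ToArray();

        Task.WaitAll(workers);
        return counts;
    }
}
```

A call such as ParallelZipReader.CountChars(@"Test.zip", 6) would return a dictionary mapping each entry name to its character count. The trade-off is that the central directory is parsed once per worker. If the data comes from a stream that cannot be reopened, an alternative would be to read each entry into memory on one thread and hand only the in-memory buffers to the parallel workers.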

