Performance of handling zip file in memory using ZipArchive


Problem description



Hi, I am using ZipArchive [System.IO.Compression] to collect multiple XML memory streams, convert them into a single zip memory stream, and then send it to a remote server [where I store the zip file].

My intention is not to store any files [including the XML and zip files] on the local machine,
and not to use any third-party DLL for compression.

The code below works fine and gives the result I expected. But I am still worried about performance [in the production environment we will get many more records to zip].

Kindly advise the best solution in terms of performance and memory leaks.

What I have tried:

Extension method to convert a List<any object> into XML:

/// <summary>
/// Take a T and serialize it into an XML memory stream.
/// </summary>
public static MemoryStream Serialize<T>(this T dataToSerialize)
{
    if (dataToSerialize == null) throw new ArgumentNullException(nameof(dataToSerialize));

    var serializer = new XmlSerializer(dataToSerialize.GetType());
    var memStream = new MemoryStream();
    serializer.Serialize(memStream, dataToSerialize);
    return memStream;
}










Extension method to convert a list of different records into a zip memory stream.
The input value is a dictionary like dir["xmlfilename", List<object data for xml conversion>]:

/// <summary>
/// Combine all file memory streams and convert into a zip memory stream.
/// </summary>
public static MemoryStream SerializeIntoZip<T>(this Dictionary<string, T> dataToSerialize)
{
    if (dataToSerialize == null) throw new ArgumentNullException(nameof(dataToSerialize));

    var outStream = new MemoryStream();
    using (var archive = new ZipArchive(outStream, ZipArchiveMode.Create, leaveOpen: true))
    {
        foreach (var data in dataToSerialize)
        {
            var fileInArchive = archive.CreateEntry(data.Key, CompressionLevel.Optimal);
            using (var entryStream = fileInArchive.Open())
            using (var fileToCompressStream = data.Value.Serialize()) // calling the extension method above
            {
                fileToCompressStream.Seek(0, SeekOrigin.Begin);
                fileToCompressStream.CopyTo(entryStream);
            }
        }
    }

    outStream.Seek(0, SeekOrigin.Begin);
    return outStream;
}








My main method

Dictionary<string, IList> listofobj = new Dictionary<string, IList>
{
    { "fileName_XX.xml", GetXXList1() },
    { "fileName_YY.xml", GetYYList() }
};

var mem = listofobj.SerializeIntoZip();

//{
// code to upload zip memory stream to S3 bucket
//}
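Before uploading, the resulting stream can be sanity-checked by reopening it as a read-only archive in memory. This is only an illustrative sketch using nothing beyond System.IO.Compression; the entry name and the tiny XML payload are made up for the demo:

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

class ZipCheck
{
    static void Main()
    {
        // Build a small zip in memory the same way SerializeIntoZip does.
        var zip = new MemoryStream();
        using (var archive = new ZipArchive(zip, ZipArchiveMode.Create, leaveOpen: true))
        {
            var entry = archive.CreateEntry("fileName_XX.xml", CompressionLevel.Optimal);
            using (var s = entry.Open())
            {
                var bytes = Encoding.UTF8.GetBytes("<root />");
                s.Write(bytes, 0, bytes.Length);
            }
        }
        zip.Seek(0, SeekOrigin.Begin);

        // Reopen the same stream read-only and list its entries before uploading.
        using (var archive = new ZipArchive(zip, ZipArchiveMode.Read, leaveOpen: true))
        {
            foreach (var entry in archive.Entries)
                Console.WriteLine($"{entry.FullName}: {entry.Length} bytes uncompressed");
        }
    }
}
```

Because `leaveOpen: true` is passed, the `MemoryStream` survives the check and can still be handed to the upload code afterwards.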

Solution

Normally RAM isn't the problem on these machines, but disk I/O, so you should optimize the disk read and write operations. It would make sense to have a file I/O thread and a data compression thread, so the two operations don't hurt each other's performance. The file I/O thread should work on only one file at a time: first write one pending file, then read one ahead. The memory stream then goes to the compression thread while the I/O thread reads the next file. For the remote transfer some buffering is very useful, but check the free memory.

At best, run some realistic tests to find optimizations. For instance, if the files are small, then read, compress, and write a bunch of files to a temp file to transfer them later.
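The pipelining described above can be sketched as a producer/consumer pair: one task produces serialized memory streams while the main thread drains them into the archive, so serialization and compression overlap. This is only a minimal sketch; the file names and the fake XML payloads are invented for the illustration, and a bounded queue stands in for the "one pending, one ahead" limit:

```csharp
using System;
using System.Collections.Concurrent;
using System.IO;
using System.IO.Compression;
using System.Text;
using System.Threading.Tasks;

class Pipeline
{
    static void Main()
    {
        var inputs = new[] { "fileName_XX.xml", "fileName_YY.xml" };

        // Bounded queue so the serializer cannot run arbitrarily far ahead of the writer.
        using var queue = new BlockingCollection<(string Name, MemoryStream Data)>(boundedCapacity: 2);

        // Producer: the "serialization" task (here it just fakes an XML payload per file).
        var producer = Task.Run(() =>
        {
            foreach (var name in inputs)
            {
                var ms = new MemoryStream(Encoding.UTF8.GetBytes($"<doc name=\"{name}\" />"));
                queue.Add((name, ms));
            }
            queue.CompleteAdding();
        });

        // Consumer: the compression side drains the queue into the zip archive.
        var outStream = new MemoryStream();
        using (var archive = new ZipArchive(outStream, ZipArchiveMode.Create, leaveOpen: true))
        {
            foreach (var (name, data) in queue.GetConsumingEnumerable())
            {
                var entry = archive.CreateEntry(name, CompressionLevel.Optimal);
                using var entryStream = entry.Open();
                data.CopyTo(entryStream);
                data.Dispose();
            }
        }
        producer.Wait();

        Console.WriteLine($"entries written: {inputs.Length}, zip size: {outStream.Length} bytes");
    }
}
```

Note that `ZipArchive` in `Create` mode writes entries strictly sequentially, so only the serialization side is parallelized here; the archive itself stays single-writer.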

