Convert a VERY LARGE binary file into a Base64String incrementally

Question

I need help converting a VERY LARGE binary file (ZIP file) to a Base64String and back again. The files are too large to be loaded into memory all at once (they throw OutOfMemoryExceptions); otherwise this would be a simple task. I do not want to process the contents of the ZIP file individually; I want to process the entire ZIP file.

The problem:

I can convert the entire ZIP file (test sizes vary from 1 MB to 800 MB at present) to a Base64String, but when I convert it back, it is corrupted. The new ZIP file is the correct size, it is recognized as a ZIP file by Windows, WinRAR, 7-Zip, etc., and I can even look inside the ZIP file and see the contents with the correct sizes/properties. But when I attempt to extract from the ZIP file, I get "Error: 0x80004005", which is a general error code.

I am not sure where or why the corruption is happening. I have done some investigating, and I have noticed the following:

If you have a large text file, you can convert it to a Base64String incrementally without issue. If calling Convert.ToBase64String on the entire file yielded "abcdefghijklmnopqrstuvwx", then calling it on the file in two pieces would yield "abcdefghijkl" and "mnopqrstuvwx".

Unfortunately, if the file is binary, the result is different. While the entire file might yield "abcdefghijklmnopqrstuvwx", trying to process it in two pieces would yield something like "oiweh87yakgb" and "kyckshfguywp".

Is there a way to incrementally Base64-encode a binary file while avoiding this corruption?
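As background to the question: Base64 maps every group of 3 input bytes to 4 output characters, so chunk-by-chunk output concatenates to the whole-file encoding only when every chunk except the last is a multiple of 3 bytes long, whether the data is text or binary. The following is a minimal sketch of that property; `ChunkedBase64Demo` and `EncodeInChunks` are hypothetical names introduced here for illustration and are not part of the original post.

```csharp
using System;
using System.Text;

public static class ChunkedBase64Demo
{
    // Hypothetical helper: encodes data in chunks of chunkSize bytes and
    // concatenates the per-chunk Base64 strings.
    public static string EncodeInChunks(byte[] data, int chunkSize)
    {
        var sb = new StringBuilder();
        for (int offset = 0; offset < data.Length; offset += chunkSize)
        {
            // The last chunk may be shorter than chunkSize.
            int count = Math.Min(chunkSize, data.Length - offset);
            sb.Append(Convert.ToBase64String(data, offset, count));
        }
        return sb.ToString();
    }
}
```

With a chunk size such as 30 (divisible by 3) the result should equal `Convert.ToBase64String(data)`; with a size such as 32, '=' padding ends up in the middle of the output and the strings differ.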

My code:

        // MultipleOfThree is not shown in the question; it is assumed to be an int
        // constant divisible by 3, so that only the final chunk can end in '=' padding.
        private void ConvertLargeFile()
        {
            FileStream inputStream = new FileStream("C:\\Users\\test\\Desktop\\my.zip", FileMode.Open, FileAccess.Read);
            byte[] buffer = new byte[MultipleOfThree];
            int bytesRead = inputStream.Read(buffer, 0, buffer.Length);
            while (bytesRead > 0)
            {
                // Keep a copy of the chunk just read, then read ahead to find out
                // whether that chunk was the last one.
                byte[] secondaryBuffer = new byte[buffer.Length];
                int secondaryBufferBytesRead = bytesRead;
                Array.Copy(buffer, secondaryBuffer, buffer.Length);
                bool isFinalChunk = false;
                Array.Clear(buffer, 0, buffer.Length);
                bytesRead = inputStream.Read(buffer, 0, buffer.Length);
                if (bytesRead == 0)
                {
                    // Final chunk: trim it to the number of bytes actually read.
                    isFinalChunk = true;
                    buffer = new byte[secondaryBufferBytesRead];
                    Array.Copy(secondaryBuffer, buffer, buffer.Length);
                }

                String base64String = Convert.ToBase64String(isFinalChunk ? buffer : secondaryBuffer);
                File.AppendAllText("C:\\Users\\test\\Desktop\\Base64Zip", base64String);
            }
            inputStream.Dispose();
        }

The decoding is more of the same. I use the size of the base64String variable above (which varies depending on the original buffer size that I test with), as the buffer size for decoding. Then, instead of Convert.ToBase64String(), I call Convert.FromBase64String() and write to a different file name/path.
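The decoding code is not shown in the post, so the following is only a sketch of what a chunked decoder along those lines might look like. It assumes the Base64 text contains no line breaks or other whitespace (as is the case for the output of the encoder above) and reads it in chunks whose length is a multiple of 4 characters; `ChunkedBase64Decoder`, `Decode`, and `charsPerChunk` are hypothetical names.

```csharp
using System;
using System.IO;

public static class ChunkedBase64Decoder
{
    // Hypothetical helper: decodes Base64 text from reader into output.
    // Assumes the text has no whitespace, so every full chunk holds
    // complete 4-character Base64 groups.
    public static void Decode(TextReader reader, Stream output, int charsPerChunk = 4 * 1024)
    {
        if (charsPerChunk <= 0 || charsPerChunk % 4 != 0)
            throw new ArgumentException("charsPerChunk must be a positive multiple of 4.", nameof(charsPerChunk));

        var chars = new char[charsPerChunk];
        int filled;
        // ReadBlock keeps reading until the buffer is full or the input ends,
        // so only the last chunk can be shorter than charsPerChunk.
        while ((filled = reader.ReadBlock(chars, 0, chars.Length)) > 0)
        {
            byte[] bytes = Convert.FromBase64CharArray(chars, 0, filled);
            output.Write(bytes, 0, bytes.Length);
        }
    }
}
```

For files, the reader could be a `StreamReader` over the Base64 file and the output a `FileStream` opened with `FileMode.Create`.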

Edit:

In my haste to reduce the code (I refactored it into a new project, separate from other processing, to eliminate code that isn't central to the issue) I introduced a bug. The Base64 conversion should be performed on secondaryBuffer for all iterations except the last (identified by isFinalChunk), when buffer should be used. I have corrected the code above.

Edit #2:

Thank you all for your comments/feedback. After correcting the bug (see the above edit), I re-tested my code, and it is actually working now. I intend to test and implement @rene's solution as it appears to be the best, but I thought that I should let everyone know of my discovery as well.

Recommended answer

Based on the code shown in the blog post by Wiktor Zychla, the following code works. The same solution is indicated in the Remarks section of Convert.ToBase64String, as pointed out by Ivan Stoev.

// using System.IO;
// using System.Security.Cryptography;

private void ConvertLargeFile()
{
    // encode
    var filein = @"C:\Users\test\Desktop\my.zip";
    var fileout = @"C:\Users\test\Desktop\Base64Zip";
    using (FileStream fs = File.Open(fileout, FileMode.Create))
    using (var cs = new CryptoStream(fs, new ToBase64Transform(), CryptoStreamMode.Write))
    using (var fi = File.Open(filein, FileMode.Open))
    {
        fi.CopyTo(cs);
    }
    // the zip file is now stored in Base64Zip

    // and decode
    using (FileStream f64 = File.Open(fileout, FileMode.Open))
    using (var cs = new CryptoStream(f64, new FromBase64Transform(), CryptoStreamMode.Read))
    using (var fo = File.Open(filein + ".orig", FileMode.Create))
    {
        cs.CopyTo(fo);
    }
    // the original file is in my.zip.orig
    // use the command-line tool
    //   fc /b my.zip my.zip.orig
    // (/b forces a binary comparison) to verify that the start file
    // and the encoded and decoded file are the same
}
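As an alternative to the `fc` check mentioned in the comments, the round trip could also be verified from code by comparing hashes of the two files. This is only a sketch; `FileCompare` and `HaveSameContent` are hypothetical names, and SHA-256 is an arbitrary choice of hash rather than something the answer prescribes.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Security.Cryptography;

public static class FileCompare
{
    // Hypothetical helper: true when both files hash to the same SHA-256 value.
    public static bool HaveSameContent(string pathA, string pathB)
    {
        return Hash(pathA).SequenceEqual(Hash(pathB));
    }

    // Streams the file through the hash, so the file is never fully in memory.
    private static byte[] Hash(string path)
    {
        using (var sha = SHA256.Create())
        using (var stream = File.OpenRead(path))
        {
            return sha.ComputeHash(stream);
        }
    }
}
```

After running the answer's method, `FileCompare.HaveSameContent(filein, filein + ".orig")` would be expected to return true if the round trip preserved the file.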

The code uses standard classes found in the System.Security.Cryptography namespace: a CryptoStream together with FromBase64Transform and its counterpart ToBase64Transform.
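The same classes also work on arbitrary streams, not just files. The sketch below wraps them in two helpers; `Base64Streams`, `Encode`, and `Decode` are hypothetical names, and the code assumes a framework version that offers the `CryptoStream` constructor overload with a `leaveOpen` parameter, so that the caller's streams are not closed by the helper.

```csharp
using System.IO;
using System.Security.Cryptography;

public static class Base64Streams
{
    // Hypothetical helper: writes the Base64 text of input (as ASCII bytes) to output.
    public static void Encode(Stream input, Stream output)
    {
        using (var cs = new CryptoStream(output, new ToBase64Transform(),
                                         CryptoStreamMode.Write, leaveOpen: true))
        {
            input.CopyTo(cs);
        } // disposing the CryptoStream flushes the final, possibly padded, block
    }

    // Hypothetical helper: reads Base64 text from input and writes the decoded bytes to output.
    public static void Decode(Stream input, Stream output)
    {
        using (var cs = new CryptoStream(input, new FromBase64Transform(),
                                         CryptoStreamMode.Read, leaveOpen: true))
        {
            cs.CopyTo(output);
        }
    }
}
```

Because only a small buffer is in flight at any time, memory use should stay roughly constant regardless of the size of the input.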
