Read Bytes from Large Binary file >2GB


Problem Description



Hi, I want to read bytes from a large file (>2GB).
When I do that, an OutOfMemoryException is thrown, because I read the whole file into memory; all I know is that I can chunk the file into small pieces...
So what is the best code to do that?

The reason for reading the file is to find some bytes that are stored in it.

Any suggestions would be really appreciated.

Solution

In addition to the correct answer by Espen Harlinn:

Breaking a file into chunks will hardly help you, unless those chunks are of different natures (different formats, representing different data structures) and were only put in one file without proper justification.

In other cases, it's good to use the big file and keep it open. There are cases when you need to split the file into two pieces. This is just the basic idea; see below.

So, I would assume that the file is big just because it represents a collection of objects of the same type, or of a few different types. If all the items are of the same size (in file storage units), addressing is trivial: you simply multiply the item size by the required index to get the position parameter for Stream.Seek. So, the only non-trivial case is when you have a collection of items of different sizes. In that case, you should index the file and build an index table. The index table consists of units of the same size, typically a list/array of file positions, one per index. Because of this, addressing into the index table can be done by index (a shift), as described above; you then read the position of the item in the "big" file, move the file position there, and read the data.
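For the fixed-size case, a minimal sketch of that addressing might look like this (the method name, record size and path are made up for illustration):

using System.IO;

// Minimal sketch: read item number 'index' from a file of fixed-size records.
// 'recordSize' is the size of one item in file storage units (bytes).
static byte[] ReadFixedSizeItem(string path, int recordSize, long index)
{
    using (FileStream fs = new FileStream(path, FileMode.Open, FileAccess.Read))
    {
        fs.Seek(index * (long)recordSize, SeekOrigin.Begin);  // position = index * item size
        byte[] item = new byte[recordSize];
        int read = fs.Read(item, 0, recordSize);              // may return less near the end of file
        return item;
    }
}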

You have two options: 1) keep the index table in memory; you could recalculate it each time, but it's better to build it once (cache it) and keep it in some file, either the same one or a separate one; 2) keep it in a file and read that file at the required position. In the second case, you have to seek to the position in the file(s) in two steps. In principle, this method allows you to access files of any size (limited only by System.UInt64.MaxValue).
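A hedged sketch of option 2, assuming the index table lives in a separate file as one 8-byte Int64 position per item (the names and the fixed itemLength parameter are assumptions for illustration):

using System.IO;

// Two-step seek: step 1 reads the item's position from the index file,
// step 2 seeks to that position in the big data file and reads the item.
static byte[] ReadItemViaIndexFile(string indexPath, string dataPath, long index, int itemLength)
{
    long position;
    using (FileStream indexStream = new FileStream(indexPath, FileMode.Open, FileAccess.Read))
    using (BinaryReader reader = new BinaryReader(indexStream))
    {
        indexStream.Seek(index * sizeof(long), SeekOrigin.Begin);  // address the index table by index
        position = reader.ReadInt64();                             // stored position of item 'index'
    }

    using (FileStream dataStream = new FileStream(dataPath, FileMode.Open, FileAccess.Read))
    {
        dataStream.Seek(position, SeekOrigin.Begin);               // address the big file
        byte[] item = new byte[itemLength];
        dataStream.Read(item, 0, itemLength);
        return item;
    }
}

For items of varying size, the length of item i could instead be derived from the difference between entries i+1 and i of the index table.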

After you position yourself in the stream of the "big" file, you can read a single item. You can use serialization for this purpose. Please see:
http://en.wikipedia.org/wiki/Serialization#.NET_Framework,
http://msdn.microsoft.com/en-us/library/vstudio/ms233843.aspx,
http://msdn.microsoft.com/en-us/library/system.runtime.serialization.formatters.binary.binaryformatter.aspx.

A fancy way of implementing all of these index-table solutions would be to encapsulate everything in a class with an indexed property.
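A hedged sketch of such a class (member names and the byte[]-per-item representation are assumptions; a real implementation would also implement IDisposable to close the stream):

using System.IO;

// Sketch: wrap the big file and its index table behind an indexed property,
// so callers can write indexedFile[i] instead of doing the seek arithmetic themselves.
class IndexedBigFile
{
    private readonly FileStream _data;
    private readonly long[] _positions;  // index table: start position of each item
    private readonly int[] _lengths;     // length of each item

    public IndexedBigFile(string path, long[] positions, int[] lengths)
    {
        _data = new FileStream(path, FileMode.Open, FileAccess.Read);
        _positions = positions;
        _lengths = lengths;
    }

    public byte[] this[int index]        // the indexed property
    {
        get
        {
            _data.Seek(_positions[index], SeekOrigin.Begin);
            byte[] item = new byte[_lengths[index]];
            _data.Read(item, 0, item.Length);
            return item;
        }
    }
}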

—SA


Have a look at:
FileStream.Read
FileStream.Seek

That pretty much covers what you need to know.

[Update]
Your implementation should look a bit like this:

using System.IO;

const int megabyte = 1024 * 1024;

public void ReadAndProcessLargeFile(string theFilename, long whereToStartReading = 0)
{
    using (FileStream fileStream = new FileStream(theFilename, FileMode.Open, FileAccess.Read))
    {
        byte[] buffer = new byte[megabyte];
        fileStream.Seek(whereToStartReading, SeekOrigin.Begin);

        // Read up to one megabyte at a time and hand each chunk to ProcessChunk,
        // so the whole >2GB file is never held in memory at once.
        int bytesRead = fileStream.Read(buffer, 0, megabyte);
        while (bytesRead > 0)
        {
            ProcessChunk(buffer, bytesRead);
            bytesRead = fileStream.Read(buffer, 0, megabyte);
        }
    }
}

private void ProcessChunk(byte[] buffer, int bytesRead)
{
    // Do the processing here
}
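Since the original goal was to find some bytes stored in the file, one possible (untested) way to fill in ProcessChunk is a plain scan of each chunk; note that a pattern spanning two chunks would need extra overlap handling, which this sketch omits:

// Example ProcessChunk body: search the current chunk for a fixed byte pattern.
// The pattern and the "found" handling are placeholders.
private void ProcessChunk(byte[] buffer, int bytesRead)
{
    byte[] pattern = { 0xDE, 0xAD, 0xBE, 0xEF };   // example bytes to search for
    for (int i = 0; i <= bytesRead - pattern.Length; i++)
    {
        bool found = true;
        for (int j = 0; j < pattern.Length; j++)
        {
            if (buffer[i + j] != pattern[j]) { found = false; break; }
        }
        if (found)
        {
            // the pattern starts at offset i within this chunk
        }
    }
}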



Best regards
Espen Harlinn

