Java: Advice on handling large data volumes. (Part Deux)
Problem Description
Alright. So I have a very large amount of binary data (let's say, 10GB) distributed over a bunch of files (let's say, 5000) of varying lengths.
I am writing a Java application to process this data, and I want to establish a good design for the data access. The typical access pattern is as follows:
- One way or another, all the data will be read during the course of processing.
- Each file is (typically) read sequentially, requiring only a few kilobytes at a time. However, it is often necessary to have, say, the first few kilobytes of each file simultaneously, or the middle few kilobytes of each file simultaneously, etc.
- There are times when the application will want random access to a byte or two here and there.
Currently I am using the RandomAccessFile class to read into byte arrays (and ByteBuffers). My ultimate goal is to encapsulate the data access in a class that is fast, so that I never have to worry about it again. Its basic job is to read frames of data from specified files on request, and I want it to minimize the number of I/O operations given the considerations above.
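A minimal sketch of the kind of class described above, still built on RandomAccessFile. The class name `FrameReader`, the `readFrame(fileIndex, offset, length)` method and the `maxOpen` limit are hypothetical. It keeps a bounded, least-recently-used set of files open, on the assumption that holding all 5000 files open at once could run into the operating system's per-process file-descriptor limit.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical frame reader on top of RandomAccessFile (names are illustrative).
class FrameReader implements AutoCloseable {
    private final List<Path> files;
    private final Map<Integer, RandomAccessFile> open;

    FrameReader(List<Path> files, int maxOpen) {
        if (maxOpen < 1) throw new IllegalArgumentException("maxOpen must be >= 1");
        this.files = new ArrayList<>(files);
        // accessOrder = true: iteration runs from least- to most-recently used,
        // so removeEldestEntry evicts (and closes) the least recently used file.
        this.open = new LinkedHashMap<Integer, RandomAccessFile>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Integer, RandomAccessFile> eldest) {
                if (size() <= maxOpen) return false;
                try {
                    eldest.getValue().close();
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
                return true;
            }
        };
    }

    private RandomAccessFile handle(int fileIndex) throws IOException {
        RandomAccessFile raf = open.get(fileIndex); // get() also marks it as recently used
        if (raf == null) {
            raf = new RandomAccessFile(files.get(fileIndex).toFile(), "r");
            open.put(fileIndex, raf);
        }
        return raf;
    }

    /** Reads up to `length` bytes starting at `offset`; shorter if the file ends first. */
    byte[] readFrame(int fileIndex, long offset, int length) throws IOException {
        RandomAccessFile raf = handle(fileIndex);
        long available = Math.max(0, raf.length() - offset);
        byte[] frame = new byte[(int) Math.min(length, available)];
        raf.seek(offset);
        raf.readFully(frame);
        return frame;
    }

    @Override
    public void close() throws IOException {
        for (RandomAccessFile raf : open.values()) raf.close();
        open.clear();
    }
}
```

Each call still costs one seek and one read, so this alone does not reduce the I/O count for overlapping requests; it only keeps file handles under control.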
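Examples of typical access:
- Give me the first 10 KB of all files!
- Give me bytes 0 through 999 of file F, then bytes 1 through 1000, then 2 through 1001, and so on...
- Give me one megabyte of data from file F, starting at such-and-such byte!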
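For the sliding-window example (bytes 0 to 999, then 1 to 1000, ...), reading each window separately would issue one I/O call per step. One way to avoid that, sketched here with the hypothetical `SlidingWindowReader` class, is to read a larger chunk once with a positional `FileChannel.read(ByteBuffer, long)` and hand out each window as a read-only slice of that chunk; a new read happens only when the window leaves the loaded chunk. The chunk size is an assumed tuning parameter, and `loads` is only an illustrative counter.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper: serve overlapping windows from one chunk read.
class SlidingWindowReader implements AutoCloseable {
    private final FileChannel channel;
    private final long fileSize;
    private final ByteBuffer chunk;   // holds bytes [chunkStart, chunkStart + chunk.limit())
    private long chunkStart = -1;     // -1 = nothing loaded yet
    int loads = 0;                    // how many times a chunk was read (illustration only)

    SlidingWindowReader(Path file, int chunkSize) throws IOException {
        this.channel = FileChannel.open(file, StandardOpenOption.READ);
        this.fileSize = channel.size();
        this.chunk = ByteBuffer.allocate(chunkSize);
    }

    /** Read-only view of [offset, offset + length), shortened at end of file. */
    ByteBuffer window(long offset, int length) throws IOException {
        if (length > chunk.capacity()) {
            throw new IllegalArgumentException("window is larger than the chunk");
        }
        // Last byte position actually needed (never past end of file).
        long wanted = Math.max(offset, Math.min(offset + length, fileSize));
        boolean cached = chunkStart >= 0 && offset >= chunkStart
                && wanted <= chunkStart + chunk.limit();
        if (!cached) load(offset);
        int start = (int) (offset - chunkStart);
        int end = (int) Math.min((long) start + length, chunk.limit());
        ByteBuffer view = chunk.asReadOnlyBuffer(); // shares bytes, own position/limit
        view.position(start);
        view.limit(end);
        return view.slice();
    }

    private void load(long offset) throws IOException {
        chunk.clear();
        long pos = offset;
        // Positional reads do not move the channel's own position.
        while (chunk.hasRemaining()) {
            int n = channel.read(chunk, pos);
            if (n < 0) break; // end of file
            pos += n;
        }
        chunk.flip();
        chunkStart = offset;
        loads++;
    }

    @Override
    public void close() throws IOException {
        channel.close();
    }
}
```

Views returned by `window` become stale once the next chunk is loaded, so a caller should finish with one window (or copy it) before asking for one outside the current chunk.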
Any suggestions for a good design?
Recommended Answer
Use Java NIO and MappedByteBuffers, and treat your files as a list of byte arrays. Then let the OS worry about the details of caching, reading, flushing, etc.
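A minimal sketch of this suggestion, assuming every file is smaller than 2 GB (a single `MappedByteBuffer` is indexed by `int`, so larger files would need to be mapped in several regions). The class `MappedFiles` and its methods are hypothetical; only `FileChannel.map` and the `ByteBuffer` view methods are standard Java APIs. Each file is mapped read-only once, and every later read is an ordinary memory access that the OS serves from its page cache or by paging in from disk.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;

// Hypothetical wrapper: one read-only mapping per file, exposed as "a list of byte arrays".
class MappedFiles {
    private final List<MappedByteBuffer> maps = new ArrayList<>();

    MappedFiles(List<Path> files) throws IOException {
        for (Path file : files) {
            try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
                long size = ch.size();
                if (size > Integer.MAX_VALUE) {
                    throw new IOException(file + " is too large for one mapping; map it in regions");
                }
                // The mapping stays valid after the channel is closed.
                maps.add(ch.map(FileChannel.MapMode.READ_ONLY, 0, size));
            }
        }
    }

    int fileCount() { return maps.size(); }

    long length(int file) { return maps.get(file).capacity(); }

    /** Random access to a single byte (absolute get, no shared position is touched). */
    byte get(int file, int index) {
        return maps.get(file).get(index);
    }

    /** Zero-copy read-only view of [offset, offset + length), clamped to the file end. */
    ByteBuffer frame(int file, int offset, int length) {
        ByteBuffer view = maps.get(file).asReadOnlyBuffer(); // independent position/limit
        int end = (int) Math.min((long) offset + length, view.capacity());
        view.position(offset); // offset must lie within the file
        view.limit(end);
        return view.slice();
    }

    /** "Give me the first `length` bytes of every file" as a list of views. */
    List<ByteBuffer> heads(int length) {
        List<ByteBuffer> out = new ArrayList<>(maps.size());
        for (int f = 0; f < maps.size(); f++) out.add(frame(f, 0, length));
        return out;
    }
}
```

With this in place, "the first 10 KB of all files" becomes `heads(10 * 1024)`, a single byte is `get(file, index)`, and a megabyte from file F is `frame(f, start, 1 << 20)`. The returned views share memory with the mapping, so no bytes are copied. One caveat: Java has no standard public API to unmap a buffer explicitly, so mappings are released only when the buffers are garbage-collected.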