Suitable Java data structure for parsing large data file
Problem description
I have a rather large text file (~4 million lines) that I'd like to parse, and I'm looking for advice about a suitable data structure in which to store the data. The file contains lines like the following:
Date Time Value
2011-11-30 09:00 10
2011-11-30 09:15 5
2011-12-01 12:42 14
2011-12-01 19:58 19
2011-12-01 02:03 12
I want to group the lines by date, so my initial thought was to use a TreeMap<String, List<String>> to map the date to the rest of the line. But is a TreeMap of Lists a ridiculous thing to do? I suppose I could replace the String key with a date object (to eliminate so many string comparisons), but it's the List as a value that I'm worried might be unsuitable.
I'm using a TreeMap because I want to iterate the keys in date order.
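For reference, the grouping described in the question could be sketched roughly as follows. This is only an illustration: the class name GroupByDate and the method group are made up here, the key is a java.time.LocalDate rather than a String (one possible choice of "date object"), and the caller is assumed to have skipped the "Date Time Value" header line already.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class GroupByDate {

    // Groups "date time value" lines by their date column.
    // Keys are LocalDate, so the TreeMap iterates in chronological order.
    // Assumption: every line starts with an ISO date (yyyy-MM-dd); the header is not passed in.
    static TreeMap<LocalDate, List<String>> group(Iterable<String> lines) {
        TreeMap<LocalDate, List<String>> byDate = new TreeMap<>();
        for (String line : lines) {
            // Split once on the first space: [date, "time value"]
            String[] parts = line.split(" ", 2);
            if (parts.length < 2) {
                continue; // skip blank or malformed lines
            }
            LocalDate date = LocalDate.parse(parts[0]);
            byDate.computeIfAbsent(date, d -> new ArrayList<>()).add(parts[1]);
        }
        return byDate;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
                "2011-12-01 12:42 14",
                "2011-11-30 09:00 10",
                "2011-12-01 02:03 12");
        // Entries come out ordered by date, regardless of input order.
        for (Map.Entry<LocalDate, List<String>> e : group(sample).entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}
```

Within each date the lines stay in file order; only the keys are sorted.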
"Is a TreeMap of Lists a ridiculous thing to do?"
Conceptually not, but it is going to be very memory-inefficient (both because of the Map and because of the List). You're looking at an overhead of 200% or more, which may or may not be acceptable, depending on how much memory you have to waste.
For a more memory-efficient solution, create a class that has fields for every column (including a Date), put the instances in a List, and sort it (ideally using quicksort) when you're done reading.
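A minimal sketch of that approach, under some assumptions: the names SortedReadings, Reading, parse and load are invented for illustration, the columns are modelled as LocalDate, LocalTime and int, and the header line is assumed to be skipped by the caller. Note that List.sort on objects uses a stable merge-sort variant (TimSort) rather than quicksort, so this swaps in the standard library sort instead of the quicksort mentioned above.

```java
import java.time.LocalDate;
import java.time.LocalTime;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class SortedReadings {

    // Hypothetical row type: one field per column of the file.
    static final class Reading {
        final LocalDate date;
        final LocalTime time;
        final int value;

        Reading(LocalDate date, LocalTime time, int value) {
            this.date = date;
            this.time = time;
            this.value = value;
        }
    }

    // Parses one "yyyy-MM-dd HH:mm value" line (assumed well-formed).
    static Reading parse(String line) {
        String[] cols = line.trim().split("\\s+");
        return new Reading(
                LocalDate.parse(cols[0]),
                LocalTime.parse(cols[1]),
                Integer.parseInt(cols[2]));
    }

    // Reads everything into one flat list, then sorts once at the end.
    static List<Reading> load(Iterable<String> lines) {
        List<Reading> readings = new ArrayList<>();
        for (String line : lines) {
            readings.add(parse(line));
        }
        readings.sort(Comparator.comparing((Reading r) -> r.date)
                .thenComparing(r -> r.time));
        return readings;
    }

    public static void main(String[] args) {
        List<Reading> sorted = load(Arrays.asList(
                "2011-12-01 12:42 14",
                "2011-11-30 09:00 10",
                "2011-12-01 02:03 12"));
        for (Reading r : sorted) {
            System.out.println(r.date + " " + r.time + " " + r.value);
        }
    }
}
```

After sorting, rows with the same date sit next to each other, so a single pass over the list can process them group by group without a Map.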