Suitable Java data structure for parsing large data file


Problem description

I have a rather large text file (~4m lines) I'd like to parse and I'm looking for advice about a suitable data structure in which to store the data. The file contains lines like the following:

Date        Time    Value
2011-11-30  09:00   10
2011-11-30  09:15   5
2011-12-01  12:42   14
2011-12-01  19:58   19
2011-12-01  02:03   12

I want to group the lines by date, so my initial thought was to use a TreeMap<String, List<String>> to map the date to the rest of the line. But is a TreeMap of Lists a ridiculous thing to do? I suppose I could replace the String key with a date object (to eliminate so many string comparisons), but it's the List as a value that I'm worried might be unsuitable.

I'm using a TreeMap because I want to iterate the keys in date order.
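A minimal sketch of the TreeMap-of-Lists approach described above, with the String key replaced by a `java.time.LocalDate` as the question suggests. It assumes whitespace-separated columns, a single header line, and Java 11+; `GroupByDate` and `group` are hypothetical names, not part of the original question.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Stream;

// Hypothetical sketch: group lines by date in a TreeMap so keys iterate in date order.
// Assumes whitespace-separated "Date Time Value" columns and exactly one header line.
class GroupByDate {

    static TreeMap<LocalDate, List<String>> group(Stream<String> lines) {
        TreeMap<LocalDate, List<String>> byDate = new TreeMap<>();
        lines.skip(1)                                        // skip the "Date Time Value" header
             .filter(line -> !line.isBlank())
             .forEach(line -> {
                 String[] cols = line.trim().split("\\s+", 2); // date, then the rest of the line
                 LocalDate date = LocalDate.parse(cols[0]);   // ISO yyyy-MM-dd
                 byDate.computeIfAbsent(date, d -> new ArrayList<>()).add(cols[1]);
             });
        return byDate;
    }

    public static void main(String[] args) throws IOException {
        // Files.lines streams the file lazily instead of loading all ~4m lines as one list first.
        try (Stream<String> lines = Files.lines(Path.of(args[0]))) {
            for (Map.Entry<LocalDate, List<String>> e : group(lines).entrySet()) {
                System.out.println(e.getKey() + " -> " + e.getValue()); // keys come out in date order
            }
        }
    }
}
```

Comparing `LocalDate` keys avoids character-by-character string comparisons, but each group still carries the per-entry cost of a map node plus an `ArrayList`, which is the overhead the answer below is concerned with.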

Solution

is a TreeMap of Lists a ridiculous thing to do?

Conceptually not, but it is going to be very memory-inefficient (both because of the Map and because of the List). You're looking at an overhead of 200% or more. That may or may not be acceptable, depending on how much memory you can afford to waste.

For a more memory-efficient solution, create a class with a field for each column (including a Date), put one instance per line into a List, and sort that list (ideally using quicksort) once you're done reading.
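A minimal sketch of the flat-list approach, assuming the same whitespace-separated columns and Java 16+ (for records). `FlatReadings`, `Reading` and `load` are hypothetical names. One swap relative to the answer: instead of a hand-written quicksort it uses `List.sort` with a `Comparator`, which the JDK documents as a stable, adaptive, iterative mergesort for object lists.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.LocalDate;
import java.time.LocalTime;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Stream;

// Hypothetical sketch: one small object per row in a flat List, sorted once after reading.
// Assumes whitespace-separated "Date Time Value" columns and exactly one header line.
class FlatReadings {

    // One field per column, as the answer suggests.
    record Reading(LocalDate date, LocalTime time, int value) {
        static Reading parse(String line) {
            String[] cols = line.trim().split("\\s+");
            return new Reading(LocalDate.parse(cols[0]), LocalTime.parse(cols[1]),
                               Integer.parseInt(cols[2]));
        }
    }

    static List<Reading> load(Stream<String> lines) {
        List<Reading> rows = new ArrayList<>();
        lines.skip(1)                            // skip the header line
             .filter(line -> !line.isBlank())
             .map(Reading::parse)
             .forEach(rows::add);
        // Sort once after reading: by date, then by time within a date.
        rows.sort(Comparator.comparing(Reading::date).thenComparing(Reading::time));
        return rows;
    }

    public static void main(String[] args) throws IOException {
        List<Reading> rows;
        try (Stream<String> lines = Files.lines(Path.of(args[0]))) {
            rows = load(lines);
        }
        // Rows for the same date are now adjacent, so grouping is a single linear scan.
        for (int i = 0; i < rows.size(); ) {
            LocalDate date = rows.get(i).date();
            int j = i;
            while (j < rows.size() && rows.get(j).date().equals(date)) j++;
            System.out.println(date + ": " + (j - i) + " rows");
            i = j;
        }
    }
}
```

Because the list is sorted by date, each day's rows form a contiguous range (for example via `rows.subList(i, j)`), so no per-date map or per-date list is needed.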

