Groovy: Reading a range of lines from a file

Question

I have a rather large text file of about 2,000,000 lines. Going through the whole file with the following code snippet is easy, but that's not what I need ;-)

def f = new File("input.txt")
f.eachLine() {
    // Some code here
}

I need to read only a specific range of lines from the file. Is there a way to specify the start and end lines, like this (pseudo-code)? I'd like to avoid loading all lines into memory with readLines() before selecting the range.

// Read all lines from 4 to 48
def f = new File("input.txt")
def start = 4
def end = 48
f.eachLine(start, end) {
    // Some code here
}
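The behaviour the pseudo-code asks for can be sketched in Java (which the question explicitly welcomes) without readLines(): stream the file lazily with `Files.lines`, skip the lines before `start`, and stop after `end`, so only the wanted lines are kept in memory. This is a minimal sketch under stated assumptions: the class `LineRange`, its `readRange` method, and the 1-based inclusive numbering are hypothetical choices made to mirror the pseudo-code, not an existing Groovy or JDK API. The earlier lines are still read and discarded, because there is no way to know where a line starts without scanning for newlines.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
import java.util.stream.Stream;

final class LineRange {

    // Hypothetical helper: returns lines start..end, 1-based and inclusive, like the pseudo-code.
    static List<String> readRange(Path file, int start, int end) throws IOException {
        if (start < 1 || end < start) {
            throw new IllegalArgumentException("expected 1 <= start <= end");
        }
        // Files.lines reads lazily through a BufferedReader. skip() still has to read and
        // discard the earlier lines, but limit() stops consuming the stream after the last
        // wanted line, so the remainder of the file is not scanned.
        try (Stream<String> lines = Files.lines(file)) {
            return lines.skip(start - 1)
                        .limit(end - start + 1)
                        .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Demo: a 100-line temporary file standing in for input.txt.
        Path tmp = Files.createTempFile("input", ".txt");
        List<String> all = IntStream.rangeClosed(1, 100)
                .mapToObj(i -> "line " + i)
                .collect(Collectors.toList());
        Files.write(tmp, all);

        List<String> range = readRange(tmp, 4, 48);
        // prints "45 line 4 .. line 48"
        System.out.println(range.size() + " " + range.get(0) + " .. " + range.get(range.size() - 1));
        Files.delete(tmp);
    }
}
```

If `end` lies past the last line, the sketch simply returns the lines that do exist.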

If this is not possible with Groovy, any Java solution is welcome as well :-)

Cheers, Robert

Solution

I don't believe there is any "magic" way to skip to an arbitrary "line" in a file. Lines are merely defined by newline characters, so without actually reading the file, there is no way to know where those will be. I believe you have two options:

  1. Follow Mark Peter's answer and use a BufferedReader to read the file one line at a time until you reach your desired line. This will obviously be slow for lines deep into the file, since every preceding line still has to be read.
  2. Figure out the byte offset (rather than the line number) at which your next read needs to start, and seek directly to that point in the file using something like RandomAccessFile. Whether it's possible to know the right offset efficiently depends on your application. For example, if you are reading the file sequentially, one piece at a time, you simply record the position where you left off. If all the lines have a fixed length of L bytes (including the newline), then getting to line N (counting from 0) is just a matter of seeking to position N*L. If this is an operation you repeat often, some pre-processing might help: for example, read the entire file once and record the starting position of each line in an in-memory HashMap. The next time you need to go to line N, simply look up its position in the HashMap and seek directly to that point.
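The pre-processing idea from option 2 can be sketched in Java along these lines. One sequential pass records the byte offset at which each line starts. After that, reaching any line takes a single `RandomAccessFile.seek()`. The class `LineIndex` and its methods are hypothetical names for illustration. Because line numbers are consecutive, a growable `long[]` stands in for the HashMap mentioned above, costing roughly 8 bytes per line. The sketch assumes `\n` or `\r\n` line endings and a single-byte encoding such as ASCII, because `RandomAccessFile.readLine()` turns each byte into one character.

```java
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.RandomAccessFile;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class LineIndex {
    private final Path file;
    private long[] offsets = new long[1024]; // offsets[i] = byte position where line i+1 starts
    private int lineCount = 0;

    // One full sequential pass: remember the byte position at which every line starts.
    LineIndex(Path file) throws IOException {
        this.file = file;
        try (InputStream in = new BufferedInputStream(new FileInputStream(file.toFile()))) {
            long pos = 0;
            boolean atLineStart = true;
            int b;
            while ((b = in.read()) != -1) {
                if (atLineStart) {
                    add(pos);
                    atLineStart = false;
                }
                pos++;
                if (b == '\n') {
                    atLineStart = true; // the next byte, if any, begins a new line
                }
            }
        }
    }

    private void add(long offset) {
        if (lineCount == offsets.length) {
            offsets = Arrays.copyOf(offsets, lineCount * 2); // grow by doubling
        }
        offsets[lineCount++] = offset;
    }

    int lineCount() {
        return lineCount;
    }

    // Seek straight to line `start` (1-based) and read through line `end`, inclusive.
    List<String> readRange(int start, int end) throws IOException {
        if (start < 1 || end < start || end > lineCount) {
            throw new IllegalArgumentException("line range out of bounds");
        }
        List<String> result = new ArrayList<>();
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            raf.seek(offsets[start - 1]);
            for (int i = start; i <= end; i++) {
                result.add(raf.readLine()); // strips \n, \r or \r\n; one byte per char
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("input", ".txt");
        List<String> all = new ArrayList<>();
        for (int i = 1; i <= 100; i++) {
            all.add("line " + i);
        }
        Files.write(tmp, all);

        LineIndex index = new LineIndex(tmp);       // pay for the full scan once
        System.out.println(index.lineCount());      // prints 100
        System.out.println(index.readRange(4, 6));  // prints [line 4, line 5, line 6]
        Files.delete(tmp);
    }
}
```

Building the index still costs one full read of the file, so, as the answer notes, it only pays off when ranges are requested repeatedly.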
