Fastest way to skip lines while parsing files in Ruby?

Question

I tried searching for this, but couldn't find much. It seems like something that's probably been asked before (many times?), so I apologize if that's the case.

I was wondering what the fastest way to parse certain parts of a file in Ruby would be. For example, suppose I know the information I want for a particular function is between lines 500 and 600 of, say, a 1000-line file. (Obviously this kind of question is geared toward much larger files; I'm just using smaller numbers for the sake of example.) Since I know it won't be in the first half, is there a quick way of disregarding that part of the file?

Currently I'm using something along the lines of:

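# Process lines 501-600; every earlier line is still read, then skipped.
while (buffer = file_in.gets) && file_in.lineno <= 600
  next unless file_in.lineno > 500
  # chomp, not chomp!: chomp! returns nil when the line has no trailing newline
  if buffer.chomp.include?(some_string)
    do_func_whatever
  end
end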

It works, but I just can't help but think it could work better.

I'm very new to Ruby and am interested in learning new ways of doing things in it.

Recommended answer

file.each_line.drop(500).take(100) # lines 501-600 (each_line replaces the deprecated IO#lines)

Generally, you can't avoid reading the file from the start up to the line you are interested in, because lines can have different lengths, so there is no way to compute where line 501 begins. What you can avoid is loading the whole file into a big array: just read line by line, counting, and discard lines until you reach what you are looking for, pretty much like your own example. You can just make it more Rubyish.
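
A minimal sketch of that "read, count, discard" approach in a more Rubyish form. The helper name `each_line_between` is hypothetical (not from the original answer); it numbers lines from 1 with `with_index(1)`, discards lines before the range, and stops reading as soon as the last wanted line has been yielded, so only one line is held in memory at a time.

```ruby
# Hypothetical helper, not part of the original answer.
# Yields lines first_line..last_line (1-based, inclusive) without the trailing
# newline, then stops reading. Earlier lines are read and discarded one by one.
def each_line_between(path, first_line, last_line)
  # Without a block, return an Enumerator so callers can chain (e.g. .to_a).
  return enum_for(__method__, path, first_line, last_line) unless block_given?

  File.open(path) do |file|
    file.each_line.with_index(1) do |line, number|
      next if number < first_line   # before the range: discard
      yield line.chomp
      break if number >= last_line  # last wanted line done: stop reading
    end
  end
  nil
end
```

With the names from the question, usage would look like `each_line_between("big.log", 501, 600) { |line| do_func_whatever if line.include?(some_string) }`, where `"big.log"` is a placeholder file name.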

PS. The Tin Man's comment made me do some experimenting. While I didn't find any reason why drop would load the whole file, there is indeed a problem: drop returns the rest of the file as an array, so everything after line 500 is read into memory. Here's a way this could be avoided:

file.each_line.select.with_index { |l, i| (500..599) === i } # i is 0-based, so this selects lines 501-600

PS2: Doh, the code above, while not building a huge array, still iterates through the whole file, even the lines after 600. :( Here's a third version:

enum = file.each_line
500.times { enum.next } # skip 500 lines
enum.take(100)          # take the next 100

or, if you prefer FP:

file.each_line.tap { |enum| 500.times { enum.next } }.take(100)
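
Both forms rely on a detail worth knowing: `enum.next` (external iteration) and `take` (internal iteration) are separate passes, and `take` continues at line 501 only because both read from the same file position. On an enumerator over an array, `take` would start over from the first element. A sketch that makes the skip explicit with `IO#gets`; the helper name `lines_after` is hypothetical, and `io` can be any open IO-like object such as a `File` or `StringIO`:

```ruby
# Hypothetical helper: skip `skip` lines with gets, then return the next
# `count` lines as an array. Only the returned lines are kept in memory.
def lines_after(io, skip, count)
  skip.times { break if io.gets.nil? }  # discard lines; stop early at EOF
  io.each_line.take(count)              # take stops reading after `count` lines
end
```

For example, `File.open("big.log") { |f| lines_after(f, 500, 100) }` would return lines 501-600, with `"big.log"` again a placeholder.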

Anyway, the good point of this monologue is that you can learn multiple ways to iterate a file. ;)
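
One more way, not from the original answer: since Ruby 2.0, `Enumerator::Lazy` lets the original `drop(...).take(...)` idea work without building the big array. In a lazy chain, `drop` discards skipped lines one at a time and `first(n)` stops reading once it has `n` lines. The helper name `lines_between` is hypothetical; this is a sketch, not a benchmarked recommendation.

```ruby
# Sketch using Enumerator::Lazy (Ruby 2.0+), not the original answer's code.
# Returns lines first_line..last_line (1-based, inclusive), newlines kept.
def lines_between(path, first_line, last_line)
  File.open(path) do |file|
    file.each_line
        .lazy
        .drop(first_line - 1)               # skip earlier lines, no big array
        .first(last_line - first_line + 1)  # stop reading after the last wanted line
  end
end
```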
