Read, edit, and write a text file line-wise using Ruby


Question

Is there a good way to read, edit, and write files in place in Ruby?

In my online search I've found stuff suggesting to read it all into an array, modify said array, then write everything out. I feel like there should be a better solution, especially if I'm dealing with a very big file.

Something like:

myfile = File.open("path/to/file.txt", "r+")

myfile.each do |line|
    myfile.replace_puts('blah') if line =~ /myregex/
end

myfile.close

Where replace_puts would write over the current line, rather than (over)writing the next line as it currently does because the pointer is at the end of the line (after the separator).

So then every line that matches /myregex/ will be replaced with 'blah'. Obviously what I have in mind is a bit more involved than that, as far as processing, and would be done in one line, but the idea is the same - I want to read a file line by line, and edit certain lines, and write out when I'm done.

Maybe there's a way to just say "rewind back to just after the last separator"? Or some way of using each_with_index and write via a line index number? I couldn't find anything of the sort, though.

The best solution I have so far is to read things line-wise, write them out to a new (temp) file line-wise (possibly edited), then overwrite the old file with the new temp file and delete the temp file. Again, I feel like there should be a better way - I don't think I should have to create a new 1 GB file just to edit some lines in an existing 1 GB file.

Solution

In general, there's no way to make arbitrary edits in the middle of a file. It's not a deficiency of Ruby. It's a limitation of the file system: Most file systems make it easy and efficient to grow or shrink the file at the end, but not at the beginning or in the middle. So you won't be able to rewrite a line in place unless its size stays the same.
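For the one case that does work in place (a replacement of exactly the same size), a minimal sketch might look like the following. The helper name replace_same_length is hypothetical and not part of Ruby's standard library. It assumes the replacement must match the original line's byte length and simply skips lines where it does not. It records the offset where each line starts, then seeks back to it, which is roughly the "rewind back to just after the last separator" idea from the question.

```ruby
# Hypothetical helper (not a standard library method): overwrite matching
# lines in place, but only when the replacement has exactly the same byte
# length as the original line, so the file size never changes.
def replace_same_length(path, pattern, replacement)
  # Binary mode keeps positions in plain bytes with no newline translation.
  File.open(path, 'r+b') do |file|
    loop do
      start = file.pos                 # byte offset where the next line begins
      line = file.gets
      break if line.nil?               # end of file
      next unless line =~ pattern

      body = line.chomp                # line content without its separator
      next unless replacement.bytesize == body.bytesize

      resume = file.pos                # where reading should continue
      file.seek(start)
      file.write(replacement)          # overwrite the body, keep the separator
      file.seek(resume)
    end
  end
end
```

Calling replace_same_length('path/to/file.txt', /myregex/, 'blah') would then only touch matching lines that are exactly four bytes long. Anything that grows or shrinks a line still needs one of the two approaches below.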

There are two general models for modifying a bunch of lines. If the file is not too large, just read it all into memory, modify it, and write it back out. For example, adding "Kilroy was here" to the beginning of every line of a file:

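path = '/tmp/foo'
# Read every line into memory and prepend the marker to each one.
lines = File.readlines(path).map do |line|
  'Kilroy was here ' + line
end
# Reopening with 'w' truncates the file; then write all the lines back.
File.open(path, 'w') do |file|
  file.puts lines
end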

Although simple, this technique has a danger: If the program is interrupted while writing the file, you'll lose part or all of it. It also needs to use memory to hold the entire file. If either of these is a concern, then you may prefer the next technique.

You can, as you note, write to a temporary file. When done, rename the temporary file so that it replaces the input file:

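require 'tempfile'
require 'fileutils'

path = '/tmp/foo'
temp_file = Tempfile.new('foo')
begin
  # Stream the input one line at a time, so memory use stays small.
  File.open(path, 'r') do |file|
    file.each_line do |line|
      temp_file.puts 'Kilroy was here ' + line
    end
  end
  # Close first so all buffered data is flushed before the rename.
  temp_file.close
  FileUtils.mv(temp_file.path, path)
ensure
  # Harmless after a successful move: the file is already closed and gone.
  temp_file.close
  temp_file.unlink
end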

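Since the rename (FileUtils.mv) is atomic when the temporary file and the input file are on the same file system, the rewritten input file will pop into existence all at once. If the program is interrupted, either the file will have been rewritten, or it will not. There's no possibility of it being partially rewritten. If the two files are on different file systems, FileUtils.mv falls back to copying the file and removing the original, and that guarantee no longer holds.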
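One way to keep the rename on a single file system is to create the temporary file in the same directory as the input file. The sketch below assumes a POSIX-like system. The helper name rewrite_lines is hypothetical, and copying the original permission bits is an added assumption (Tempfile creates its files readable and writable by the owner only). It uses File.rename rather than FileUtils.mv, so it raises an error instead of silently falling back to a copy.

```ruby
require 'tempfile'

# Hypothetical helper: pass every line of +path+ through the block and
# swap the result into place with a single rename.
def rewrite_lines(path)
  dir = File.dirname(path)
  mode = File.stat(path).mode & 0o777   # permission bits of the original
  # Create the temp file next to the target so both share a file system.
  temp_file = Tempfile.new(File.basename(path), dir)
  begin
    # File.foreach streams the input one line at a time.
    File.foreach(path) { |line| temp_file.write(yield(line)) }
    temp_file.close                     # flush everything before renaming
    File.chmod(mode, temp_file.path)    # Tempfile files start out as 0600
    File.rename(temp_file.path, path)   # replaces the original in one step
  ensure
    temp_file.close
    temp_file.unlink                    # nothing left to delete after a rename
  end
end
```

With this in place, the example above becomes rewrite_lines('/tmp/foo') { |line| 'Kilroy was here ' + line }.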

The ensure clause is not strictly necessary: The file will be deleted when the Tempfile instance is garbage collected. However, that could take a while. The ensure block makes sure that the tempfile gets cleaned up right away, without having to wait for it to be garbage collected.
