How to edit a big file

Question

Imagine a huge file that should be edited by my program. To reduce read time I use mmap() and then read only the parts I'm viewing. However, if I want to add a line in the middle of the file, what's the best approach?

Is the only way to add a line and then move the rest of the file? That sounds expensive.

So my question is basically: What's the most efficient way of adding data in the middle of a huge file?

Recommended answer

The only general way to insert data in the middle of any file, huge or small, on Linux or POSIX is to copy that file into a fresh one and then rename(2) the copy over the original. So you copy its head (up to the insertion point), append the new data to the copy, and then copy the tail (everything after the insertion point). You might also consider calling posix_fadvise(2) (or even the Linux-specific readahead(2)), but that does not alleviate the need to copy all the data. mmap(2) might be used, for example, in place of read(2), but whatever you do, you still have to copy all the data.
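
The copy-and-rename steps above can be sketched roughly as follows. This is only an illustration, not code from the answer: the function name `insert_into_file` and the `chunk_size` parameter are hypothetical, and Python's `os.replace` is used as the wrapper around rename(2). The temporary copy is assumed to live in the same directory as the original so that the rename stays on one filesystem.

```python
import os
import shutil
import tempfile


def insert_into_file(path, offset, data, chunk_size=1 << 20):
    """Insert `data` (bytes) at byte `offset` by rewriting the file into a copy."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the copy next to the original so the final rename does not
    # cross a filesystem boundary.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as dst, open(path, "rb") as src:
            # 1. Copy the head (everything before the insertion point).
            remaining = offset
            while remaining > 0:
                chunk = src.read(min(chunk_size, remaining))
                if not chunk:
                    raise ValueError("offset is past the end of the file")
                dst.write(chunk)
                remaining -= len(chunk)
            # 2. Append the new data to the copy.
            dst.write(data)
            # 3. Copy the tail (everything after the insertion point).
            shutil.copyfileobj(src, dst, chunk_size)
            dst.flush()
            os.fsync(dst.fileno())
        # Keep the original permission bits, then rename the copy over it.
        shutil.copymode(path, tmp_path)
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the partial copy if anything went wrong before the rename.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

Calling `insert_into_file("notes.txt", 6, b"line2\n")` on a file containing `line1\nline3\n` would place the new line between the two existing ones. Note that every byte of the file is still read and written once, which is exactly the cost the answer describes.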

Of course, if you happen to be replacing a chunk of data in the middle of the file with another chunk of the same size (so no real insertion), you can use plain lseek(2) + write(2).
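
A minimal sketch of that same-size replacement, using Python's `os.lseek` and `os.write` wrappers around the two system calls. The helper name `overwrite_in_place` and its bounds check are assumptions added for illustration; they are not part of the original answer.

```python
import os


def overwrite_in_place(path, offset, data):
    """Overwrite len(data) bytes at `offset` without changing the file size."""
    fd = os.open(path, os.O_WRONLY)
    try:
        # Refuse to grow the file: this helper only replaces existing bytes.
        size = os.fstat(fd).st_size
        if offset + len(data) > size:
            raise ValueError("replacement would extend past the end of the file")
        os.lseek(fd, offset, os.SEEK_SET)
        # write(2) may write fewer bytes than requested, so loop until done.
        view = memoryview(data)
        while view:
            written = os.write(fd, view)
            view = view[written:]
    finally:
        os.close(fd)
```

Because nothing after the replaced chunk moves, the cost depends only on the size of the chunk, not on the size of the file.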

Is the only way to add a line and then move the rest of the file? That sounds expensive.

Yes, it is conceptually the only way.

You should consider using something other than a plain text file: look into SQLite or GDBM (they might be very efficient in your use case). See also this answer. Both provide a higher-level abstraction than POSIX files, so they give you the ability to "insert" data (of course, they are still internally based upon and using POSIX files).
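
To illustrate what "inserting" could look like with SQLite, here is one possible sketch using Python's standard `sqlite3` module. The schema is an assumption made for this example, not something the answer prescribes: each line is stored as a row with a hypothetical floating-point `pos` column, and a line is inserted between two neighbours by giving it the midpoint of their positions.

```python
import sqlite3


def open_store(db_path=":memory:"):
    """Open (or create) a database holding one row per line of text."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS lines ("
        "pos REAL PRIMARY KEY, text TEXT NOT NULL)"
    )
    return conn


def append_line(conn, text):
    """Add a line after the current last line."""
    (last,) = conn.execute("SELECT COALESCE(MAX(pos), 0.0) FROM lines").fetchone()
    conn.execute("INSERT INTO lines (pos, text) VALUES (?, ?)", (last + 1.0, text))


def insert_between(conn, pos_before, pos_after, text):
    """Insert a line between two existing positions without moving other rows."""
    middle = (pos_before + pos_after) / 2.0
    conn.execute("INSERT INTO lines (pos, text) VALUES (?, ?)", (middle, text))


def read_lines(conn):
    """Return all lines in document order."""
    return [row[0] for row in conn.execute("SELECT text FROM lines ORDER BY pos")]


conn = open_store()
append_line(conn, "first")   # stored at pos 1.0
append_line(conn, "third")   # stored at pos 2.0
insert_between(conn, 1.0, 2.0, "second")  # stored at pos 1.5
print(read_lines(conn))
```

With an on-disk database you would also call `conn.commit()` after the writes. The midpoint scheme is only one design choice: repeated insertions at the same spot eventually exhaust floating-point precision, at which point the positions would need to be renumbered.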
