Prepend to Very Large File in Fixed Time or Very Fast

Question

I have a file that is very large (>500GB) that I want to prepend with a relatively small header (<20KB). Doing commands such as:

cat header bigfile > tmp
mv tmp bigfile

or similar commands (e.g., with sed) are very slow.

What is the fastest method of writing a header to the beginning of an existing large file? I am looking for a solution that can run under CentOS 7.2. It is okay to install packages from CentOS install or updates repo, EPEL, or RPMForge.

It would be great if some method exists that doesn't involve relocating or copying the large amount of data in the bigfile. That is, I'm hoping for a solution that can operate in fixed time for a given header file regardless of the size of the bigfile. If that is too much to ask for, then I'm just asking for the fastest method.

Compiling a helper tool (as in C/C++) or using a scripting language is perfectly acceptable.

Recommended Answer

Is this something that needs to be done once, to "fix" a design oversight perhaps? Or is it something that you need to do on a regular basis, for instance to add summary data (such as the number of data records) to the beginning of the file?

If you need to do it just once, then your best option is to accept that a mistake has been made and take the consequences of the retro-fix. As long as your destination drive is different from the source drive, you should be able to fix up a 500GB file in about two hours. So after a week of batch processes running after hours, you could have upgraded perhaps thirty or forty files.

If this is a standard requirement for all such files, and you can apply the change only once the file is complete (some sort of summary information, perhaps), then you should reserve the space at the beginning of each file and leave it empty. Then it is a simple matter of seeking into the header region and overwriting it with the real data once it is available.
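A minimal shell sketch of the reserve-then-overwrite approach, assuming GNU coreutils (as shipped with CentOS 7) and a fixed 20 KiB header region. `HEADER_SIZE`, `create_with_reserved_header` and `write_header` are hypothetical names used for illustration, not part of any existing tool. Anything that reads the file must know to skip exactly `HEADER_SIZE` bytes and to ignore the zero padding after the real header.

```shell
#!/bin/sh
# Sketch: reserve a fixed-size, zero-filled header region when the file is
# first created, then overwrite it in place once the summary data is known.
# HEADER_SIZE is an assumption; choose a size your largest header fits in.
HEADER_SIZE=20480   # 20 KiB

# Create FILE containing only the empty (zero-filled) header region.
# The data itself is appended after it later (e.g. with >>).
# Usage: create_with_reserved_header FILE
create_with_reserved_header() {
    head -c "$HEADER_SIZE" /dev/zero > "$1"
}

# Overwrite the start of FILE with HEADER without truncating the file or
# moving its body: conv=notrunc makes dd write only the header bytes.
# Usage: write_header HEADER FILE
write_header() {
    hdr_size=$(wc -c < "$1")
    if [ "$hdr_size" -gt "$HEADER_SIZE" ]; then
        echo "header ($hdr_size bytes) exceeds the reserved $HEADER_SIZE bytes" >&2
        return 1
    fi
    dd if="$1" of="$2" conv=notrunc status=none
}
```

Because `dd conv=notrunc` rewrites only the first few kilobytes, the cost depends on the header size rather than on the size of the data behind it. That is the fixed-time behaviour the question asks for, but it only works for files that were created with the reserved region in the first place.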

As has been explained, standard file systems require the whole file to be rewritten in order to add something at the beginning.

If your 500GB file is on a standard hard disk, which can read data sequentially at around 100MB per second, then reading the whole file will take about 5,120 seconds, or roughly 1 hour 25 minutes.
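The read-time estimate works out as follows, assuming 1 GB = 1,024 MB and a sustained sequential read rate of 100 MB/s (a typical figure for a single spinning disk, not a measured one):

```latex
t = \frac{500\ \text{GB} \times 1024\ \text{MB/GB}}{100\ \text{MB/s}}
  = \frac{512\,000\ \text{MB}}{100\ \text{MB/s}}
  = 5120\ \text{s} \approx 85\ \text{min}
```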

As long as you arrange for the destination to be a separate drive from the source, you can mostly write the new file in parallel with the read, so it shouldn't take much longer than that. But there's no way to speed it up beyond that, I'm afraid.
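A sketch of that one-pass copy, assuming SRC and DST sit on different physical drives (the function does not check this). `prepend_to_copy` is a hypothetical helper name, and the paths in the usage line are placeholders.

```shell
#!/bin/sh
# Sketch: build the prepended copy in a single sequential pass, reading from
# one drive and writing to another so reads and writes do not compete for
# the same disk. The kernel flushes writes to DST in the background while
# cat keeps reading from SRC.
# Usage: prepend_to_copy HEADER SRC DST
prepend_to_copy() {
    hdr=$1 src=$2 dst=$3
    # Refuse to clobber an existing destination file.
    if [ -e "$dst" ]; then
        echo "destination $dst already exists" >&2
        return 1
    fi
    # Stream the header, then the body; remove a partial copy on failure.
    cat "$hdr" "$src" > "$dst" || { rm -f "$dst"; return 1; }
}
```

For example, `prepend_to_copy header /data/bigfile /mnt/otherdrive/bigfile`. Leave the result on the destination drive (or repoint whatever uses it) rather than `mv`ing it back: a `mv` across file systems is a full copy followed by a delete, which would roughly double the time.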
