Splitting gzipped logfiles without storing the ungzipped splits on disk
Question
I have a recurring task of splitting a set of large (about 1-2 GiB each) gzipped Apache logfiles into several parts (say, chunks of 500K lines). The final files should be gzipped again to limit disk usage.
On Linux I would normally do:
zcat biglogfile.gz | split -l500000
The resulting files will be named xaa, xab, xac, etc. So I do:
gzip x*
The effect of this method is that, as an intermediate result, these huge files are temporarily stored on disk. Is there a way to avoid this intermediate disk usage?
Can I (in a way similar to what xargs does) have split pipe the output through a command (like gzip) and recompress the output on the fly? Or am I looking in the wrong direction and is there a much better way to do this?
Thanks.
Recommended answer
You can use split's --filter option. I'm not aware of when --filter was introduced, but according to the comments it does not work in coreutils 8.4.
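On a coreutils version that supports it, a minimal sketch of the --filter approach would look like this (GNU split sets the environment variable $FILE to each chunk's output name; the single quotes matter so that $FILE is expanded by split's filter shell, not by your interactive shell):

zcat biglogfile.gz | split -l 500000 --filter='gzip > $FILE.gz'

This should write xaa.gz, xab.gz, etc. directly, so the uncompressed chunks never touch the disk. If you are stuck on an older coreutils without a working --filter, one alternative is to do the chunking in awk and pipe each chunk straight into gzip; a sketch, where the part-%04d.gz naming is just illustrative:

zcat biglogfile.gz | awk '
  NR % 500000 == 1 {                            # every 500K lines, start a new chunk
    if (cmd) close(cmd)                         # flush and close the previous gzip pipe
    cmd = sprintf("gzip > part-%04d.gz", ++n)   # next compressed output file
  }
  { print | cmd }                               # stream the current line into gzip
'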