Splitting gzipped logfiles without storing the ungzipped splits on disk


Problem Description


I have a recurring task of splitting a set of large (about 1-2 GiB each) gzipped Apache logfiles into several parts (say chunks of 500K lines). The final files should be gzipped again to limit the disk usage.


On Linux I would normally do:

zcat biglogfile.gz | split -l500000


The resulting files will be named xaa, xab, xac, etc. So I do:

gzip x*


The effect of this method is that as an intermediate result these huge files are temporarily stored on disk. Is there a way to avoid this intermediate disk usage?


Can I (in a way similar to what xargs does) have split pipe the output through a command (like gzip) and recompress the output on the fly? Or am I looking in the wrong direction and is there a much better way to do this?

Thank you.

Recommended Answer

You can use split's --filter option to pipe each chunk through a command as it is written.

I am not aware of when the --filter option was introduced, but according to the comments it does not work in coreutils 8.4.
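A minimal sketch of the idea, assuming a coreutils version new enough for split to support --filter (per the note above, it did not work in coreutils 8.4). split exports the FILE environment variable with each chunk's output name and runs the filter through the shell, so the single quotes keep $FILE from being expanded by the outer shell:

zcat biglogfile.gz | split -l500000 --filter='gzip > $FILE.gz'

Each 500K-line chunk is piped straight into gzip, so only the compressed xaa.gz, xab.gz, etc. files ever reach the disk.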

