How to pipe contents of large tar.gz file to STDOUT?
Problem description
I have a file, large.tar.gz, containing about 1 million files, roughly a quarter of which are HTML files, and I want to parse a few lines of each of those HTML files.
I want to avoid extracting the contents of large.tar.gz into a folder and then parsing the HTML files. Instead, how can I pipe the contents of the HTML files in large.tar.gz straight to STDOUT so that I can grep/parse out the information I want from them?
I presume there must be some magic like:
tar -special_flags large.tar.gz | grep_only_files_with_extension html | xargs -n1 head -n 99999 | ./parse_contents.pl -
Any ideas?
Recommended answer
With GNU tar, use this to extract the HTML files in a tgz to stdout:
tar -xOzf large.tar.gz --wildcards '*.html' | grep ...
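-O, --to-stdout: extract files to standard output
The other flags are -x (extract), -z (filter the archive through gzip), and -f (read the named archive file); --wildcards makes tar treat '*.html' as a pattern for member names.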
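To try the command without touching the real archive, here is a minimal sketch that builds a tiny throwaway archive and runs the same pipeline on it. The file names, the temporary directory, and the `<title>` pattern are assumptions made for illustration only; it also assumes GNU tar, as the answer does.

```shell
# Build a small sample archive in a temporary directory (hypothetical file names).
workdir=$(mktemp -d)
mkdir -p "$workdir/site"
printf '<title>First page</title>\n<p>body</p>\n' > "$workdir/site/a.html"
printf '<title>Second page</title>\n' > "$workdir/site/b.html"
printf 'not html\n<title>ignored</title>\n' > "$workdir/site/notes.txt"
tar -czf "$workdir/sample.tar.gz" -C "$workdir" site

# Stream only the *.html members to stdout and keep the <title> lines.
# Nothing is written to disk by the extraction itself.
titles=$(tar -xOzf "$workdir/sample.tar.gz" --wildcards '*.html' | grep '<title>')
printf '%s\n' "$titles"

# Clean up the sample data.
rm -rf "$workdir"
```

Note that -O concatenates the contents of all matching members into one stream, so the boundaries between files are not visible to whatever reads the pipe.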