Run `head` on a text file inside a zipped archive without unpacking the archive

Question

Greetings,

I've taken over from a prior team and am writing ETL jobs which process csv files. I use a combination of shell scripts and Perl on Ubuntu. The csv files are huge; they arrive as zipped archives. Unzipped, many are more than 30Gb - yes, that's a G

The legacy process is a batch job running on cron that unzips each file entirely, reads the first line and copies it into a config file, then re-zips the entire file. Some days this takes many, many hours of processing time, for no benefit.

Can you suggest a method to only extract the first line (or first few lines) from each file inside a zipped archive, without fully unpacking the archives?

Recommended answer

The unzip command line utility has a -p option which dumps a file to standard out. Just pipe that into head and it'll not bother extracting the whole file to disk.
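
As a minimal sketch of that pipeline (the function name `first_lines` and the file names in the usage note are made up for illustration, not taken from the question):

```shell
#!/bin/sh
# first_lines is a hypothetical helper: print the first N lines of one
# member of a zip archive without writing the extracted file to disk.
first_lines() {
    archive=$1      # path to the .zip file
    member=$2       # name of the file inside the archive
    count=${3:-1}   # number of lines to print, default 1
    # -p sends the member's decompressed data to stdout; head exits after
    # $count lines, which closes the pipe so unzip stops early.
    unzip -p "$archive" "$member" | head -n "$count"
}
```

Something like `first_lines data.zip big.csv 1 >> etl.conf` would then append just the header line to a config file; unzip should be stopped by the closed pipe long before it has decompressed the full 30Gb.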

Alternatively, from perldoc Archive::Zip (the module that provides memberNamed, readChunk and the AZ_* constants used below):

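use strict;
use warnings;
use Archive::Zip qw( :ERROR_CODES :CONSTANTS );

# 'archive.zip' and 'xyz.txt' are placeholder names
my $zip = Archive::Zip->new();
$zip->read( 'archive.zip' ) == AZ_OK or die "cannot read archive";

my ($status, $bufferRef);
my $member = $zip->memberNamed( 'xyz.txt' );
# ask for uncompressed output so readChunk() returns the plain text
$member->desiredCompressionMethod( COMPRESSION_STORED );
$status = $member->rewindData();
die "error $status" unless $status == AZ_OK;
while ( ! $member->readIsDone() )
{
   # read the next chunk of the member's data
   ( $bufferRef, $status ) = $member->readChunk();
   die "error $status" if $status != AZ_OK && $status != AZ_STREAM_END;
   # do something with $bufferRef:
   print $$bufferRef;
}
$member->endRead();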

Modify to suit, e.g. by iterating over the file list from $zip->memberNames() and stopping after the first few lines of each member instead of reading it to the end.
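
The same "loop over every member, keep only the first line" idea can also be sketched in shell rather than Perl. This assumes an Info-ZIP `unzip` that supports zipinfo mode (`-Z1` lists member names, one per line); `headers_in_zip` is a hypothetical helper name:

```shell
#!/bin/sh
# headers_in_zip is a hypothetical helper: print "member: first line"
# for every file stored in a zip archive, without extracting to disk.
headers_in_zip() {
    archive=$1
    # zipinfo mode: member names only, one per line
    unzip -Z1 "$archive" | while IFS= read -r member; do
        case $member in
            */) continue ;;   # skip directory entries
        esac
        printf '%s: ' "$member"
        # stream the member to stdout and keep only its first line
        unzip -p "$archive" "$member" | head -n 1
    done
}
```

One caveat: unzip treats member arguments as wildcard patterns, so names containing characters such as `*`, `?` or `[` may need escaping before being passed back to `unzip -p`.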
