How to load a tarball into Pig
Question
I have log files in a tarball (access.logs.tar.gz) that is loaded into my Hadoop cluster. Is there a way to load it directly into Pig without untarring it?
Answer
PigStorage will recognize that the file is compressed (by the .gz extension; this is actually implemented in TextInputFormat, which PigTextInputFormat extends), but after decompression you'll still be dealing with a tar file. If you can handle the tar header data that appears between the files in the archive, you can use PigStorage as is. Otherwise you'll need to write your own extension of PigTextInputFormat that strips out the tar headers between each file.
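To see why the tar headers matter, here is a minimal Python sketch (not Pig code) that mimics what a gzip-aware but tar-unaware line reader would see. The member names and log lines are made up for illustration, and the helper functions are hypothetical. It builds a small .tar.gz in memory, then compares splitting the gunzipped bytes on newlines with reading only the member payloads, which is roughly what a custom input format would have to do on the Java side.

```python
import gzip
import io
import tarfile


def make_tarball(files):
    """Build an in-memory .tar.gz from a {name: bytes} mapping (sample data only)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def lines_after_gunzip_only(tarball):
    """Undo only the gzip layer and split on newlines.

    The tar header blocks and padding stay in the stream, so they end up
    glued onto, or sitting between, the real log lines.
    """
    raw = gzip.decompress(tarball)
    return raw.split(b"\n")


def lines_from_members(tarball):
    """Read only the payload of each regular file, skipping tar headers and padding."""
    out = []
    with tarfile.open(fileobj=io.BytesIO(tarball), mode="r:gz") as tar:
        for member in tar:
            if member.isfile():
                out.extend(tar.extractfile(member).read().splitlines())
    return out


# Hypothetical sample logs standing in for access.logs.tar.gz
tarball = make_tarball({
    "access.1.log": b"GET /a 200\nGET /b 404\n",
    "access.2.log": b"GET /c 200\n",
})

dirty = lines_after_gunzip_only(tarball)  # includes tar header bytes
clean = lines_from_members(tarball)       # log lines only

print(len(dirty), "records with headers mixed in")
print(clean)
```

In the `dirty` list, header bytes such as the member file name appear inside the records, so a Pig script using plain PigStorage would need to filter or repair those records. The `clean` list shows the result once the tar layer is actually parsed.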