How to load a tarball into Pig

Problem description

I have log files in a tarball (access.logs.tar.gz) loaded into my Hadoop cluster. Is there a way to load it directly into Pig without untarring it?

Solution

PigStorage will recognize that the file is compressed (from the .gz extension; this is actually implemented in TextInputFormat, which PigTextInputFormat extends), but after decompression you are still dealing with a tar file. If you can handle the tar header lines between the files in the archive, you can use PigStorage as is. Otherwise you will need to write your own extension of PigTextInputFormat that strips out the tar header lines between each file.
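
To see why the tar headers get in the way, here is a minimal Python sketch (not Pig code) that builds a small in-memory tarball and reads it two ways: undoing only the gzip layer, which is roughly what a line-oriented reader does, and reading through the tar layer. The file names, log lines, and the function names `build_tarball`, `gunzip_only_lines`, and `tar_aware_lines` are all hypothetical and chosen for illustration only.

```python
import gzip
import io
import tarfile


def build_tarball(files):
    """Build an in-memory .tar.gz from a {name: bytes} mapping (illustrative data)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def gunzip_only_lines(tarball):
    """Undo gzip but ignore the tar layer, then split the raw bytes on newlines."""
    raw = gzip.decompress(tarball)
    return raw.split(b"\n")


def tar_aware_lines(tarball):
    """Read each member through the tar layer, so only file content is returned."""
    lines = []
    with tarfile.open(fileobj=io.BytesIO(tarball), mode="r:gz") as tar:
        for member in tar:
            if member.isfile():
                lines.extend(tar.extractfile(member).read().splitlines())
    return lines


if __name__ == "__main__":
    tarball = build_tarball({
        "a.log": b"GET /a 200\nGET /b 404\n",
        "b.log": b"GET /c 200\n",
    })
    # The first "line" carries the tar header bytes glued in front of the log entry.
    print(gunzip_only_lines(tarball)[0][:20])
    # The tar-aware reader yields only the log entries.
    print(tar_aware_lines(tarball))
```

In the gunzip-only view, each member's tar header and padding end up fused onto a neighbouring log line, which is the junk a plain PigStorage load would have to tolerate or filter. A custom input format would conceptually play the role of `tar_aware_lines`, skipping the header blocks before handing records to Pig.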
