How to load a tarball into Pig

Question

I have log files in a tarball (access.logs.tar.gz) that is loaded into my Hadoop cluster. Is there a way to load it directly into Pig without untarring it?

Answer

PigStorage will recognize that the file is compressed (by the .gz extension; this is actually implemented in TextInputFormat, which PigTextInputFormat extends), but once it is decompressed you are still dealing with a tar file. If you can handle the tar header lines that appear between the files in the archive, you can use PigStorage as is. Otherwise you will need to write your own extension of PigTextInputFormat that strips out the tar header between each file.
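
To see what "header lines between the files" looks like in practice, here is a small Python sketch. It is not Pig code and not the custom PigTextInputFormat mentioned above; it only illustrates the byte layout. The member names (a.log, b.log) and the log lines are made up for the example. It builds a tiny .tar.gz in memory, shows what a reader that only undoes the gzip layer would split into lines, and compares that with reading each tar member properly.

```python
import gzip
import io
import tarfile


def build_sample_tarball():
    """Build a small in-memory .tar.gz holding two hypothetical log files."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name, body in [("a.log", b"GET /index\nGET /about\n"),
                           ("b.log", b"POST /login\n")]:
            info = tarfile.TarInfo(name=name)
            info.size = len(body)
            tar.addfile(info, io.BytesIO(body))
    return buf.getvalue()


def lines_after_gunzip_only(tgz):
    """Undo only the gzip layer and split on newlines.

    This approximates what a gzip-aware text reader sees: the 512-byte tar
    headers and NUL padding end up glued onto the neighbouring log lines.
    """
    return gzip.decompress(tgz).split(b"\n")


def lines_from_members(tgz):
    """Read each tar member separately so only real log lines come back."""
    out = []
    with tarfile.open(fileobj=io.BytesIO(tgz), mode="r:gz") as tar:
        for member in tar:
            if not member.isfile():
                continue
            with tar.extractfile(member) as fh:
                out.extend(line.decode("utf-8").rstrip("\n") for line in fh)
    return out


tgz = build_sample_tarball()
raw = lines_after_gunzip_only(tgz)
clean = lines_from_members(tgz)

# The first "line" starts with the tar header for a.log, not with log data.
print(raw[0][:5], len(raw[0]))
print(clean)
```

The first element of `raw` carries the whole tar header in front of the first log line, which is the kind of record you would have to filter out in Pig if you load the tarball with PigStorage as is. A custom input format would do the equivalent of `lines_from_members` on the Hadoop side.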
