Storing data to SequenceFile from Apache Pig


Question

Apache Pig can load data from Hadoop sequence files using the PiggyBank SequenceFileLoader:

REGISTER /home/hadoop/pig/contrib/piggybank/java/piggybank.jar;

DEFINE SequenceFileLoader org.apache.pig.piggybank.storage.SequenceFileLoader();

log = LOAD '/data/logs' USING SequenceFileLoader AS (...)

Is there also a library out there that would allow writing to Hadoop sequence files from Pig?

Answer

It's just a matter of implementing a StoreFunc to do so.

This is possible now, although it will become a fair bit easier once Pig 0.7 comes out, as it includes a complete redesign of the Load/Store interfaces.
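As an example of the kind of code involved, here is a minimal sketch against the redesigned Pig 0.7+ StoreFunc interface that wraps Hadoop's SequenceFileOutputFormat to write two-field tuples as Text key/value pairs. The class name TextSequenceFileStorage and the fixed (key, value) tuple layout are illustrative assumptions, not part of PiggyBank or any released library:

// A minimal sketch, assuming the Pig 0.7+ StoreFunc API. TextSequenceFileStorage
// and the (key, value) tuple layout are illustrative, not from a released library.
package example;

import java.io.IOException;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.OutputFormat;
import org.apache.hadoop.mapreduce.RecordWriter;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;
import org.apache.pig.StoreFunc;
import org.apache.pig.data.Tuple;

public class TextSequenceFileStorage extends StoreFunc {

    private RecordWriter<Text, Text> writer;

    @Override
    public OutputFormat getOutputFormat() throws IOException {
        // Delegate the actual SequenceFile writing to Hadoop's own output format.
        return new SequenceFileOutputFormat<Text, Text>();
    }

    @Override
    public void setStoreLocation(String location, Job job) throws IOException {
        // Pig passes the path from the STORE statement; wire it into the job.
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        FileOutputFormat.setOutputPath(job, new Path(location));
    }

    @SuppressWarnings("unchecked")
    @Override
    public void prepareToWrite(RecordWriter writer) throws IOException {
        this.writer = writer;
    }

    @Override
    public void putNext(Tuple t) throws IOException {
        // Expects two-field tuples: (key, value). A real implementation would
        // validate the schema and support value types other than Text.
        try {
            writer.write(new Text(t.get(0).toString()),
                         new Text(t.get(1).toString()));
        } catch (InterruptedException e) {
            throw new IOException(e);
        }
    }
}

A script could then use it much like the loader above, e.g. STORE log INTO '/data/logs-out' USING example.TextSequenceFileStorage(); (again hypothetical, assuming the class is compiled into a jar and REGISTERed first).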

The "Hadoop expansion pack" Twitter open-sourced at github includes code for generating Load and Store funcs based on Google Protocol Buffers (building on Input/Output formats for the same -- you already have those for sequence files, obviously). Check it out if you need examples of how to do some of the less trivial stuff. It should be fairly straightforward, though.
