Storing data to SequenceFile from Apache Pig


Problem description


Apache Pig can load data from Hadoop sequence files using the PiggyBank SequenceFileLoader:

REGISTER /home/hadoop/pig/contrib/piggybank/java/piggybank.jar;

DEFINE SequenceFileLoader org.apache.pig.piggybank.storage.SequenceFileLoader();

log = LOAD '/data/logs' USING SequenceFileLoader AS (...)

Is there also a library out there that would allow writing to Hadoop sequence files from Pig?

Solution

It's just a matter of implementing a StoreFunc to do so.

This is possible now, although it will become a fair bit easier once Pig 0.7 comes out, as it includes a complete redesign of the Load/Store interfaces.
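As a rough illustration of what such a StoreFunc could look like against the redesigned interface (Pig 0.7 and later), here is a minimal sketch. It is not the answerer's code and not part of PiggyBank. The class name `SequenceFileTextStorage` is hypothetical, and the sketch assumes that the first two fields of each tuple should be written as `Text` key and `Text` value. It needs the Pig and Hadoop jars on the classpath to compile.

```java
import java.io.IOException;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.OutputFormat;
import org.apache.hadoop.mapreduce.RecordWriter;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;
import org.apache.pig.StoreFunc;
import org.apache.pig.data.Tuple;

// Hypothetical class for illustration; not part of Pig or PiggyBank.
public class SequenceFileTextStorage extends StoreFunc {

    private RecordWriter<Text, Text> writer;

    @Override
    @SuppressWarnings("rawtypes")
    public OutputFormat getOutputFormat() throws IOException {
        // Reuse Hadoop's existing sequence file output format.
        return new SequenceFileOutputFormat<Text, Text>();
    }

    @Override
    public void setStoreLocation(String location, Job job) throws IOException {
        // SequenceFileOutputFormat reads the key/value classes from the job.
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        FileOutputFormat.setOutputPath(job, new Path(location));
    }

    @Override
    @SuppressWarnings({"rawtypes", "unchecked"})
    public void prepareToWrite(RecordWriter writer) throws IOException {
        this.writer = writer;
    }

    @Override
    public void putNext(Tuple t) throws IOException {
        // Assumption: field 0 is the key and field 1 is the value.
        if (t == null || t.size() < 2) {
            throw new IOException("Expected a tuple with at least two fields");
        }
        // Simplification: null fields are written as the string "null".
        Text key = new Text(String.valueOf(t.get(0)));
        Text value = new Text(String.valueOf(t.get(1)));
        try {
            writer.write(key, value);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IOException(e);
        }
    }
}
```

Once the jar containing such a class is registered, it would be used from Pig Latin along the lines of `STORE log INTO '/data/logs_out' USING SequenceFileTextStorage();` (the output path here is only a placeholder). Other key and value types would need their own conversion logic in `putNext`.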

The "Hadoop expansion pack" that Twitter open-sourced on GitHub includes code for generating Load and Store funcs based on Google Protocol Buffers (building on Input/Output formats for the same; you already have those for sequence files, obviously). Check it out if you need examples of how to do some of the less trivial stuff. It should be fairly straightforward, though.
