Storing data to SequenceFile from Apache Pig
Problem description
Apache Pig can load data from Hadoop sequence files using the PiggyBank SequenceFileLoader:
REGISTER /home/hadoop/pig/contrib/piggybank/java/piggybank.jar;
DEFINE SequenceFileLoader org.apache.pig.piggybank.storage.SequenceFileLoader();
log = LOAD '/data/logs' USING SequenceFileLoader AS (...);
Is there also a library out there that would allow writing to Hadoop sequence files from Pig?
It's just a matter of implementing a StoreFunc to do so.
This is possible now, although it will become a fair bit easier once Pig 0.7 comes out, as it includes a complete redesign of the Load/Store interfaces.
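To make the "implement a StoreFunc" suggestion concrete, here is a minimal sketch against the redesigned Pig 0.7+ `StoreFunc` abstract class, delegating the file format to Hadoop's `SequenceFileOutputFormat`. The class name and the choice of `NullWritable` keys with tab-delimited `Text` values are illustrative assumptions, not part of any shipped library:

```java
import java.io.IOException;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.OutputFormat;
import org.apache.hadoop.mapreduce.RecordWriter;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;
import org.apache.pig.StoreFunc;
import org.apache.pig.data.Tuple;

// Hypothetical storer: writes each tuple as one Text value in a SequenceFile.
public class SequenceFileStorer extends StoreFunc {

    private RecordWriter<NullWritable, Text> writer;

    @Override
    public OutputFormat getOutputFormat() {
        // Let Hadoop's SequenceFileOutputFormat handle the on-disk format.
        return new SequenceFileOutputFormat<NullWritable, Text>();
    }

    @Override
    public void setStoreLocation(String location, Job job) throws IOException {
        job.setOutputKeyClass(NullWritable.class);
        job.setOutputValueClass(Text.class);
        FileOutputFormat.setOutputPath(job, new Path(location));
    }

    @Override
    @SuppressWarnings("unchecked")
    public void prepareToWrite(RecordWriter writer) {
        this.writer = writer;
    }

    @Override
    public void putNext(Tuple tuple) throws IOException {
        try {
            // Crude encoding for the sketch: tab-join the tuple's fields.
            // A real implementation would choose a proper key/value scheme.
            writer.write(NullWritable.get(),
                         new Text(tuple.toDelimitedString("\t")));
        } catch (InterruptedException e) {
            throw new IOException(e);
        }
    }
}
```

The only real work is `putNext`, which maps a Pig `Tuple` onto the key/value pair your sequence files expect; everything else is boilerplate wiring to the Hadoop OutputFormat machinery.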
The "Hadoop expansion pack" that Twitter open-sourced on GitHub includes code for generating Load and Store funcs based on Google Protocol Buffers (building on Input/Output formats for same -- you already have those for sequence files, obviously). Check it out if you need examples of how to do some of the less trivial stuff. It should be fairly straightforward though.
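Once such a StoreFunc is compiled into a jar, using it from a script mirrors the load side shown above. The jar path and class name below are hypothetical placeholders:

```pig
REGISTER /path/to/my-storefuncs.jar;
DEFINE SequenceFileStorer com.example.SequenceFileStorer();
STORE log INTO '/data/logs-out' USING SequenceFileStorer;
```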