Generate multiple outputs with Hadoop Pig
Question
I've got a file in Hadoop containing a list of data. I've built a simple Pig script which analyzes the file by id number, and so on...
The last step I'm looking for is this: I'd like to create (store) a file for each unique id number. So this should depend on a group step... however, I haven't understood whether this is possible (maybe there is a custom store module?).
Any ideas?
Thanks
Daniele
While keeping in mind what frail said, MultiStorage, a store function in PiggyBank, seems to be what you are looking for.
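A minimal sketch of how MultiStorage could be used for this, under some assumptions: the jar location, the input and output paths, and the relation and field names below are all hypothetical, and the id is assumed to be the first column of a tab-separated file. The constructor arguments follow the form shown in the PiggyBank documentation (parent output path, index of the field to split on, compression, field delimiter).

```pig
-- Make PiggyBank available; the jar path is an assumption, adjust it to your install.
REGISTER /path/to/piggybank.jar;

-- Hypothetical input: tab-separated records with the id in the first column (index 0).
records = LOAD '/data/input' USING PigStorage('\t')
          AS (id:chararray, value:chararray);

-- MultiStorage(parentPath, splitFieldIndex, compression, fieldDelimiter)
-- Tuples are routed to a separate output location per distinct value of field 0.
STORE records INTO '/data/output'
    USING org.apache.pig.piggybank.storage.MultiStorage('/data/output', '0', 'none', '\\t');
```

With this setup the result should be one subdirectory per distinct id under /data/output, each holding one or more part files, rather than a single flat file per id. Note that the split happens on a field of the tuples being stored, so an explicit GROUP step should not be needed; if the relation has already been grouped, it would likely need to be flattened back to plain tuples before the STORE.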