Filter bag by parent value in Apache PIG

Question

I have the following relation in Apache PIG.

TSERIES: {ORDERED: {(timestamp: long,contentHost: chararray)},ts1: long}

And I want to do the following:

F = foreach TSERIES {
    ts = filter ORDERED by timestamp > TSERIES.ts1;
    generate ts;
}

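In short, I want to keep all elements of the ORDERED bag whose timestamp is greater than ts1,
but Pig won't allow it, specifically this part: ts = filter ORDERED by timestamp > TSERIES.ts1;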

Is this possible? I'm using version 0.9.2-cdh4.0.1 (cloudera).

Recommended answer

I'm not sure if there's a way to do this without a UDF... it seems like there should be, but I can't figure it out either. Anyway, you could write a UDF to do this directly: go through the bag, filter out some elements, and return a bag. Or you could write a UDF to generate UUIDs, then flatten the bag and re-group it, something like this:

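-- Tag each row with a unique id so the bag can be re-assembled after flattening.
a = foreach TSERIES generate ORDERED, ts1, myudfs.GenerateUUID() as id;
-- FLATTEN yields one row per tuple in the bag; name both tuple fields.
b = foreach a generate FLATTEN(ORDERED) as (timestamp, contentHost), ts1, id;
-- Keep only the entries newer than the parent value ts1.
c = filter b by timestamp > ts1;
-- Re-group by id to get one filtered bag per original row.
d = group c by id;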
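
The first option in the answer, a UDF that walks the bag and returns only the matching tuples, could look roughly like the Jython sketch below. This is only an illustration under a few assumptions: the file name udfs.py, the namespace myudfs, and the function name filter_after are all made up here, and it relies on Pig's Jython engine passing a bag in as a list of tuples and making the outputSchema decorator available when it loads the script.

```python
# udfs.py (hypothetical file name)
#
# Assumed usage from Pig (file name, namespace and function name are illustrative):
#   register 'udfs.py' using jython as myudfs;
#   F = foreach TSERIES generate myudfs.filter_after(ORDERED, ts1) as ts;

# Pig's Jython engine is expected to define outputSchema before loading this
# script. The fallback below only exists so the file also runs under plain Python.
try:
    outputSchema
except NameError:
    def outputSchema(schema):
        def wrap(func):
            return func
        return wrap


@outputSchema("ts: {(timestamp: long, contentHost: chararray)}")
def filter_after(ordered, ts1):
    """Return the tuples of `ordered` whose timestamp is greater than ts1.

    `ordered` is assumed to arrive as a list of (timestamp, contentHost) tuples.
    """
    # Propagate nulls instead of guessing what an empty result should mean.
    if ordered is None or ts1 is None:
        return None
    # Field 0 of each tuple is the timestamp; skip tuples with a null timestamp.
    return [t for t in ordered if t[0] is not None and t[0] > ts1]
```

Unlike the flatten-and-regroup route, this keeps one output row per input row, so a row whose bag has no timestamps above ts1 would still appear, with an empty bag.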
