Hadoop Pig bag subtraction


Problem description


I'm using Pig to parse my application logs to find out which exposed methods a user called during the last month that the same user had not called before.

I have managed to get the methods called, grouped by user, for two periods: before last month and from last month on:

BEFORE last month relation sample

u1      {(m1),(m2)}
u2      {(m3),(m4)}

AFTER last month relation sample

u1      {(m1),(m3)}
u2      {(m1),(m4)}

What I want is to find, per user, which methods are in AFTER but not in BEFORE, that is:

NEWLY_CALLED expected result

u1      {(m3)}
u2      {(m1)}

Question: how can I do that in Pig? Is it possible to subtract bags?

I have tried the DIFF function, but it does not perform the expected subtraction: DIFF returns the tuples that are in one bag but not the other, in either direction.
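Pig's built-in DIFF compares two bags and returns the tuples that appear in one bag but not the other. That is a symmetric difference, but the question needs a one-sided difference (AFTER minus BEFORE). The sketch below contrasts the two operations in plain Java on user u1's sample bags. It assumes each bag can be modelled as a java.util.Set<String>. Real Pig bags are DataBag objects and may hold duplicate tuples. The class and method names are hypothetical.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class BagDiffDemo {
    // Symmetric difference (the behaviour of Pig's DIFF): elements in exactly one set.
    static <T> Set<T> symmetricDifference(Set<T> a, Set<T> b) {
        Set<T> result = new LinkedHashSet<>(a);
        result.removeAll(b);              // elements only in a
        for (T x : b) {
            if (!a.contains(x)) {
                result.add(x);            // elements only in b
            }
        }
        return result;
    }

    // One-sided difference A \ B (what the question asks for).
    static <T> Set<T> subtract(Set<T> a, Set<T> b) {
        Set<T> result = new LinkedHashSet<>(a);
        result.removeAll(b);
        return result;
    }

    public static void main(String[] args) {
        // User u1's bags from the BEFORE and AFTER sample relations.
        Set<String> before = new LinkedHashSet<>(List.of("m1", "m2"));
        Set<String> after = new LinkedHashSet<>(List.of("m1", "m3"));

        System.out.println(symmetricDifference(before, after)); // prints [m2, m3]
        System.out.println(subtract(after, before));             // prints [m3]
    }
}
```

The extra m2 in the first line is why DIFF alone cannot produce NEWLY_CALLED. The result would also contain the methods that were only called before last month.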

Regards,

Joel

Solution

I think you need to write a UDF; inside it you can compute the set difference:

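Set<T> setA = ...; // methods called AFTER
Set<T> setB = ...; // methods called BEFORE
// java.util.Set has no subtract(); copy A, then remove B's elements
Set<T> setAminusB = new HashSet<>(setA);
setAminusB.removeAll(setB);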
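In Pig, the two relations would first need to be brought together per user, for example with COGROUP or JOIN on the user field. The UDF would then receive both bags for each user. The sketch below shows only that per-user logic in plain Java. It assumes the bags have already been converted to java.util.Set<String> and keyed by user in a Map. The Pig EvalFunc/DataBag wiring is omitted, and the names NewlyCalled and newlyCalled are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class NewlyCalled {
    // For each user in AFTER, keep the methods that are not in that user's
    // BEFORE set. A user missing from BEFORE keeps all of their AFTER methods.
    static Map<String, Set<String>> newlyCalled(Map<String, Set<String>> before,
                                                Map<String, Set<String>> after) {
        Map<String, Set<String>> result = new LinkedHashMap<>();
        for (Map.Entry<String, Set<String>> e : after.entrySet()) {
            Set<String> methods = new LinkedHashSet<>(e.getValue()); // copy, don't mutate input
            methods.removeAll(before.getOrDefault(e.getKey(), Set.of()));
            result.put(e.getKey(), methods);
        }
        return result;
    }

    public static void main(String[] args) {
        // Sample relations from the question, modelled as user -> set of methods.
        Map<String, Set<String>> before = new LinkedHashMap<>();
        before.put("u1", new LinkedHashSet<>(List.of("m1", "m2")));
        before.put("u2", new LinkedHashSet<>(List.of("m3", "m4")));

        Map<String, Set<String>> after = new LinkedHashMap<>();
        after.put("u1", new LinkedHashSet<>(List.of("m1", "m3")));
        after.put("u2", new LinkedHashSet<>(List.of("m1", "m4")));

        // prints "u1\t[m3]" then "u2\t[m1]"
        newlyCalled(before, after).forEach((user, methods) ->
                System.out.println(user + "\t" + methods));
    }
}
```

Converting the bags to sets drops duplicate tuples. That is harmless here, because each bag only needs to record which distinct methods were called.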
