Hadoop Pig bag subtraction
Problem description
I'm using Pig to parse my application logs to find out which exposed methods a user has called since last month that the same user had not called before.
I have managed to get the methods called, grouped by user, before last month and after last month:
BEFORE last month relation sample
u1 {(m1),(m2)}
u2 {(m3),(m4)}
AFTER last month relation sample
u1 {(m1),(m3)}
u2 {(m1),(m4)}
What I want is to find, per user, which methods are in AFTER but not in BEFORE, that is:
NEWLY_CALLED expected result
u1 {(m3)}
u2 {(m1)}
Question: how can I do that in Pig? Is it possible to subtract bags?
I have tried the DIFF function, but it does not perform the expected subtraction.
Regards,
Joel
Solution
I think you need to write a UDF; inside it you can use something like:
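Set<T> setA ...
Set<T> setB ...
// java.util.Set has no subtract() method; copy setA, then removeAll(setB) gives the difference
Set<T> setAminusB = new HashSet<T>(setA);
setAminusB.removeAll(setB);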
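To make the set-difference idea concrete, here is a minimal, self-contained sketch in plain Java. It is not a Pig UDF: the class name `BagSubtraction` and the methods `subtract` and `newlyCalled` are hypothetical names chosen for illustration, and each bag is modeled as a `Set<String>` of method names, which assumes duplicates inside a bag do not matter.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of Pig: models each bag as a Set of method names.
final class BagSubtraction {

    // Returns the elements of 'after' that are absent from 'before'; neither input is modified.
    static <T> Set<T> subtract(Set<T> after, Set<T> before) {
        Set<T> result = new LinkedHashSet<>(after); // copy so the caller's set is untouched
        result.removeAll(before);                   // set difference: after minus before
        return result;
    }

    // Applies the subtraction per user; a user with no BEFORE entry keeps all AFTER methods.
    static Map<String, Set<String>> newlyCalled(Map<String, Set<String>> before,
                                                Map<String, Set<String>> after) {
        Map<String, Set<String>> result = new LinkedHashMap<>();
        for (Map.Entry<String, Set<String>> e : after.entrySet()) {
            Set<String> earlier = before.getOrDefault(e.getKey(), new LinkedHashSet<>());
            result.put(e.getKey(), subtract(e.getValue(), earlier));
        }
        return result;
    }

    public static void main(String[] args) {
        // Sample data taken from the BEFORE and AFTER relations in the question.
        Map<String, Set<String>> before = new LinkedHashMap<>();
        before.put("u1", new LinkedHashSet<>(Arrays.asList("m1", "m2")));
        before.put("u2", new LinkedHashSet<>(Arrays.asList("m3", "m4")));

        Map<String, Set<String>> after = new LinkedHashMap<>();
        after.put("u1", new LinkedHashSet<>(Arrays.asList("m1", "m3")));
        after.put("u2", new LinkedHashSet<>(Arrays.asList("m1", "m4")));

        System.out.println(newlyCalled(before, after)); // prints {u1=[m3], u2=[m1]}
    }
}
```

In a real UDF, the same logic would presumably live in the `exec` method of a class extending `org.apache.pig.EvalFunc`, reading the two bags from the input tuple; that wiring is left out here because it needs the Pig jars on the classpath.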