Using FILTER in a Nested FOREACH in Pig
Problem description
I have two Pig relations. The first, count_pairs, holds pairs of words and how many times each pair was seen, e.g. ((car,tire), 4). The second, word_counts, keeps track of how many times each word was seen, e.g. (car, 20). I would like to find the ratio of how many times each pair was seen to how many times just its first word was seen. In our case I would want ((car,tire), 4/20). I tried to write a nested foreach to solve this problem:
> percent_count_pairs = FOREACH count_pairs {
> denom = FILTER word_counts BY ($0 ==count_pairs.pair.word1);
> GENERATE pair, count2/(double)denom.$1;}
I keep getting this error:
'Pig script failed to parse:
<file src/cluster.pig, line 27, column 15> expression is not a project expression: (Name: ScalarExpression) Type: null Uid: null)'
This points to the line with the FILTER; googling this error did not lead me to anything helpful. Please help!
(P.S. The script does work if I take the line with FILTER out of the foreach...)
Recommended answer
After more googling I came to realize that this is a bug in Pig that does not allow this: https://issues.apache.org/jira/browse/PIG-1798. I ended up writing my own UDF to do the filtering.
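For readers who would rather avoid a UDF, one possible alternative (not what the answer above did) is to bring the first word's count alongside each pair with a JOIN and then divide. The sketch below is untested and rests on assumptions the question does not spell out: count_pairs is assumed to have exactly two top-level fields named pair and count2, with pair containing a field word1, and word_counts is assumed to be (word, count), referenced by position because its field names are not given.

```pig
-- Assumed layout (hypothetical, inferred from the question's examples):
--   count_pairs: (pair:(word1, word2), count2)   e.g. ((car,tire), 4)
--   word_counts: ($0 = word, $1 = count)         e.g. (car, 20)

-- Attach the first word's total to every pair by joining on word1.
joined = JOIN count_pairs BY pair.word1, word_counts BY $0;

-- After the join the columns are: pair, count2, word, count.
-- $3 is therefore the count from word_counts (assuming two fields per relation).
percent_count_pairs = FOREACH joined GENERATE
    count_pairs::pair,
    count_pairs::count2 / (double) $3;
```

Because this is an inner join, any pair whose first word does not appear in word_counts is dropped from the result; a LEFT OUTER join would keep those pairs with a null ratio.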