How to Find All Possible Permutations from a Bag in Apache Pig
Question
I'm trying to find all possible combinations using Apache Pig. I was able to generate the permutations, but I want to eliminate the duplicated pairs. I wrote this code:
A = LOAD 'data' AS f1:chararray;
DUMP A;
('A')
('B')
('C')
B = FOREACH A GENERATE $0 AS v1;
C = FOREACH A GENERATE $0 AS v2;
D = CROSS B, C;
The result I get is:
DUMP D;
('A', 'A')
('A', 'B')
('A', 'C')
('B', 'A')
('B', 'B')
('B', 'C')
('C', 'A')
('C', 'B')
('C', 'C')
But the result I'm trying to obtain looks like this:
DUMP R;
('A', 'A')
('A', 'B')
('A', 'C')
('B', 'B')
('B', 'C')
('C', 'C')
How can I do this? I would like to avoid comparing characters, because the same string may occur multiple times, on more than one line.
Recommended answer
You can FILTER D to remove the rows you don't want. For example:
A = load 'testdata.txt';
B = foreach A generate $0;
C = Cross A, B;
D = filter C by $0 <= $1;
dump D;
This prints:
(C,C)
(B,C)
(B,B)
(A,C)
(A,B)
(A,A)
when 'testdata.txt' contains:
A
B
C
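The question mentions that the same string can appear on more than one line. In that case the cross product would repeat pairs. One possible variation, offered only as a sketch, is to deduplicate with DISTINCT before the CROSS. The field name v and the aliases U, L, R and P are illustrative assumptions, not part of the original answer. The file name is reused from the answer above.

```pig
-- Sketch only: assumes 'testdata.txt' holds one value per line, possibly repeated.
A = LOAD 'testdata.txt' AS (v:chararray);
U = DISTINCT A;                      -- collapse repeated values to one row each
L = FOREACH U GENERATE v AS v1;      -- two separate projections, mirroring the
R = FOREACH U GENERATE v AS v2;      -- two-relation CROSS used above
P = CROSS L, R;                      -- every (v1, v2) pair of distinct values
D = FILTER P BY v1 <= v2;            -- keep only one ordering of each pair
DUMP D;
```

The filter still compares strings, but after DISTINCT each value occurs once, so each unordered pair should appear exactly once in D. The order of the dumped rows is not guaranteed.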