How to find all possible permutations from a bag under Apache Pig


Problem description

I'm trying to find all possible combinations using Apache Pig. I was able to generate the permutations, but I want to eliminate the duplicated pairs. I wrote this code:

A = LOAD 'data' AS (f1:chararray);
DUMP A;
-- ('A')
-- ('B')
-- ('C')
B = FOREACH A GENERATE $0 AS v1;
C = FOREACH A GENERATE $0 AS v2;
D = CROSS B, C;

The result I obtained is:

 DUMP D;
('A', 'A')
('A', 'B')
('A', 'C')
('B', 'A')
('B', 'B')
('B', 'C')
('C', 'A')
('C', 'B')
('C', 'C')

But the result I'm trying to obtain looks like this:

DUMP R;
('A', 'A')
('A', 'B')
('A', 'C')
('B', 'B')
('B', 'C')
('C', 'C')

How can I do this? I'd like to avoid comparing characters, because the same string may occur on more than one line.

Solution

You can FILTER D to remove the rows you don't want. For example:

A = load 'testdata.txt';
B = foreach A generate $0;
C = Cross A, B;
D = filter C by $0 <= $1;
dump D;

which prints out

(C,C)
(B,C)
(B,B)
(A,C)
(A,B)
(A,A)

when 'testdata.txt' has

A
B
C
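
To see why the `$0 <= $1` filter keeps exactly one ordering of each pair, here is a minimal sketch in plain Python (not Pig Latin) that models the same cross-then-filter logic. The helper name `unordered_pairs` and the sample rows are hypothetical and only for illustration. The deduplication step is an added assumption that addresses the concern about a string occurring on more than one line; it is not part of the answer above.

```python
from itertools import product


def unordered_pairs(values):
    """Hypothetical helper modelling DISTINCT + CROSS + FILTER BY $0 <= $1."""
    # Assumption: drop repeated input rows first, so a string that appears
    # on several lines contributes only once.
    unique = sorted(set(values))
    # CROSS: every ordered pair (a, b), including (a, a).
    crossed = product(unique, unique)
    # FILTER: of (a, b) and (b, a), keep only the one where a <= b.
    return [(a, b) for a, b in crossed if a <= b]


rows = ["A", "B", "C", "B"]  # "B" is repeated on purpose
print(unordered_pairs(rows))
```

For n distinct values this leaves n(n+1)/2 pairs, which is 6 for the three values here. In Pig itself, the analogous deduplication step would presumably be to apply the DISTINCT operator to the relation before the CROSS.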
