Join Multiple Relations by Different Fields

Problem Description

Say I have three files data1, data2 and assoc:

$ cat data1
key1,foo
key2,bar
$ cat data2
key3,braz
key4,froz
$ cat assoc 
key1,key3
key2,key4

I load them with:

$ pig -b -p debug=WARN -x local
Warning: $HADOOP_HOME is deprecated.

Apache Pig version 0.10.0 (r1328203) compiled Apr 19 2012, 22:54:12
Logging error messages to: /home/vince/tmp/pig_1355407390166.log
Connecting to hadoop file system at: file:///
grunt> data1 = load 'data1' using PigStorage(',') as (key: chararray, val: chararray);  
grunt> data2 = load 'data2' using PigStorage(',') as (key: chararray, val: chararray);  
grunt> assoc = load 'assoc' using PigStorage(',') as (key1: chararray, key2: chararray);

What I want is a relation that looks like:

(foo, braz)
(bar, froz)

That is:

data1_val, data1_key <-> assoc_key1, assoc_key2 <-> data2_key, data2_val
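Read left to right, the chain above is two key lookups applied in sequence. The following is a plain Python illustration, not Pig, that resolves the chain for the sample data with dictionaries. It assumes every key appears at most once in assoc and in data2. The function name `chain_values` is hypothetical.

```python
# Sample rows copied from the files data1, data2 and assoc shown above.
data1 = [("key1", "foo"), ("key2", "bar")]
data2 = [("key3", "braz"), ("key4", "froz")]
assoc = [("key1", "key3"), ("key2", "key4")]


def chain_values(data1, assoc, data2):
    """Follow data1.key -> assoc.key1 / assoc.key2 -> data2.key and pair the values.

    Assumption: keys are unique within assoc and data2 (a dict keeps only the
    last row per key). Rows with no match at either step are dropped, which
    mirrors an inner join.
    """
    key1_to_key2 = dict(assoc)  # assoc.key1 -> assoc.key2
    key_to_val2 = dict(data2)   # data2.key  -> data2.val
    result = []
    for key, val1 in data1:
        key2 = key1_to_key2.get(key)
        if key2 is not None and key2 in key_to_val2:
            result.append((val1, key_to_val2[key2]))
    return result


print(chain_values(data1, assoc, data2))  # prints [('foo', 'braz'), ('bar', 'froz')]
```

If keys can repeat, dictionary lookups silently lose rows. A real join, like the one in the answer, keeps every matching combination.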

Recommended Answer

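-- Join data1 to assoc where data1.key = assoc.key1
A = join data1 by key, assoc by key1;
-- Join that result to data2 where assoc.key2 = data2.key
B = join A by assoc::key2, data2 by key;
-- Keep only the two value columns
RES = foreach B generate A::data1::val, data2::val;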
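The answer uses the names `assoc::key2` and `A::data1::val` because Pig prefixes each field of a JOIN result with the alias of the relation it came from, joined by `::`. When a joined relation is joined again, the prefixes stack. The Python sketch below is only a model of that naming on the sample data, not Pig itself. The helper `join` is hypothetical: it performs a simple hash-based inner join over lists of dicts. Real Pig output row order is not guaranteed.

```python
from collections import defaultdict


def join(left_alias, left_rows, left_key, right_alias, right_rows, right_key):
    """Inner-join two relations (lists of dicts) on left_key == right_key.

    Every output field is renamed "<alias>::<field>", modelling how Pig
    disambiguates the fields of a JOIN result. left_key and right_key are
    field names as they exist in the inputs, before this prefixing.
    """
    # Build a hash index on the right relation's join key.
    index = defaultdict(list)
    for row in right_rows:
        index[row[right_key]].append(row)

    out = []
    for lrow in left_rows:
        for rrow in index.get(lrow[left_key], []):
            merged = {f"{left_alias}::{k}": v for k, v in lrow.items()}
            merged.update({f"{right_alias}::{k}": v for k, v in rrow.items()})
            out.append(merged)
    return out


# Rows as the three load statements would produce them, using the same schemas.
data1 = [{"key": "key1", "val": "foo"}, {"key": "key2", "val": "bar"}]
data2 = [{"key": "key3", "val": "braz"}, {"key": "key4", "val": "froz"}]
assoc = [{"key1": "key1", "key2": "key3"}, {"key1": "key2", "key2": "key4"}]

# A = join data1 by key, assoc by key1;
A = join("data1", data1, "key", "assoc", assoc, "key1")
# B = join A by assoc::key2, data2 by key;
B = join("A", A, "assoc::key2", "data2", data2, "key")
# RES = foreach B generate A::data1::val, data2::val;
RES = [(row["A::data1::val"], row["data2::val"]) for row in B]

print(list(B[0]))
# prints ['A::data1::key', 'A::data1::val', 'A::assoc::key1', 'A::assoc::key2', 'data2::key', 'data2::val']
print(RES)  # prints [('foo', 'braz'), ('bar', 'froz')]
```

In Pig, running `describe B;` in Grunt shows the actual prefixed schema, which is the quickest way to confirm the names to use in the final `foreach`.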
