Pig - how to reference columns in a FOREACH after a JOIN?
Question
A = load 'a.txt' as (id, a1);
B = load 'b.txt' as (id, b1);
C = join A by id, B by id;
D = foreach C generate id,a1,b1;
dump D;
The 4th line fails with: Invalid field projection. Projected field [id] does not exist in schema.
I tried to change to A.id but then the last line fails on: ERROR 0: Scalar has more than one row in the output.
Recommended answer
What you are looking for is the "Disambiguate Operator". What you want is A::id, not A.id.
A.id says "there is a relation/bag A and there is a column called id in its schema".
A::id says "there is a record from A and that has a column called id".
So, you would do:
A = load 'a.txt' as (id, a1);
B = load 'b.txt' as (id, b1);
C = join A by id, B by id;
D = foreach C generate A::id,a1,b1;
dump D;
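If it is unclear which names a join produced, describe prints the schema of the joined relation. The sketch below reuses the same two input files; renaming with "as id" is an optional extra that is not part of the original answer.

```pig
-- Sketch: inspect the field names Pig assigns after a join.
A = load 'a.txt' as (id, a1);
B = load 'b.txt' as (id, b1);
C = join A by id, B by id;

-- The schema of C lists four fields: A::id, A::a1, B::id, B::b1.
describe C;

-- Only id exists on both sides, so only id needs the A:: (or B::) prefix.
-- a1 and b1 are unique in C and can be referenced by their short names.
-- "as id" (an optional addition) gives the projected column a plain name.
D = foreach C generate A::id as id, a1, b1;
dump D;
```

Choosing B::id instead of A::id would return the same values here, since the join keeps only rows where the two ids are equal.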
A dirty alternative:
Just because I'm lazy, and disambiguation gets really weird when you start doing multiple joins one after another: use unique identifiers.
A = load 'a.txt' as (ida, a1);
B = load 'b.txt' as (idb, b1);
C = join A by ida, B by idb;
D = foreach C generate ida,a1,b1;
dump D;
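To illustrate why chained joins get awkward when column names collide, here is a rough sketch with a third input. The file e.txt and the relations E, F and G are hypothetical names added for illustration; they are not in the original question.

```pig
-- Sketch: prefixes nest when a joined relation is joined again.
A = load 'a.txt' as (id, a1);
B = load 'b.txt' as (id, b1);
E = load 'e.txt' as (id, e1);   -- hypothetical third input

C = join A by id, B by id;      -- fields: A::id, A::a1, B::id, B::b1
F = join C by A::id, E by id;   -- the fields of C now also carry a C:: prefix

-- Shows the nested names, e.g. C::A::id alongside E::id.
describe F;

-- id is ambiguous three ways, so it needs the full nested prefix.
-- a1, b1 and e1 are still unique and can be used without one.
G = foreach F generate C::A::id, a1, b1, e1;
dump G;
```

With unique names such as ida and idb, none of these prefixes are needed, which is the appeal of the shortcut above.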