pig - how to reference columns in a FOREACH after a JOIN?


Question

A = load 'a.txt' as (id, a1);
B = load 'b.txt' as (id, b1);
C = join A by id, B by id;
D = foreach C generate id,a1,b1;
dump D;

The fourth line (the FOREACH) fails with: Invalid field projection. Projected field [id] does not exist in schema

I tried changing it to A.id, but then the last line fails with: ERROR 0: Scalar has more than one row in the output.

Answer

What you are looking for is the disambiguate operator, ::. You want A::id, not A.id.

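A.id says "there is a relation/bag A and there is a column called id in its schema". Inside a FOREACH over C, A is not a field of C but the whole relation A, so Pig treats A.id as a scalar: it expects A to contain exactly one row, which is why the script fails with "Scalar has more than one row in the output".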

A::id says "there is a record from A and that has a column called id"
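One way to see the difference is to ask Pig for the schema of the joined relation. This is a sketch assuming the same a.txt and b.txt inputs as in the question; the schema shown in the comment is what describe typically prints for untyped fields, and the exact formatting may vary between Pig versions:

```pig
A = load 'a.txt' as (id, a1);
B = load 'b.txt' as (id, b1);
C = join A by id, B by id;
-- Print the schema of the join result. Each field is prefixed with the
-- alias of the relation it came from, typically:
-- C: {A::id: bytearray,A::a1: bytearray,B::id: bytearray,B::b1: bytearray}
describe C;
```

Because C contains both A::id and B::id, a bare id is ambiguous and must be qualified, while a1 and b1 each occur only once and can be used without a prefix.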

So you can do this:

A = load 'a.txt' as (id, a1);
B = load 'b.txt' as (id, b1);
C = join A by id, B by id;
D = foreach C generate A::id,a1,b1;
dump D;
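If later statements should not have to carry the A:: and B:: prefixes around, one option is to rename the fields with AS in the FOREACH that directly follows the join. A sketch reusing the relations from the question; the ORDER step is only a placeholder for whatever downstream processing comes next:

```pig
A = load 'a.txt' as (id, a1);
B = load 'b.txt' as (id, b1);
C = join A by id, B by id;
-- Keep one of the two join keys (they are equal on every joined row)
-- and give every output field a plain, unprefixed name.
D = foreach C generate A::id as id, A::a1 as a1, B::b1 as b1;
-- Downstream statements can now refer to id, a1 and b1 directly.
sorted = order D by id;
dump sorted;
```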


A dirty alternative:

Partly because I'm lazy, and partly because disambiguation gets really weird when you chain several joins one after another: use unique identifiers.

A = load 'a.txt' as (ida, a1);
B = load 'b.txt' as (idb, b1);
C = join A by ida, B by idb;
D = foreach C generate ida,a1,b1;
dump D;
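To illustrate why chained joins get awkward, here is a sketch that adds a hypothetical third input, e.txt with fields (id, e1), which is not part of the original question. After the second join the prefixes typically nest, so the id that originally came from A has to be written with its full name:

```pig
A = load 'a.txt' as (id, a1);
B = load 'b.txt' as (id, b1);
E = load 'e.txt' as (id, e1);  -- hypothetical third input, not from the question
C = join A by id, B by id;     -- fields: A::id, A::a1, B::id, B::b1
X = join C by A::id, E by id;  -- fields: C::A::id, C::A::a1, C::B::id, C::B::b1, E::id, E::e1
-- a1, b1 and e1 are still unique, but the A-side id needs its nested name.
Y = foreach X generate C::A::id, a1, b1, e1;
dump Y;
```

With unique identifiers such as ida, idb and ide, every field name stays unambiguous no matter how many joins are chained, which is the point of the alternative above.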
