Drop single column in Pig

Question

I'm filtering a table by a list of about 20 IDs. Right now my code looks like this:

A = LOAD 'ids.txt' USING PigStorage();
B = LOAD 'massive_table' USING PigStorage();
C = JOIN A BY $0, B BY $0;
D = FOREACH C GENERATE $1, $2, $3, $4, ...
STORE D INTO 'foo' USING PigStorage();

What I don't like is line D, where I have to regenerate a new table to get rid of the joining column by explicitly declaring every single other column I want present (and sometimes that is a lot of columns). I'm wondering if there's something equivalent to:

FILTER B BY $0 IN (A)

or:

DROP $0 FROM C

Recommended answer

This may be similar to this question:

That question references a JIRA ticket, https://issues.apache.org/jira/browse/PIG-1693, which shows how you can use the .. notation to denote all the remaining fields:

D = FOREACH C GENERATE $1 .. ;

This assumes you are running Pig 0.9.0 or later.
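
For context, here is a sketch of how the whole script from the question might look with the range projection in place. The file names come from the question. It assumes that 'ids.txt' holds a single column, so that the joined ID from A is field $0 of C and every field of B follows it.

```pig
-- Sketch only: assumes Pig 0.9.0+ and that 'ids.txt' has exactly one column.
A = LOAD 'ids.txt' USING PigStorage();
B = LOAD 'massive_table' USING PigStorage();

-- After the join, C holds A's field(s) first, then all of B's fields.
C = JOIN A BY $0, B BY $0;

-- $1 .. projects field 1 through the last field, dropping A's ID column
-- without listing every remaining column by hand.
D = FOREACH C GENERATE $1 ..;

STORE D INTO 'foo' USING PigStorage();
```

The same notation also accepts a bounded range such as `$1 .. $4`, or an open start such as `.. $4`, which can help when only a contiguous block of columns is needed.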
