NOT IN clause in PIG


Problem description

I am trying to express the following SQL query in Pig:

select * from A where A.ID NOT IN (select id from B)

sourcenew = LOAD 'hdfs://HADOOPMASTER:54310/DVTTest/Source.txt' USING PigStorage(',') as (ID:int,Name:chararray,FirstName:chararray ,LastName:chararray,Vertical_Name:chararray ,Vertical_ID:chararray,Gender:chararray,DOB:chararray,Degree_Percentage:chararray ,Salary:chararray,StateName:chararray);
destnew = LOAD 'hdfs://HADOOPMASTER:54310/DVTTest/Destination.txt' USING PigStorage(',') as (ID:int,Name:chararray,FirstName:chararray ,LastName:chararray,Vertical_Name:chararray ,Vertical_ID:chararray,Gender:chararray,DOB:chararray,Degree_Percentage:chararray ,Salary:chararray,StateName:chararray);
c= FOREACH destnew GENERATE ID;
D=FILTER sourcenew BY NOT ID (c.ID);
 org.apache.pig.tools.pigscript.parser.ParseException: Encountered " <PATH> "D=FILTER "" at line 1, column 1.
Was expecting one of:
<EOF> 
"cat" ...
"clear" ...<EOF>

I get the error above when the last line executes. Any help resolving it would be appreciated.

Recommended answer

Use a LEFT OUTER JOIN and FILTER the nulls:

sourcenew = LOAD 'hdfs://HADOOPMASTER:54310/DVTTest/Source.txt' USING PigStorage(',') as (ID:int,Name:chararray,FirstName:chararray ,LastName:chararray,Vertical_Name:chararray ,Vertical_ID:chararray,Gender:chararray,DOB:chararray,Degree_Percentage:chararray ,Salary:chararray,StateName:chararray);
destnew = LOAD 'hdfs://HADOOPMASTER:54310/DVTTest/Destination.txt' USING PigStorage(',') as (ID:int,Name:chararray,FirstName:chararray ,LastName:chararray,Vertical_Name:chararray ,Vertical_ID:chararray,Gender:chararray,DOB:chararray,Degree_Percentage:chararray ,Salary:chararray,StateName:chararray);
c = FOREACH destnew GENERATE ID;
d = JOIN sourcenew BY ID LEFT OUTER,destnew by ID;
e = FILTER d BY destnew::ID IS NULL;

Note: I wrote a sample script with a couple of test files, and the working solution is below. In your case, check that the data is being loaded correctly from your files.

test1.txt

1   abc
2   def
3   ghi
4   jkl
5   mno
6   pqr
7   stu
8   vwx
1   abc
2   def
3   ghi
4   jkl
1   abc
2   def
3   ghi
1   abc
2   def

test2.txt

1
2
3
4

Script

A = LOAD 'test1.txt' USING PigStorage('\t') AS (aid:int,name:chararray);
B = LOAD 'test2.txt' USING PigStorage('\t') AS (bid:int);
C = JOIN A BY aid LEFT OUTER,B BY bid;
D = FILTER C BY bid is null;
DUMP D;

In the example above, records 5, 6, 7 and 8 should be in the result, since those IDs are not in test2.txt.
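After the filter, the joined relation still carries the (null) column from B. The following is a minimal sketch, assuming the same tab-delimited test1.txt and test2.txt files, of projecting the result back to A's columns. It also shows a COGROUP-based variant that some people prefer for the same "not in" check. The alias names E, G, H and I are chosen here for illustration only.

```pig
A = LOAD 'test1.txt' USING PigStorage('\t') AS (aid:int, name:chararray);
B = LOAD 'test2.txt' USING PigStorage('\t') AS (bid:int);

-- Variant 1: left outer join, keep rows with no match in B,
-- then project only the columns that came from A.
C = JOIN A BY aid LEFT OUTER, B BY bid;
D = FILTER C BY B::bid IS NULL;
E = FOREACH D GENERATE A::aid AS aid, A::name AS name;

-- Variant 2: group both relations by key and keep only the
-- groups whose bag of B rows is empty, then flatten A's rows back out.
G = COGROUP A BY aid, B BY bid;
H = FILTER G BY IsEmpty(B);
I = FOREACH H GENERATE FLATTEN(A);

DUMP E;
```

Both variants should keep the same rows of A. The `::` prefix is how Pig disambiguates fields after a JOIN, which matters when both inputs use the same field name (as with ID in the original question).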
