NOT IN clause in PIG
Question
I am trying to do the Pig equivalent of this SQL:
select * from A where A.ID NOT IN (select id from B)
This is my script:
sourcenew = LOAD 'hdfs://HADOOPMASTER:54310/DVTTest/Source.txt' USING PigStorage(',') as (ID:int,Name:chararray,FirstName:chararray ,LastName:chararray,Vertical_Name:chararray ,Vertical_ID:chararray,Gender:chararray,DOB:chararray,Degree_Percentage:chararray ,Salary:chararray,StateName:chararray);
destnew = LOAD 'hdfs://HADOOPMASTER:54310/DVTTest/Destination.txt' USING PigStorage(',') as (ID:int,Name:chararray,FirstName:chararray ,LastName:chararray,Vertical_Name:chararray ,Vertical_ID:chararray,Gender:chararray,DOB:chararray,Degree_Percentage:chararray ,Salary:chararray,StateName:chararray);
c= FOREACH destnew GENERATE ID;
D=FILTER sourcenew BY NOT ID (c.ID);
org.apache.pig.tools.pigscript.parser.ParseException: Encountered " <PATH> "D=FILTER "" at line 1, column 1.
Was expecting one of:
<EOF>
"cat" ...
"clear" ...<EOF>
Any help resolving this error? I get it when the last line is executed.
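Recommended answer
Pig has no NOT IN subquery, so `NOT ID (c.ID)` is not valid syntax. Use a LEFT OUTER JOIN instead, then FILTER for the rows whose right-hand ID is null, i.e. the source rows with no match in the destination: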
sourcenew = LOAD 'hdfs://HADOOPMASTER:54310/DVTTest/Source.txt' USING PigStorage(',') AS (ID:int, Name:chararray, FirstName:chararray, LastName:chararray, Vertical_Name:chararray, Vertical_ID:chararray, Gender:chararray, DOB:chararray, Degree_Percentage:chararray, Salary:chararray, StateName:chararray);
destnew = LOAD 'hdfs://HADOOPMASTER:54310/DVTTest/Destination.txt' USING PigStorage(',') AS (ID:int, Name:chararray, FirstName:chararray, LastName:chararray, Vertical_Name:chararray, Vertical_ID:chararray, Gender:chararray, DOB:chararray, Degree_Percentage:chararray, Salary:chararray, StateName:chararray);
-- Keep only the join key from the destination relation.
c = FOREACH destnew GENERATE ID;
-- Left outer join: source rows with no matching destination ID get a null c::ID.
d = JOIN sourcenew BY ID LEFT OUTER, c BY ID;
-- After a JOIN, fields are referenced as alias::field (c::ID), not alias.field.
e = FILTER d BY c::ID IS NULL;
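Another common Pig anti-join pattern is COGROUP combined with the built-in IsEmpty function. The following is a minimal sketch that reuses the sourcenew and destnew relations loaded above. It is not part of the original answer and has not been run against the asker's files.

```pig
-- Group both relations on ID; each output row is (group, {sourcenew tuples}, {destnew tuples}).
grp      = COGROUP sourcenew BY ID, destnew BY ID;
-- Keep only the IDs that never appear in destnew.
only_src = FILTER grp BY IsEmpty(destnew);
-- Unnest the bag to get back plain sourcenew rows (fields named sourcenew::ID, ...).
result   = FOREACH only_src GENERATE FLATTEN(sourcenew);
DUMP result;
```

Unlike the join version, this does not carry null-padded destination columns into the result. Which of the two runs faster depends on the data, so treat it as an option rather than a replacement.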
NOTE: I wrote a sample script with a couple of test files, and the working solution is below. In your case, check that the data is being loaded correctly from your files.
test1.txt
1 abc
2 def
3 ghi
4 jkl
5 mno
6 pqr
7 stu
8 vwx
1 abc
2 def
3 ghi
4 jkl
1 abc
2 def
3 ghi
1 abc
2 def
test2.txt
1
2
3
4
Script
A = LOAD 'test1.txt' USING PigStorage('\t') AS (aid:int,name:chararray);
B = LOAD 'test2.txt' USING PigStorage('\t') AS (bid:int);
C = JOIN A BY aid LEFT OUTER,B BY bid;
D = FILTER C BY bid is null;
DUMP D;
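Because the join keeps the columns from both sides, every row of D also carries a `bid` field that is always null. Assuming only the columns of A are wanted, as in `select * from A`, a final projection can drop it:

```pig
-- Keep only A's columns; bid is null in every row that survived the filter.
E = FOREACH D GENERATE aid, name;
DUMP E;
-- With the sample files above, E should contain (5,mno), (6,pqr), (7,stu), (8,vwx),
-- in no guaranteed order.
```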
So in the example above, the records with IDs 5, 6, 7 and 8 should be in the result, because those IDs do not appear in test2.txt.
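If the IDs to exclude are a short, fixed list rather than another relation, Pig 0.12 and later also provide an IN operator. As far as I know it accepts only a list of values, not a relation, which is why the join is needed in the original question. A minimal sketch, reusing relation A from the script above; the literal IDs are copied from test2.txt for illustration only:

```pig
-- Exclude a hard-coded list of IDs (IN operator, Pig 0.12+).
F = FILTER A BY NOT (aid IN (1, 2, 3, 4));
DUMP F;
```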