PIG中的NOT IN子句 [英] NOT IN clause in PIG

查看:135
本文介绍了PIG中的NOT IN子句的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我想像

select * from A where A.ID NOT IN (select id from B) (in sql)

sourcenew = LOAD 'hdfs://HADOOPMASTER:54310/DVTTest/Source.txt' USING PigStorage(',') as (ID:int,Name:chararray,FirstName:chararray ,LastName:chararray,Vertical_Name:chararray ,Vertical_ID:chararray,Gender:chararray,DOB:chararray,Degree_Percentage:chararray ,Salary:chararray,StateName:chararray);
destnew = LOAD 'hdfs://HADOOPMASTER:54310/DVTTest/Destination.txt' USING PigStorage(',') as (ID:int,Name:chararray,FirstName:chararray ,LastName:chararray,Vertical_Name:chararray ,Vertical_ID:chararray,Gender:chararray,DOB:chararray,Degree_Percentage:chararray ,Salary:chararray,StateName:chararray);
c= FOREACH destnew GENERATE ID;
D=FILTER sourcenew BY NOT ID (c.ID);
 org.apache.pig.tools.pigscript.parser.ParseException: Encountered " <PATH> "D=FILTER "" at line 1, column 1.
Was expecting one of:
<EOF> 
"cat" ...
"clear" ...<EOF>

有关解决错误的任何帮助,请在执行最后一行时获取.

any help on this to resolve error, getting this on the execution of last line.

推荐答案

使用LEFT OUTER JOIN并过滤空值

Use LEFT OUTER JOIN and FILTER the nulls

sourcenew = LOAD 'hdfs://HADOOPMASTER:54310/DVTTest/Source.txt' USING PigStorage(',') as (ID:int,Name:chararray,FirstName:chararray ,LastName:chararray,Vertical_Name:chararray ,Vertical_ID:chararray,Gender:chararray,DOB:chararray,Degree_Percentage:chararray ,Salary:chararray,StateName:chararray);
destnew = LOAD 'hdfs://HADOOPMASTER:54310/DVTTest/Destination.txt' USING PigStorage(',') as (ID:int,Name:chararray,FirstName:chararray ,LastName:chararray,Vertical_Name:chararray ,Vertical_ID:chararray,Gender:chararray,DOB:chararray,Degree_Percentage:chararray ,Salary:chararray,StateName:chararray);
c = FOREACH destnew GENERATE ID;
d = JOIN sourcenew BY ID LEFT OUTER,destnew by ID;
e = FILTER d by destnew.ID is null;

注意 我写了一个包含几个测试文件的示例脚本,下面是可行的解决方案.在这种情况下,请检查是否从文件中正确加载了数据.

NOTE I wrote a sample script with couple of test files and below is the working solution.In you case check to see if you are loading the data correctly from your files.

test1.txt

1   abc
2   def
3   ghi
4   jkl
5   mno
6   pqr
7   stu
8   vwx
1   abc
2   def
3   ghi
4   jkl
1   abc
2   def
3   ghi
1   abc
2   def

test2.txt

1
2
3
4

脚本

A = LOAD 'test1.txt' USING PigStorage('\t') AS (aid:int,name:chararray);
B = LOAD 'test2.txt' USING PigStorage('\t') AS (bid:int);
C = JOIN A BY aid LEFT OUTER,B BY bid;
D = FILTER C BY bid is null;
DUMP D;

因此在上面的示例中,记录5、6、7、8应该出现在结果中,因为这些ID不在test2.txt中.

So in the above example records 5,6,7,8 should be in the result since those Ids are not in test2.txt.

这篇关于PIG中的NOT IN子句的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆