Unable to pass Pig tuple to Python UDF

Problem description

I have a master.txt file with 10K records. Each line of it becomes a tuple, and the whole set of tuples needs to be passed to a Python UDF. Because the relation has multiple records, storing p2preportmap fails with the following error. Please help.

The error is as follows:

Unable to open iterator for alias p2preportmap. Backend error : org.apache.pig.backend.executionengine.ExecException: ERROR 0: Scalar has more than one row in the output. 1st : (010301,MTS,MM), 2nd :(010B06,MTS,TN) (common cause: "JOIN" then "FOREACH ... GENERATE foo.bar" should be "foo::bar" )

The Pig script is as follows:

REGISTER 'smsiuc_udf.py' using streaming_python as smsiuc_udfs;
cdrs = load '2016040111*' USING PigStorage('|','-tagFile') ;

mastergtrec = load 'master.txt' USING PigStorage(',','-tagFile');

mastergt = FOREACH mastergtrec GENERATE (chararray) UPPER($1) as opcdpc, (chararray) UPPER($2) as gtoptname,(chararray) UPPER($3) as gtoptcircle;

mastergttup = FOREACH mastergt generate TOTUPLE(opcdpc,gtoptname,gtoptcircle) as mstgttup;

cdrrecord = FOREACH cdrs GENERATE (chararray) UPPER($1) as aparty, (chararray) UPPER($2) as bparty,$3 as smssentdate,$4 as smssenttime,($29=='6' ? 'S' : 'F') as status,(chararray) UPPER($26) as srcgt,(chararray) UPPER($27) as destgt,($12=='405899136999995' ? 'MTSDEL-CDMA' : ($12=='919875089998' ? 'MTSRAJ-GSM' : ($12=='405899150999995' ? 'MTSCHN-CDMA' : $12) ) ) as smscgt, (chararray)$0 as cdrfname,(chararray) $13 as prepost;

filteredp2pcdrs = FILTER cdrrecord by smsiuc_udfs.pullp2pcdrs(aparty,bparty,srcgt,destgt) and status == 'S' and SUBSTRING(smssentdate,4,6) == '$MON';

groupp2pcdrs = GROUP filteredp2pcdrs by (srcgt,destgt,aparty,bparty,smscgt,status,prepost);

distinctp2pcdrs= FOREACH groupp2pcdrs {
uniq = DISTINCT filteredp2pcdrs.(srcgt,destgt,aparty,bparty,smscgt,status,prepost);
GENERATE FLATTEN(group),COUNT(uniq) as cnt;
};

p2preportmap = FOREACH distinctp2pcdrs GENERATE smsiuc_udfs.p2preport(srcgt,destgt,aparty,bparty,mastergttup ),smscgt,status,prepost,cnt

Recommended answer

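This can be done by adding a dummy column and then grouping. Pig treats a relation that is referenced inside another relation's FOREACH as a scalar, and a scalar must contain exactly one row. Grouping on a constant column collapses all the records into a single row that holds one bag, and that bag can then be passed to the UDF.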

dummy= foreach p2preportmap generate 1, $0,$1 ....

grouped = group dummy by $0
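
Once the records arrive as one bag, the UDF has to unpack it on the Python side. The sketch below is only an illustration of that step and is not the asker's smsiuc_udf.py, whose contents are not shown. It assumes that a streaming_python UDF receives a bag as a list of tuples, that the last three fields of each tuple are opcdpc, gtoptname and gtoptcircle, and that srcgt and destgt can be matched against opcdpc. The names build_master_lookup and p2preport_sketch, the "UNKNOWN" placeholder and the output schema are all hypothetical.

```python
# Hypothetical sketch; not the original smsiuc_udf.py.
try:
    # pig_util is supplied by Pig when the file is registered with streaming_python.
    from pig_util import outputSchema
except ImportError:
    # Assumption: a no-op stand-in so the functions can be tried outside Pig.
    def outputSchema(schema):
        def wrap(func):
            return func
        return wrap


def build_master_lookup(master_bag):
    """Map opcdpc -> (gtoptname, gtoptcircle) from a bag of master tuples."""
    lookup = {}
    for row in master_bag or []:
        # Assumption: the last three fields are opcdpc, gtoptname, gtoptcircle,
        # so a leading dummy column (the constant 1) is simply ignored.
        opcdpc, name, circle = row[-3:]
        lookup[opcdpc] = (name, circle)
    return lookup


@outputSchema("report:(srcname:chararray,srccircle:chararray,"
              "destname:chararray,destcircle:chararray)")
def p2preport_sketch(srcgt, destgt, master_bag):
    """Return operator name and circle for the source and destination keys."""
    lookup = build_master_lookup(master_bag)
    unknown = ("UNKNOWN", "UNKNOWN")  # hypothetical placeholder for missing keys
    return lookup.get(srcgt, unknown) + lookup.get(destgt, unknown)
```

Note that the lookup dictionary is rebuilt on every call here, which keeps the sketch simple; with 10K master records it may be worth caching it in a module-level variable instead.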
