Getting an exception while trying to execute a Pig Latin script


Question

I am learning Pig on my own, and while exploring a dataset I ran into an exception. What is wrong with this script, and why?

movies_data = LOAD '/movies_data' using PigStorage(',') as (id:chararray,title:chararray,year:int,rating:double,duration:double);
high   = FILTER movies_data by rating > 4.0;
high_rated = FOREACH high GENERATE movies_data.title,movies_data.year,movies_data.rating,movies_data.duration;
DUMP high_rated;

At the end of the MapReduce execution I get the following error:

2018-07-22 20:11:07,213 [main] ERROR org.apache.pig.tools.grunt.Grunt

ERROR 1066: Unable to open iterator for alias high_rated. 
Backend error : org.apache.pig.backend.executionengine.ExecException: 
ERROR 0: Scalar has more than one row in the output. 
1st : (1,The Nightmare Before Christmas,1993,3.9,4568.0), 
2nd :(2,The Mummy,1932,3.5,4388.0) 
(common cause: "JOIN" then "FOREACH ... GENERATE foo.bar" should be "foo::bar" )

Recommended answer

First, let's see how to fix the problem. You don't need to qualify the fields with the alias name. FILTER does not change the schema, so the fields of high can be referenced by their plain names, and your third line can simply be:

high_rated = FOREACH high GENERATE title, year, rating, duration;
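
For reference, here is how the whole script might look with that one line changed. This is a sketch that assumes the same input path and schema as in the question; nothing else is altered.

```pig
-- Load the comma-separated movie data with an explicit schema.
movies_data = LOAD '/movies_data' USING PigStorage(',')
    AS (id:chararray, title:chararray, year:int, rating:double, duration:double);

-- Keep only movies rated above 4.0; the schema is unchanged by FILTER.
high = FILTER movies_data BY rating > 4.0;

-- Project fields of 'high' by their plain names, without an alias prefix.
high_rated = FOREACH high GENERATE title, year, rating, duration;

DUMP high_rated;
```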

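If you want to qualify field names with an alias for some reason, the operator for that is the disambiguation operator (::), not the dot, as the hint in the error message suggests. Be aware that Pig documents the alias:: prefix for field names after operators such as JOIN, COGROUP, CROSS, or FLATTEN, so after a plain FILTER the prefixed names may not exist in the schema and the unprefixed form above is the safer choice. With the prefix, the line would look like: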

high_rated = FOREACH high GENERATE movies_data::title, movies_data::year, movies_data::rating, movies_data::duration;
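
For comparison, the case the error hint has in mind is a JOIN, where Pig prefixes each field with the alias it came from. The following is a minimal sketch; the second relation, its path '/movie_reviews', and its schema are hypothetical and only serve to illustrate the syntax.

```pig
movies_data = LOAD '/movies_data' USING PigStorage(',')
    AS (id:chararray, title:chararray, year:int, rating:double, duration:double);

-- Hypothetical second dataset; the path and fields are assumptions.
reviews = LOAD '/movie_reviews' USING PigStorage(',')
    AS (id:chararray, reviewer:chararray);

-- After the join, both inputs contribute an 'id' field.
joined = JOIN movies_data BY id, reviews BY id;

-- Use :: to say which input each field comes from.
result = FOREACH joined GENERATE movies_data::id, movies_data::title, reviews::reviewer;

DUMP result;
```

Writing movies_data.title in that last FOREACH instead of movies_data::title would trigger the same scalar error as in the question.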

Next, let's understand the reason behind the error message. When you write alias.field with the dot operator (.) and the alias is a relation other than the one being iterated over, Pig treats that alias as a scalar, that is, a relation with exactly one row. In your script, FOREACH iterates over high but refers to movies_data.title, so Pig tries to read movies_data as a single-row relation. Since movies_data has more than one row, the job fails at run time. You can read more about scalars in Pig here: https://issues.apache.org/jira/browse/PIG-1434
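
To make the scalar idea concrete, here is a sketch of a case where the dot operator on another relation is legitimate, because that relation is guaranteed to hold a single row. The aliases all_movies, stats, and with_diff, and the field names avg_rating and diff_from_avg, are made up for this illustration; the input is assumed to be the same file as in the question.

```pig
movies_data = LOAD '/movies_data' USING PigStorage(',')
    AS (id:chararray, title:chararray, year:int, rating:double, duration:double);

-- GROUP ALL puts every row into one group, so the aggregate
-- below produces exactly one output row.
all_movies = GROUP movies_data ALL;

-- Here movies_data.rating is a projection on the bag inside the group,
-- not a scalar reference.
stats = FOREACH all_movies GENERATE AVG(movies_data.rating) AS avg_rating;

-- 'stats' has a single row, so stats.avg_rating can be used as a scalar.
with_diff = FOREACH movies_data GENERATE title, rating,
    rating - stats.avg_rating AS diff_from_avg;

DUMP with_diff;
```

If stats ever contained more than one row, this script would be expected to fail with the same "Scalar has more than one row in the output" error.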

At the end of the release notes section of that JIRA issue, you will notice that the documented error message matches the one you are getting:

If a relation contains more than single tuple, a runtime error is generated: 
"Scalar has more than one row in the output"
