Connection Error in Apache Pig

Problem description

I am running Apache Pig 0.11.1 with Hadoop 2.0.5.

Most simple jobs that I run in Pig work perfectly fine.

However, whenever I try to use GROUP BY on a large dataset, or the LIMIT operator, I get these connection errors:
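For context, a minimal Pig script of the kind that triggers these retries might look like the sketch below; the input path and schema are assumptions for illustration, not taken from the original question:

```
-- Load a tab-delimited dataset (path and schema are hypothetical).
data = LOAD 'input.txt' AS (user:chararray, score:int);

-- GROUP BY on a large input is one operator that surfaces the retry messages.
grouped = GROUP data BY user;
counts = FOREACH grouped GENERATE group, COUNT(data);

-- LIMIT is the other operator mentioned in the question.
top10 = LIMIT counts 10;
DUMP top10;
```

Both GROUP and LIMIT force a reduce phase, which is why these jobs (unlike simple map-only loads and filters) involve the job-completion path where the client consults the history server.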

2013-07-29 13:24:08,591 [main] INFO  org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server 
2013-07-29 11:57:29,421 [main] INFO  org.apache.hadoop.ipc.Client - Retrying connect to server: 0.0.0.0/0.0.0.0:10020. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)

2013-07-29 11:57:30,421 [main] INFO  org.apache.hadoop.ipc.Client - Retrying connect to server: 0.0.0.0/0.0.0.0:10020. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)

2013-07-29 11:57:31,422 [main] INFO  org.apache.hadoop.ipc.Client - Retrying connect to server: 0.0.0.0/0.0.0.0:10020. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
...
2013-07-29 13:24:18,597 [main] INFO  org.apache.hadoop.ipc.Client - Retrying connect to server: 0.0.0.0/0.0.0.0:10020. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2013-07-29 13:24:18,598 [main] ERROR org.apache.hadoop.security.UserGroupInformation - PriviledgedActionException as:gpadmin (auth:SIMPLE) cause:java.io.IOException

The strange thing is that after these errors keep appearing for about 2 minutes, they stop, and the correct output shows up at the bottom.

So Hadoop is running fine and computing the proper output. The problem is just these connection errors that keep popping up.

The LIMIT operator always triggers this error. It happens in both MapReduce mode and local mode. The GROUP BY operator works fine on small datasets.

One thing I have noticed is that whenever this error appears, the job creates and runs multiple JAR files. However, after a few minutes of these messages popping up, the correct output finally appears.

Any suggestions on how to get rid of these messages?

Answer

Yes, the problem was that the job history server was not running.

All we had to do to fix this problem was enter this command into the command prompt:

mr-jobhistory-daemon.sh start historyserver

This command starts up the job history server. Now if we enter 'jps', we can see that the JobHistoryServer is running, and my Pig jobs no longer waste time trying to connect to the server.
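As a side note, the retries target 0.0.0.0:10020 because `mapreduce.jobhistory.address` defaults to 0.0.0.0, so the client has no real host to connect to. If starting the daemon by hand is not enough (for example, on a multi-node cluster where the history server runs on a different machine), the address can be pinned in mapred-site.xml. This is a sketch; the hostname below is a placeholder:

```xml
<!-- mapred-site.xml: point MapReduce clients at the job history server.
     "historyserver.example.com" is a placeholder hostname. -->
<property>
  <name>mapreduce.jobhistory.address</name>
  <value>historyserver.example.com:10020</value>
</property>
<property>
  <name>mapreduce.jobhistory.webapp.address</name>
  <value>historyserver.example.com:19888</value>
</property>
```

With this set, clients resolve the history server by name instead of retrying against 0.0.0.0.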
