Connection Error in Apache Pig


Problem Description



I am running Apache Pig 0.11.1 with Hadoop 2.0.5.

Most simple jobs that I run in Pig work perfectly fine.

However, whenever I try to use GROUP BY on a large dataset, or the LIMIT operator, I get these connection errors:

2013-07-29 13:24:08,591 [main] INFO  org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server 
2013-07-29 11:57:29,421 [main] INFO  org.apache.hadoop.ipc.Client - Retrying connect to server: 0.0.0.0/0.0.0.0:10020. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)

2013-07-29 11:57:30,421 [main] INFO  org.apache.hadoop.ipc.Client - Retrying connect to server: 0.0.0.0/0.0.0.0:10020. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)

2013-07-29 11:57:31,422 [main] INFO  org.apache.hadoop.ipc.Client - Retrying connect to server: 0.0.0.0/0.0.0.0:10020. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
...
2013-07-29 13:24:18,597 [main] INFO  org.apache.hadoop.ipc.Client - Retrying connect to server: 0.0.0.0/0.0.0.0:10020. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2013-07-29 13:24:18,598 [main] ERROR org.apache.hadoop.security.UserGroupInformation - PriviledgedActionException as:gpadmin (auth:SIMPLE) cause:java.io.IOException

The strange thing is that after these errors keep appearing for about two minutes, they stop, and the correct output shows up at the bottom.

So Hadoop is running fine and computing the proper output. The problem is just these connection errors that keep popping up.

The LIMIT operator always triggers this error. It happens in both MapReduce mode and local mode. The GROUP BY operator works fine on small datasets.
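For reference, a minimal script of the kind described above might look like this (the input file name and field layout are hypothetical, just to illustrate the operators involved):

```
-- hypothetical tab-separated input, for illustration only
data = LOAD 'input.txt' USING PigStorage('\t') AS (name:chararray, value:int);

-- GROUP BY, which showed the retries on large datasets
grouped = GROUP data BY name;
counts  = FOREACH grouped GENERATE group, COUNT(data);

-- LIMIT, which showed the retries every time, even in local mode
top10 = LIMIT counts 10;
DUMP top10;
```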

One thing that I have noticed is that whenever this error appears, the job creates and runs multiple JAR files during its execution. However, after a few minutes of these messages popping up, the correct output finally appears.

Any suggestions on how to get rid of these messages?

Solution

Yes, the problem was that the job history server was not running.

All we had to do to fix this problem was enter this command into the command prompt:

  mr-jobhistory-daemon.sh start historyserver

This command starts up the job history server. Now if we run 'jps', we can see that JobHistoryServer is running, and my Pig jobs no longer waste time trying to connect to the server.
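As a side note, the 0.0.0.0:10020 address in the retry messages is the default value of the mapreduce.jobhistory.address property. If the history server runs on a different machine than the client, starting the daemon alone is not enough; the address also needs to be set explicitly in mapred-site.xml, roughly like this (the hostname is a placeholder for your own history-server host):

```xml
<!-- mapred-site.xml: point clients at the history server explicitly.
     "historyserver-host" is a placeholder; substitute your own hostname. -->
<property>
  <name>mapreduce.jobhistory.address</name>
  <value>historyserver-host:10020</value>
</property>
<property>
  <name>mapreduce.jobhistory.webapp.address</name>
  <value>historyserver-host:19888</value>
</property>
```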
