Spark Error: Failed to Send RPC to Datanode

Problem Description

We have had quite a few issues with the Spark Thrift Server.

From the log we can see: Failed to send RPC 9053901149358924945 to /DATA NODE MACHINE:50149

Please advise why this happens, and what the solution is.

Failed to send RPC 9053901149358924945 to /DATA NODE MACHINE:50149: java.nio.channels.ClosedChannelException
more spark-hive-org.apache.spark.sql.hive.thriftserver.HiveThriftServer2-1-master03.sys67.com.out


Spark Command: /usr/jdk64/jdk1.8.0_112/bin/java -Dhdp.version=2.6.0.3-8 -cp /usr/hdp/current/spark2-thriftserver/conf/:/usr/hdp/current/spark2-thriftserver/jars/*:/usr/hdp/current/hadoop-client/conf/ -Xmx10000m org.apache.spark.deploy.SparkSubmit --conf spark.driver.memory=15g --properties-file /usr/hdp/current/spark2-thriftserver/conf/spark-thrift-sparkconf.conf --class org.apache.spark.sql.hive.thriftserver.HiveThriftServer2 --name Thrift JDBC/ODBC Server --executor-cores 7 spark-internal
========================================
Warning: Master yarn-client is deprecated since 2.0. Please use master "yarn" with specified deploy mode instead.
18/02/07 17:55:21 ERROR TransportClient: Failed to send RPC 9053901149358924945 to /12.87.2.64:50149: java.nio.channels.ClosedChannelException
java.nio.channels.ClosedChannelException
        at io.netty.channel.AbstractChannel$AbstractUnsafe.write(...)(Unknown Source)
18/02/07 17:55:21 ERROR YarnSchedulerBackend$YarnSchedulerEndpoint: Sending RequestExecutors(2,0,Map()) to AM was unsuccessful
java.io.IOException: Failed to send RPC 9053901149358924945 to /12.87.2.64:50149: java.nio.channels.ClosedChannelException
        at org.apache.spark.network.client.TransportClient$3.operationComplete(TransportClient.java:249)
        at org.apache.spark.network.client.TransportClient$3.operationComplete(TransportClient.java:233)
        at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:514)
        at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:488)
        at io.netty.util.concurrent.DefaultPromise.access$000(DefaultPromise.java:34)
        at io.netty.util.concurrent.DefaultPromise$1.run(DefaultPromise.java:438)
        at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:408)
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:455)
        at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:140)
        at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:144)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.nio.channels.ClosedChannelException
        at io.netty.channel.AbstractChannel$AbstractUnsafe.write(...)(Unknown Source)
18/02/07 17:55:21 ERROR SparkContext: Error initializing SparkContext.

We also tried to pick up some pointers from this link - https://thebipalace.com/2017/08/23/spark-error-failed-to-send-rpc-to-datanode/

But this is a new Ambari cluster, and we don't think that article fits this particular issue (no Spark jobs are running on our Ambari cluster right now).

Answer

It could be due to insufficient disk space. In my case, I was running a Spark job in AWS EMR with 1 r4.2xlarge (master) and 2 r4.8xlarge (core) instances. Spark tuning and adding more worker nodes solved my problem. The most common causes are memory pressure from bad configuration (i.e., wrong-sized executors), long-running tasks, and tasks that result in Cartesian operations. You can speed up jobs with appropriate caching and by allowing for data skew. For the best performance, monitor and review long-running, resource-consuming Spark job executions. Hope it helps.
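To make the "wrong-sized executors" and caching points concrete, here is a minimal Scala sketch of the kind of tuning the answer describes. The memory and core values, the overhead setting, and the input path are illustrative assumptions, not settings taken from this cluster:

import org.apache.spark.sql.SparkSession
import org.apache.spark.storage.StorageLevel

object TuningSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("tuning-sketch")
      // Size executors explicitly instead of accepting defaults; over- or
      // under-sized executors are a common source of memory pressure.
      // All three values below are assumptions for illustration only.
      .config("spark.executor.memory", "8g")
      .config("spark.executor.cores", "4")
      // Off-heap overhead per executor in MB (Spark 2.x on YARN).
      .config("spark.yarn.executor.memoryOverhead", "1024")
      .getOrCreate()

    // Cache a DataFrame that is reused across several actions so it is not
    // recomputed each time; MEMORY_AND_DISK spills to disk rather than
    // failing when the cache does not fit in memory. The path is hypothetical.
    val df = spark.read.parquet("/data/events")
    df.persist(StorageLevel.MEMORY_AND_DISK)

    df.count() // first action materializes the cache; later actions reuse it

    spark.stop()
  }
}

Equivalently, these settings could be passed as --conf flags to spark-submit, in the same way the launch command in the log above passes spark.driver.memory=15g.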

Reference => EMR Spark - TransportClient: Failed to send RPC
