How to fix Connection reset by peer message from apache-spark?


Problem description

I keep getting the following exception very frequently, and I wonder why this is happening. After some research I found I could do .set("spark.submit.deployMode", "nio"), but that did not work either. I am using Spark 2.0.0.

WARN TransportChannelHandler: Exception in connection from /172.31.3.245:46014
    java.io.IOException: Connection reset by peer
    at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
    at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
    at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
    at sun.nio.ch.IOUtil.read(IOUtil.java:192)
    at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
    at io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:221)
    at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:898)
    at io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:242)
    at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:119)
    at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
    at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:112)
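
(For context, a rough sketch of how the setting mentioned in the question might be applied; the SparkConf setup around it is assumed, and "nio" is simply the value the question reports trying.)

    import org.apache.spark.SparkConf
    import org.apache.spark.sql.SparkSession

    // Sketch of the configuration attempt described above; "nio" is the value
    // tried in the question, not a documented deploy mode.
    val conf = new SparkConf()
      .setAppName("my-spark-job")                // hypothetical app name
      .set("spark.submit.deployMode", "nio")
    val spark = SparkSession.builder().config(conf).getOrCreate()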

Recommended answer

I was getting the same error even after trying many things. My job used to get stuck with this error after running for a very long time. I tried a few workarounds that helped me resolve it. Although I still see the same error, at least my job now runs fine.

  1. One reason could be that the executors kill themselves, thinking that they have lost the connection to the master. I added the configurations below to the spark-defaults.conf file.

    spark.network.timeout            10000000
    spark.executor.heartbeatInterval 10000000

Basically, I increased the network timeout and the heartbeat interval.
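
Equivalently, if the job builds its own SparkSession rather than relying on spark-defaults.conf, the same two properties can be set programmatically. A minimal sketch in Scala (the application name is made up; the property names and values are the ones quoted above):

    import org.apache.spark.sql.SparkSession

    // Sketch: apply the same timeout and heartbeat settings on the session builder
    // instead of spark-defaults.conf. Values are the ones used in the answer.
    val spark = SparkSession.builder()
      .appName("connection-reset-workaround")    // hypothetical app name
      .config("spark.network.timeout", "10000000")
      .config("spark.executor.heartbeatInterval", "10000000")
      .getOrCreate()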

For the particular step that used to get stuck, I simply cached the dataframe that is used for processing in that step.
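
A minimal sketch of that caching workaround, assuming the stuck step reads its input into a dataframe (the input path and names are hypothetical; the point is the cache() call before the heavy step):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().getOrCreate()

    // Hypothetical dataframe consumed by the step that used to get stuck.
    val df = spark.read.parquet("/path/to/input")    // assumed input location

    // Cache it before the heavy step so it is not recomputed repeatedly
    // over the long-running job.
    val cached = df.cache()
    cached.count()                                   // optionally materialize the cache eagerly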

Note: These are workarounds; I still see the same error in the error logs, but my job does not get terminated.
