Spark joins failing: Spark job always failing for joins (CDH 5.5.2, Spark 1.5.0)


Problem Description


We are running into frequent errors on a Spark standalone cluster on our newly installed CDH 5.5.2 cluster. We have 7 worker nodes, each with 16 GB of memory, yet almost all joins are failing.

I have made sure the full memory is allocated with --executor-memory, and verified in the Spark UI that the executors actually received that much memory.
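For reference, the submission looks roughly like the following; the master host, memory figure, class, and jar name here are illustrative, not taken from the original job:

spark-submit --master spark://master-host:7077 \
  --executor-memory 14G \
  --class com.example.JoinJob \
  our-join-job.jar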

Most of our errors look like the one below. We have checked a few things on our side, but none of our fixes worked.

Caused by: java.io.FileNotFoundException: /tmp/spark-b9e69c8d-153b-4c4b-88b1-ac779c060de5/executor-44e88b75-5e79-4d96-b507-ddabcab30e1b/blockmgr-cd27625c-8716-49ac-903d-9d5c36cf2622/29/shuffle_1_66_0.index (Permission denied)
at java.io.FileInputStream.open(Native Method)
at java.io.FileInputStream.<init>(FileInputStream.java:146)
at org.apache.spark.network.shuffle.ExternalShuffleBlockResolver.getSortBasedShuffleBlockData(ExternalShuffleBlockResolver.java:275)
... 27 more
at org.apache.spark.network.client.TransportResponseHandler.handle(TransportResponseHandler.java:162)
at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:103)
at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:51)
at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:266)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:244)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:846)
at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
... 1 more

  1. /tmp has 777 permissions, yet the error still reports "Permission denied" for files under /tmp.

  2. We have configured SPARK_LOCAL_DIRS to another folder with more disk space, but the cluster is still using /tmp. Why? We changed it through Cloudera Manager, and printing spark.local.dirs from the Spark configuration shows the folder we set. But at execution time it is the other way around: Spark looks for the files in /tmp. Are we missing anything here? (See the configuration sketch after this list.)

  3. We have turned off Spark-on-YARN; could any YARN configuration still affect standalone mode?
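For what it is worth, in Spark standalone mode the spark.local.dir property is overridden by the SPARK_LOCAL_DIRS environment variable set on each worker, so the directory has to be exported in every worker's environment and the workers restarted before the change takes effect. A minimal sketch, assuming an illustrative /data/spark-local path:

# in spark-env.sh on every worker node, then restart the workers
export SPARK_LOCAL_DIRS=/data/spark-local

# in spark-defaults.conf; takes effect only when SPARK_LOCAL_DIRS is unset
spark.local.dir /data/spark-local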

Has anyone faced this issue, and why does it keep recurring for us? We had a similar cluster on Hortonworks, where we installed bare-bones Spark (not part of the distribution), and it worked very well. But on our new cluster we are facing this issue. Maybe we missed something, but we are curious to know what.

Solution

This worked for me.

On all nodes:

sudo chmod -R 0777 /tmp
sudo chmod +t /tmp
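The chmod +t restores the sticky bit that /tmp normally carries, so users can delete only their own files. The result can be verified on each node:

ls -ld /tmp    # should print a mode of drwxrwxrwt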

With parallel-ssh:

sudo parallel-ssh -h hosts.txt -l ubuntu --timeout=0 'sudo chmod -R 0777 /tmp'

sudo parallel-ssh -h hosts.txt -l ubuntu --timeout=0 'sudo chmod +t /tmp'
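Here hosts.txt is assumed to list one worker hostname per line, for example:

worker1.example.com
worker2.example.com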

