spark-submit unable to connect

Problem description

After running the command

spark-submit --class org.apache.spark.examples.SparkPi --proxy-user yarn --master yarn --deploy-mode cluster --driver-memory 4g --executor-memory 2g --executor-cores 1 --queue default ./examples/jars/spark-examples_2.11-2.3.0.jar 10000

I get this in the output and it keeps on retrying. Where am I going wrong? Am I missing some configuration?

I have created a new user for yarn and am running the command as that user.

WARN  Utils:66 - Your hostname, ukaleem-HP-EliteBook-850-G3 resolves to a loopback address: 127.0.1.1; using 10.XX.XX.XX instead (on interface enp0s31f6)
2018-06-14 16:50:41 WARN  Utils:66 - Set SPARK_LOCAL_IP if you need to bind to another address
Warning: Local jar /home/yarn/Documents/Scala-Examples/./examples/jars/spark-examples_2.11-2.3.0.jar does not exist, skipping.
2018-06-14 16:50:42 INFO  RMProxy:98 - Connecting to ResourceManager at /0.0.0.0:8032
2018-06-14 16:50:44 INFO  Client:871 - Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)

In the end, it gives this exception:

    Exception in thread "main" java.net.ConnectException: Call From ukaleem-HP-EliteBook-850-G3/127.0.1.1 to 0.0.0.0:8032 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused
    at sun.reflect.GeneratedConstructorAccessor4.newInstance(Unknown Source)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:792)
    at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:732)
    at org.apache.hadoop.ipc.Client.call(Client.java:1479)
    at org.apache.hadoop.ipc.Client.call(Client.java:1412)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
    at com.sun.proxy.$Proxy8.getClusterMetrics(Unknown Source)
    at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getClusterMetrics(ApplicationClientProtocolPBClientImpl.java:206)
    at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
    at com.sun.proxy.$Proxy9.getClusterMetrics(Unknown Source)
    at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getYarnClusterMetrics(YarnClientImpl.java:487)
    at org.apache.spark.deploy.yarn.Client$$anonfun$submitApplication$1.apply(Client.scala:155)
    at org.apache.spark.deploy.yarn.Client$$anonfun$submitApplication$1.apply(Client.scala:155)
    at org.apache.spark.internal.Logging$class.logInfo(Logging.scala:54)
    at org.apache.spark.deploy.yarn.Client.logInfo(Client.scala:59)
    at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:154)
    at org.apache.spark.deploy.yarn.Client.run(Client.scala:1146)
    at org.apache.spark.deploy.yarn.YarnClusterApplication.start(Client.scala:1518)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:879)
    at org.apache.spark.deploy.SparkSubmit$$anon$1.run(SparkSubmit.scala:179)
    at org.apache.spark.deploy.SparkSubmit$$anon$1.run(SparkSubmit.scala:177)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:177)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:227)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:136)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
    at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:495)
    at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:614)
    at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:712)
    at org.apache.hadoop.ipc.Client$Connection.access$2900(Client.java:375)
    at org.apache.hadoop.ipc.Client.getConnection(Client.java:1528)
    at org.apache.hadoop.ipc.Client.call(Client.java:1451)
    ... 28 more
2018-06-14 17:10:53 INFO  ShutdownHookManager:54 - Shutdown hook called
2018-06-14 17:10:53 INFO  ShutdownHookManager:54 - Deleting directory /tmp/spark-5bddb7f3-165f-451c-8ab4-bb7729f4237c

EDIT: After adding config files to my spark/conf dir, I now get this error.

The files I added are:

* core-site.xml
* dfs.hosts
* masters
* slaves
* yarn-site.xml

And some more. As I understand it, I only need yarn-site.xml to tell Spark the location of the YARN cluster (IDs, address, hostname, etc.).

All this time I had been thinking that even if we want to submit a job on YARN, these configs need to go in the /etc/hadoop dir, not in spark/conf. What is the purpose of installing Hadoop then (other than communicating)? And a follow-up question: if the configs need to go in spark/conf, should HADOOP_CONF_DIR and YARN_CONF_DIR point to the etc/hadoop dir or to spark/conf?

    INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm2
18/06/19 11:04:50 INFO retry.RetryInvocationHandler: Exception while invoking getClusterMetrics of class ApplicationClientProtocolPBClientImpl over rm2 after 1 fail over attempts. Trying to fail over after sleeping for 38176ms.
java.net.ConnectException: Call From ukaleem-HP-EliteBook-850-G3/127.0.1.1 to svc-hadoop-mgnt-pre-c2-01.jamba.net:8032 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:792)
    at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:732)
    at org.apache.hadoop.ipc.Client.call(Client.java:1479)
    at org.apache.hadoop.ipc.Client.call(Client.java:1412)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
    at com.sun.proxy.$Proxy13.getClusterMetrics(Unknown Source)
    at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getClusterMetrics(ApplicationClientProtocolPBClientImpl.java:206)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
    at com.sun.proxy.$Proxy14.getClusterMetrics(Unknown Source)
    at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getYarnClusterMetrics(YarnClientImpl.java:487)
    at org.apache.spark.deploy.yarn.Client$$anonfun$submitApplication$1.apply(Client.scala:155)
    at org.apache.spark.deploy.yarn.Client$$anonfun$submitApplication$1.apply(Client.scala:155)
    at org.apache.spark.internal.Logging$class.logInfo(Logging.scala:54)
    at org.apache.spark.deploy.yarn.Client.logInfo(Client.scala:59)
    at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:154)
    at org.apache.spark.deploy.yarn.Client.run(Client.scala:1146)
    at org.apache.spark.deploy.yarn.YarnClusterApplication.start(Client.scala:1518)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:879)
    at org.apache.spark.deploy.SparkSubmit$$anon$1.run(SparkSubmit.scala:179)
    at org.apache.spark.deploy.SparkSubmit$$anon$1.run(SparkSubmit.scala:177)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:177)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:227)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:136)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
    at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:495)
    at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:614)
    at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:712)
    at org.apache.hadoop.ipc.Client$Connection.access$2900(Client.java:375)
    at org.apache.hadoop.ipc.Client.getConnection(Client.java:1528)
    at org.apache.hadoop.ipc.Client.call(Client.java:1451)
    ... 29 more

Recommended answer

Assuming you have a fully distributed YARN cluster: your spark-submit script is unable to find the configuration for the YARN ResourceManager (basically the YARN master node). Ensure that HADOOP_CONF_DIR is properly set in your environment and that it points to your cluster's configuration, specifically your yarn-site.xml.
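As a quick sanity check before running spark-submit, something like the following could confirm that one of the configuration variables actually leads to a yarn-site.xml. This is only a sketch: the function name `find_yarn_site` is made up for illustration, and the order in which the two variables are checked is an arbitrary choice here, not a statement about how Spark itself resolves them.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def find_yarn_site(env: Mapping[str, str] = os.environ) -> Optional[Path]:
    """Return the first yarn-site.xml reachable via HADOOP_CONF_DIR or YARN_CONF_DIR.

    Hypothetical helper: the lookup order is an assumption made for this sketch.
    """
    for var in ("HADOOP_CONF_DIR", "YARN_CONF_DIR"):
        conf_dir = env.get(var)
        if not conf_dir:
            continue  # variable unset or empty, try the next one
        candidate = Path(conf_dir) / "yarn-site.xml"
        if candidate.is_file():
            return candidate
    return None
```

Calling `find_yarn_site()` as the same user that runs spark-submit returns `None` when neither variable points at a directory containing yarn-site.xml, which would be consistent with the client falling back to built-in defaults.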

The Hadoop package comes with both server and client software. The server software is the set of daemons that make up the cluster. If your workstation is acting as a client (using that term loosely, not fully related to Spark's --deploy-mode), then the Hadoop client software must know the network locations of the server daemons running in the cluster. If your yarn-site.xml is empty, then it is pulling its default values from yarn-default.xml (which is hard-coded, I believe). The default yarn.resourcemanager.hostname there is 0.0.0.0, which is why the client keeps trying 0.0.0.0:8032.
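A "Connection refused" error means nothing accepted the TCP connection at the address the client tried. To separate a configuration problem from a network problem, a small probe along these lines may help. It is a sketch using only the Python standard library; `can_connect` is a name chosen for this example, and port 8032 is simply the ResourceManager port that appears in the logs above.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        # create_connection resolves the host name and attempts the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name-resolution failures.
        return False
```

For example, `can_connect("rm-host.yourdomain.com", 8032)` (with the placeholder replaced by the real ResourceManager host) returning `False` from the workstation would point to a wrong address, a firewall, or a ResourceManager that is not running, rather than to Spark itself.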

Assuming your cluster is not running in HA mode, and is a mostly default configuration, then your workstation's yarn-site.xml should contain at least an entry like the following:

<property>
  <name>yarn.resourcemanager.hostname</name>
  <value>rm-host.yourdomain.com</value>
</property>

Obviously, replace the hostname with the hostname where your actual resource manager is running. Of course, any spark interaction with HDFS will require a properly configured hdfs-site.xml, etc.
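To see which ResourceManager address a given yarn-site.xml would lead to, a rough approximation of the lookup can be sketched as below. This is not Hadoop's actual resolution logic: it ignores HA settings and `${...}` variable substitution, and the helper names are invented for the example. The fallback values 0.0.0.0 and 8032 match the address seen in the first log above.

```python
import xml.etree.ElementTree as ET
from typing import Dict

DEFAULT_RM_HOSTNAME = "0.0.0.0"  # fallback when yarn.resourcemanager.hostname is unset
DEFAULT_RM_PORT = 8032           # ResourceManager client port seen in the logs


def load_properties(path: str) -> Dict[str, str]:
    """Read <property><name>/<value> pairs from a Hadoop-style XML config file."""
    root = ET.parse(path).getroot()
    props: Dict[str, str] = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None and value is not None:
            props[name.strip()] = value.strip()
    return props


def resource_manager_address(props: Dict[str, str]) -> str:
    """Approximate the non-HA ResourceManager address (simplified assumption)."""
    explicit = props.get("yarn.resourcemanager.address")
    if explicit:
        return explicit
    hostname = props.get("yarn.resourcemanager.hostname", DEFAULT_RM_HOSTNAME)
    return f"{hostname}:{DEFAULT_RM_PORT}"
```

With the property shown above in place, `resource_manager_address(load_properties(path))` yields `rm-host.yourdomain.com:8032`; with no relevant property it yields `0.0.0.0:8032`, the address the original command kept retrying.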

Some cluster management software has something like "generate client configs" (thinking of my Cloudera experience specifically), which gives you a .tar.gz with all of the config files correctly populated for accessing the cluster from an external workstation.

Further recommendations: If you plan to run Spark on YARN a lot in this cluster, Spark recommends making sure that you have the external shuffle service configured to launch with your YARN NodeManagers. (Please bear in mind that this config directive has to be present in the yarn-site.xml on the hosts where YARN's NodeManager services run, not on your workstation.)
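For reference, Spark's YARN documentation describes enabling the shuffle service by adding `spark_shuffle` to `yarn.nodemanager.aux-services` and setting `yarn.nodemanager.aux-services.spark_shuffle.class` to `org.apache.spark.network.yarn.YarnShuffleService`. A NodeManager's yarn-site.xml could be checked for those two entries with a sketch like this; the function name is hypothetical, and the check says nothing about whether the shuffle jar is on the NodeManager classpath.

```python
import xml.etree.ElementTree as ET

SHUFFLE_CLASS = "org.apache.spark.network.yarn.YarnShuffleService"


def has_spark_shuffle_service(yarn_site_path: str) -> bool:
    """Check a NodeManager yarn-site.xml for the Spark external shuffle service entries."""
    root = ET.parse(yarn_site_path).getroot()
    props = {
        (p.findtext("name") or "").strip(): (p.findtext("value") or "").strip()
        for p in root.iter("property")
    }
    # aux-services is a comma-separated list, e.g. "mapreduce_shuffle,spark_shuffle".
    services = [s.strip() for s in props.get("yarn.nodemanager.aux-services", "").split(",")]
    class_name = props.get("yarn.nodemanager.aux-services.spark_shuffle.class")
    return "spark_shuffle" in services and class_name == SHUFFLE_CLASS
```

This would be run against the configuration on the NodeManager hosts, not against the workstation's client copy.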
