Can't access Hadoop cluster master via Spark
Question
We are using Cloudera's distribution for Hadoop. We have a working cluster with 10 nodes. I'm trying to connect to the cluster from a remote host with IntelliJ, using Scala and Spark.
I imported the following libraries via sbt:
libraryDependencies += "org.scalatestplus.play" %% "scalatestplus-play" % "3.1.2" % Test
libraryDependencies += "com.h2database" % "h2" % "1.4.196"
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.2.0"
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.2.0"
libraryDependencies += "org.apache.spark" %% "spark-hive" % "2.2.0"
and I'm trying to create a SparkSession with the following code:
val spark = SparkSession
.builder()
.appName("API")
.config("spark.sql.warehouse.dir", "/user/hive/warehouse")
.config("hive.metastore.uris","thrift://VMClouderaMasterDev01:9083")
.master("spark://10.150.1.22:9083")
.enableHiveSupport()
.getOrCreate()
but I get the following error:
[error] o.a.s.n.c.TransportResponseHandler - Still have 1 requests
outstanding when connection from /10.150.1.22:9083 is closed
[warn] o.a.s.d.c.StandaloneAppClient$ClientEndpoint - Failed to connect to
master 10.150.1.22:9083
org.apache.spark.SparkException: Exception thrown in awaitResult:
at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:205)
at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75)
at org.apache.spark.rpc.RpcEnv.setupEndpointRefByURI(RpcEnv.scala:100)
at org.apache.spark.rpc.RpcEnv.setupEndpointRef(RpcEnv.scala:108)
......
at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: Connection from /10.150.1.22:9083 closed
at org.apache.spark.network.client.TransportResponseHandler.channelInactive(TransportResponseHandler.java:146)
To be honest, I tried connecting with different ports (8022, 9023), but it didn't work. I saw that the default port is 7077, but I don't have any process listening on port 7077 on the master.
Any idea how I can continue? How can I check which port the master is listening on for these types of connections?
Answer
If you're using a Hadoop cluster, you shouldn't be running a standalone Spark master; you should be using YARN:
master("yarn")
In that case, you must export a HADOOP_CONF_DIR environment variable pointing at a directory that contains a copy of the yarn-site.xml from the cluster.
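Putting that together, a minimal sketch of the corrected builder might look like the following. The metastore URI and warehouse path are copied from the question and the `HADOOP_CONF_DIR` path is just an illustrative example; substitute values for your own cluster:

```scala
import org.apache.spark.sql.SparkSession

// Assumes HADOOP_CONF_DIR is exported before launching the application and
// points at a directory holding the cluster's yarn-site.xml, for example:
//   export HADOOP_CONF_DIR=/etc/hadoop/conf
val spark = SparkSession
  .builder()
  .appName("API")
  // Target YARN instead of a standalone master host:port.
  // (Note: 9083 in the original code is the Hive metastore port,
  // not a Spark master port.)
  .master("yarn")
  .config("hive.metastore.uris", "thrift://VMClouderaMasterDev01:9083")
  .config("spark.sql.warehouse.dir", "/user/hive/warehouse")
  .enableHiveSupport()
  .getOrCreate()
```

This only runs against a live YARN cluster, so it is a configuration sketch rather than something you can execute locally.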