Can't access Hadoop cluster master via Spark


Problem description

We are using Cloudera's distribution of Hadoop. We have a working cluster with 10 nodes. I'm trying to connect to the cluster from a remote host with IntelliJ. I'm using Scala and Spark.

I imported the following libraries via sbt:

libraryDependencies += "org.scalatestplus.play" %% "scalatestplus-play" % "3.1.2" % Test
libraryDependencies += "com.h2database" % "h2" % "1.4.196"
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.2.0"
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.2.0"
libraryDependencies += "org.apache.spark" %% "spark-hive" % "2.2.0"

and I'm trying to create a SparkSession with the following code:

val spark = SparkSession
  .builder()
  .appName("API")
  .config("spark.sql.warehouse.dir", "/user/hive/warehouse")
  .config("hive.metastore.uris", "thrift://VMClouderaMasterDev01:9083")
  .master("spark://10.150.1.22:9083")
  .enableHiveSupport()
  .getOrCreate()

but I get the following error:

[error] o.a.s.n.c.TransportResponseHandler - Still have 1 requests outstanding when connection from /10.150.1.22:9083 is closed
[warn] o.a.s.d.c.StandaloneAppClient$ClientEndpoint - Failed to connect to master 10.150.1.22:9083
org.apache.spark.SparkException: Exception thrown in awaitResult:
    at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:205)
    at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75)
    at org.apache.spark.rpc.RpcEnv.setupEndpointRefByURI(RpcEnv.scala:100)
    at org.apache.spark.rpc.RpcEnv.setupEndpointRef(RpcEnv.scala:108)
    ......
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: Connection from /10.150.1.22:9083 closed
    at org.apache.spark.network.client.TransportResponseHandler.channelInactive(TransportResponseHandler.java:146)

To be honest, I tried connecting with different ports: 8022, 9023, but it didn't work. I saw that the default port is 7077, but I don't have any process listening on port 7077 on the master.

Any idea how I can continue? How can I check which port the master is listening on for this kind of connection?

Answer

If you're using a Hadoop cluster, you shouldn't be running a standalone Spark master; you should be using YARN:

master("yarn")

In that case, you must export a HADOOP_CONF_DIR environment variable pointing to a directory that contains a copy of the yarn-site.xml from the cluster.
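As a minimal sketch, assuming HADOOP_CONF_DIR is already exported and reusing the hostname and warehouse path from the question (everything else here is illustrative, not a verified configuration), the session setup could look like this:

import org.apache.spark.sql.SparkSession

// Sketch only: assumes HADOOP_CONF_DIR points at a directory holding the
// cluster's yarn-site.xml, and that the Hive metastore URI from the question
// is correct for this environment.
val spark = SparkSession
  .builder()
  .appName("API")
  .master("yarn") // let YARN place the application instead of a standalone master URL
  .config("spark.sql.warehouse.dir", "/user/hive/warehouse")
  .config("hive.metastore.uris", "thrift://VMClouderaMasterDev01:9083")
  .enableHiveSupport()
  .getOrCreate()

With master("yarn"), Spark reads the ResourceManager address from the Hadoop configuration, so no master host/port needs to be hard-coded in the application.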

