SparkAppHandle state is LOST after submit, but the driver runs flawlessly


Problem description

I'm using the Spark Java API (SparkLauncher) to submit a driver to a local Spark standalone cluster (1 master + 1 worker). After calling startApplication with a listener attached, the first call to stateChanged reports the LOST state.

The driver is submitted successfully and runs fine on the worker.

I've tried polling the handle in a wait loop instead of using a listener.

I've tried Spark versions 2.3.1 and 2.4.3.

I've tried on both OSX and Ubuntu.

I've tried setting the Spark master host to the machine's IP address instead of its hostname.

// env, path, sparkMasterHost, sparkMasterPort and runnerStr are defined elsewhere.
SparkLauncher launcher = new SparkLauncher(env)
    .setAppResource(path)
    .setMainClass("full.package.name.RTADriver")
    .setMaster("spark://" + sparkMasterHost + ":" + sparkMasterPort)
    .setAppName("rta_scala_app_")
    .setDeployMode("cluster")
    .setConf("spark.ui.enabled", "true")
    .addAppArgs(runnerStr)
    .setVerbose(true);

SparkAppHandle handle = launcher.startApplication();

// Poll the handle every 10 seconds. The loop exits only on FINISHED,
// so it keeps printing once the state has become LOST.
while (!handle.getState().equals(SparkAppHandle.State.FINISHED)){
    System.out.println("Wait Loop: App_ID: " + handle.getAppId() + " state: " +  handle.getState());
    Thread.sleep(10000);
}

System.out log from my code:

First State App_ID: null state: UNKNOWN
Wait Loop: App_ID: null state: UNKNOWN
Wait Loop: App_ID: null state: LOST
Wait Loop: App_ID: null state: LOST
...

Relevant spark-submit log:

INFO: 19/06/04 11:27:54 INFO Utils: Successfully started service 'driverClient' on port 52077.
INFO: 19/06/04 11:27:54 INFO TransportClientFactory: Successfully created connection to /10.10.0.179:7077 after 34 ms (0 ms spent in bootstraps)
INFO: 19/06/04 11:27:54 INFO ClientEndpoint: Driver successfully submitted as driver-20190604112754-0030
INFO: 19/06/04 11:27:54 INFO ClientEndpoint: ... waiting before polling master for driver state
INFO: 19/06/04 11:27:59 INFO ClientEndpoint: ... polling master for driver state
INFO: 19/06/04 11:27:59 INFO ClientEndpoint: State of driver-20190604112754-0030 is RUNNING
INFO: 19/06/04 11:27:59 INFO ClientEndpoint: Driver running on 10.10.0.179:49705 (worker-20190603154544-10.10.0.179-49705)
INFO: 19/06/04 11:27:59 INFO ShutdownHookManager: Shutdown hook called
INFO: 19/06/04 11:27:59 INFO ShutdownHookManager: Deleting directory /private/var/folders/90/pgndgkk11lj0qb4q5qw_f03c0000gn/T/spark-8d8d92b9-8d0c-43a1-8bb9-3d08f1519c53
Wait Loop: App_ID: null state: LOST
...
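
The spark-submit output above does carry the driver ID and its state, even though the handle never reports an application ID. As a rough sketch only, the two ClientEndpoint lines could be parsed as below. The class name DriverLogParser is hypothetical, the regular expressions are derived solely from the log lines shown here (other Spark versions may word them differently), and how the lines are captured (for example from the process returned by SparkLauncher.launch()) is left out.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical helper: extracts the driver ID and the last reported driver
 * state from spark-submit output lines. Not part of the Spark API.
 */
final class DriverLogParser {
    // Patterns mirror the two ClientEndpoint lines in the log above (assumption:
    // the wording is the same in the Spark version being used).
    private static final Pattern SUBMITTED =
        Pattern.compile("Driver successfully submitted as (driver-\\S+)");
    private static final Pattern STATE =
        Pattern.compile("State of (driver-\\S+) is (\\w+)");

    private String driverId;
    private String driverState;

    /** Feed one line of spark-submit output; unrelated lines are ignored. */
    void accept(String line) {
        Matcher submitted = SUBMITTED.matcher(line);
        if (submitted.find()) {
            driverId = submitted.group(1);
            return;
        }
        Matcher state = STATE.matcher(line);
        if (state.find()) {
            driverId = state.group(1);
            driverState = state.group(2);
        }
    }

    /** Driver ID seen so far, if any. */
    Optional<String> driverId() {
        return Optional.ofNullable(driverId);
    }

    /** Last driver state seen so far, if any. */
    Optional<String> driverState() {
        return Optional.ofNullable(driverState);
    }
}
```

Note that the submission client only polls the master once before it exits, so the state obtained this way is a snapshot taken shortly after submission, not a live status.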

Recommended answer

I just ran into the same situation. My guess is that it is caused by the "cluster" deploy mode: the Spark driver process runs on a different host from the Spark launcher process, so the launcher process "loses" its connection to the Spark application. That would match the log in the question, where the submission client shuts down (the ShutdownHookManager lines) right after the master reports the driver as RUNNING, while the handle has not yet reached a final state.
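
If the handle cannot be relied on in cluster mode, one possible workaround is to ask the standalone master for the driver's state directly. The sketch below assumes that the master's REST submission server is enabled and reachable (commonly port 6066; in newer Spark versions it may have to be switched on with spark.master.rest.enabled), that its status endpoint is /v1/submissions/status/<driverId>, and that the JSON response contains a driverState field. The class DriverStatusClient and its methods are hypothetical names for illustration, and the driver ID would be the one printed in the spark-submit log (driver-20190604112754-0030 in the question).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper that queries a standalone master for a driver's state. */
final class DriverStatusClient {
    // Assumption: the status response is JSON with a string field "driverState".
    private static final Pattern DRIVER_STATE =
        Pattern.compile("\"driverState\"\\s*:\\s*\"(\\w+)\"");

    /** Builds the status URL for the given master REST host, port, and driver ID. */
    static String statusUrl(String masterHost, int restPort, String driverId) {
        return "http://" + masterHost + ":" + restPort
            + "/v1/submissions/status/" + driverId;
    }

    /** Extracts the driverState value from a response body, if present. */
    static Optional<String> parseDriverState(String json) {
        Matcher m = DRIVER_STATE.matcher(json);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    /** Performs the HTTP GET and returns the reported driver state, if any. */
    static Optional<String> fetchDriverState(String masterHost, int restPort, String driverId)
            throws IOException {
        URL url = new URL(statusUrl(masterHost, restPort, driverId));
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("GET");
        conn.setConnectTimeout(5000);
        conn.setReadTimeout(5000);
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
            StringBuilder body = new StringBuilder();
            String line;
            while ((line = in.readLine()) != null) {
                body.append(line);
            }
            return parseDriverState(body.toString());
        } finally {
            conn.disconnect();
        }
    }
}
```

A caller could poll fetchDriverState(sparkMasterHost, 6066, driverId) in place of the handle-based wait loop and stop once the returned state is no longer RUNNING. The regex-based parsing keeps the sketch dependency-free; a JSON library would be the more robust choice in real code.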
