How to debug Spark application on Spark Standalone?


Question

I am trying to debug a Spark application on a cluster with a master and several worker nodes. I have successfully set up the master and worker nodes using the Spark standalone cluster manager. I downloaded the Spark folder with binaries and used the following commands to set up the master and worker nodes. These commands are executed from the Spark directory.

Command for launching the master

./sbin/start-master.sh

Command for launching a worker node

./bin/spark-class org.apache.spark.deploy.worker.Worker master-URL

Command for submitting the application

./bin/spark-submit --class Application --master URL ~/app.jar

Now, when I submit my application, I would like to follow the flow of control through the Spark source code on the worker nodes (I just want to use one of the bundled examples that uses reduce()). I am assuming I should set up Spark in Eclipse. The Eclipse setup link on the Apache Spark website appears to be broken. I would appreciate some guidance on setting up Spark and Eclipse so that I can step through the Spark source code on the worker nodes.
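For reference, Spark ships with runnable examples (SparkPi, for instance, aggregates its partial results with reduce()), so one of those can serve as the test application. A sketch of submitting one to the standalone master, where the master host and port are placeholders for your own URL; the command is only assembled and printed here:

```shell
# Sketch only: submit one of Spark's bundled examples to the standalone
# master. spark://master-host:7077 is a placeholder for your master URL.
MASTER_URL="spark://master-host:7077"
RUN_EXAMPLE="./bin/run-example --master $MASTER_URL SparkPi 10"
echo "$RUN_EXAMPLE"
```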

Thanks!

Solution

It's important to distinguish between debugging the driver program and debugging one of the executors. They require different options passed to spark-submit.

To debug the driver, you can add the following to your spark-submit command. Then set your remote debugger to connect to the node on which you launched the driver program.

--driver-java-options -agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=5005

In this example port 5005 is specified, but you may need to customize that if something is already running on that port.
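Putting this together with the submit command from the question gives something like the sketch below; the master URL and jar path are placeholders, and the command is only assembled and printed, not run:

```shell
# Sketch: the question's submit command with the driver debug agent added.
# spark://master-host:7077 and ~/app.jar are placeholders for your own values.
DEBUG_OPTS='-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=5005'
CMD="./bin/spark-submit --class Application --master spark://master-host:7077 --driver-java-options $DEBUG_OPTS ~/app.jar"
echo "$CMD"
```

With suspend=y the driver JVM blocks at startup until a debugger attaches, which gives you time to connect before any job runs.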

Connecting to an executor is similar; add the following options to your spark-submit command.

--num-executors 1 --executor-cores 1 --conf "spark.executor.extraJavaOptions=-agentlib:jdwp=transport=dt_socket,server=n,address=wm1b0-8ab.yourcomputer.org:5005,suspend=n"

Replace the address with your local machine's address. (It's a good idea to test that it is reachable from the Spark cluster.)

In this case, start your debugger in listening mode, then start your Spark program and wait for the executor to attach to the debugger. It's important to set the number of executors to 1, otherwise multiple executors will all try to connect to the debugger, likely causing problems.
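A full invocation in this listening-debugger setup might look like the sketch below; the master URL, jar path, and debugger hostname are all placeholders, and the command is only assembled and printed:

```shell
# Sketch: one executor whose JVM dials out (server=n) to a debugger
# already listening on the developer machine. All hosts/paths are placeholders.
EXEC_OPTS='-agentlib:jdwp=transport=dt_socket,server=n,address=dev-machine.example.org:5005,suspend=n'
CMD="./bin/spark-submit --class Application --master spark://master-host:7077 --num-executors 1 --executor-cores 1 --conf \"spark.executor.extraJavaOptions=$EXEC_OPTS\" ~/app.jar"
echo "$CMD"
```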

These examples are for running with the Spark master set to yarn-client, although they may also work when running under Mesos. If you are running in yarn-cluster mode, you may have to have the driver connect to your debugger rather than attaching the debugger to the driver, since you won't necessarily know in advance which node the driver will execute on.
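In that dial-out arrangement, one option is to flip the driver's agent to server=n via spark.driver.extraJavaOptions, mirroring the executor setup above. A sketch, with the hostname as a placeholder; only the option string is printed:

```shell
# Sketch for yarn-cluster mode: the driver JVM dials out to a debugger
# already listening on the developer machine, since the driver's node
# isn't known ahead of time. The hostname is a placeholder.
DRIVER_OPTS='-agentlib:jdwp=transport=dt_socket,server=n,address=dev-machine.example.org:5005,suspend=y'
echo "--conf spark.driver.extraJavaOptions=$DRIVER_OPTS"
```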

