How to use the programmatic spark submit capability


Problem Description

There is a somewhat recent (Spring 2015) feature apparently intended to allow submitting a Spark job programmatically.

Here is the JIRA: https://issues.apache.org/jira/browse/SPARK-4924

However, there is uncertainty (count me in as well) about how to actually use these features. Here are the last comments on the JIRA:

When the actual author of this work was asked to explain it further, the answer was "look in the API docs".

The "user document" is the Spark API documentation.

The author did not provide further details and apparently feels the whole issue is self-explanatory. If anyone can connect the dots here, specifically where in the API docs this newer spark-submit capability is described, it would be appreciated.

Here is some of the info I am looking for. Pointers to the following:

  • What capabilities have been added to the Spark API
  • How do we use them
  • Any examples / other relevant documentation and/or code

Update: The SparkLauncher referred to in the accepted answer does launch a simple app under trivial (master=local[*]) conditions. It remains to be seen how usable it will be on an actual cluster. After adding a print statement to the linked code:

println("launched.. and waiting..")
spark.waitFor()

We do see:

launched.. and waiting..

Well, this is probably a small step forward. I will update this question as I move towards a real clustered environment.

Accepted Answer

Looking at the details of the pull request, it seems that the functionality is provided by the SparkLauncher class, described in the API docs here.

public class SparkLauncher extends Object

Launcher for Spark applications.

Use this class to start Spark applications programmatically. The class uses a builder pattern to allow clients to configure the Spark application and launch it as a child process.

The API docs are rather minimal, but I found a blog post that gives a worked example (code also available in a GitHub repo). I have copied a simplified version of the example below (untested) in case the links go stale:

import org.apache.spark.launcher.SparkLauncher

object Launcher extends App {
  // Configure the application via the builder pattern, then start it
  // as a child process.
  val spark = new SparkLauncher()
    .setSparkHome("/home/user/spark-1.4.0-bin-hadoop2.6") // local Spark installation
    .setAppResource("/home/user/example-assembly-1.0.jar") // application jar to submit
    .setMainClass("MySparkApp")  // entry point inside the jar
    .setMaster("local[*]")       // run locally using all cores
    .launch()

  // launch() returns a java.lang.Process; block until the app exits.
  spark.waitFor()
}
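For an actual cluster the same builder calls apply; what changes is the master, where the jar must live, and the resource settings. Here is a hedged, untested sketch for YARN; the jar location, master string, and memory/core values are placeholder assumptions of mine, not from the blog post:

import org.apache.spark.launcher.SparkLauncher

object ClusterLauncher extends App {
  val spark = new SparkLauncher()
    .setSparkHome("/home/user/spark-1.4.0-bin-hadoop2.6")    // placeholder
    .setAppResource("hdfs:///apps/example-assembly-1.0.jar") // must be reachable from the cluster
    .setMainClass("MySparkApp")
    .setMaster("yarn-cluster") // Spark 1.x style; later versions use setMaster("yarn") plus setDeployMode("cluster")
    .setConf(SparkLauncher.EXECUTOR_MEMORY, "2g")
    .setConf(SparkLauncher.EXECUTOR_CORES, "2")
    .launch()
  spark.waitFor()
}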

See also:

  • Another tutorial blog post / review of the feature
  • A book chapter on the topic

