How to submit Apache Spark job to Hadoop YARN on Azure HDInsight


Question

I am very excited that HDInsight switched to Hadoop version 2, which supports Apache Spark through YARN. Apache Spark is a much better-fitting parallel programming paradigm than MapReduce for the task I want to perform.

I was unable to find any documentation, however, on how to do remote job submission of an Apache Spark job to my HDInsight cluster. For remote submission of standard MapReduce jobs I know there are several REST endpoints like Templeton and Oozie. But as far as I was able to find, running Spark jobs through Templeton is not possible. I did find it possible to incorporate Spark jobs into Oozie, but I've read that this is a very tedious thing to do, and I've also read reports of job failure detection not working in that case.

Surely there must be a more appropriate way to submit Spark jobs. Does anyone know how to do remote job submission of Apache Spark jobs to HDInsight?

Thanks in advance!

Answer

You can install Spark on an HDInsight cluster. You have to do it by creating a custom cluster and adding a script action that installs Spark on the cluster at the time the VMs for the cluster are created.

Installing with a script action at cluster-create time is pretty easy; you can do it in C# or PowerShell by adding a few lines of code to a standard custom cluster-creation script/program.

PowerShell:

# ADD SCRIPT ACTION TO CLUSTER CONFIGURATION
$config = Add-AzureHDInsightScriptAction -Config $config -Name "Install Spark" -ClusterRoleCollection HeadNode -Uri https://hdiconfigactions.blob.core.windows.net/sparkconfigactionv02/spark-installer-v02.ps1
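
For context, here is a minimal sketch of how that line might sit inside a full custom-create flow using the classic Azure PowerShell cmdlets of that era; the storage account, container, and cluster names are hypothetical placeholders you would replace with your own:

# MINIMAL SKETCH OF THE SURROUNDING CUSTOM-CREATE FLOW (names below are hypothetical placeholders)
$config = New-AzureHDInsightClusterConfig -ClusterSizeInNodes 4
$config = Set-AzureHDInsightDefaultStorage -Config $config -StorageAccountName "mystorage.blob.core.windows.net" -StorageAccountKey $storageAccountKey -StorageContainerName "mycontainer"

# ADD SCRIPT ACTION TO CLUSTER CONFIGURATION (the line shown above)
$config = Add-AzureHDInsightScriptAction -Config $config -Name "Install Spark" -ClusterRoleCollection HeadNode -Uri https://hdiconfigactions.blob.core.windows.net/sparkconfigactionv02/spark-installer-v02.ps1

# CREATE THE CLUSTER WITH SPARK INSTALLED
New-AzureHDInsightCluster -Config $config -Name "mysparkcluster" -Location "North Europe" -Credential (Get-Credential)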

C#:

// ADD THE SCRIPT ACTION TO INSTALL SPARK
clusterInfo.ConfigActions.Add(new ScriptAction(
  "Install Spark", // Name of the config action
  new ClusterNodeType[] { ClusterNodeType.HeadNode }, // List of nodes to install Spark on
  new Uri("https://hdiconfigactions.blob.core.windows.net/sparkconfigactionv02/spark-installer-v02.ps1"), // Location of the script to install Spark
  null //because the script used does not require any parameters.
));

You can then RDP into the head node and use the spark-shell, or use spark-submit to run jobs. I am not sure how you would run a Spark job without RDPing into the head node, but that is another question.
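
As a rough illustration, once you are on the head node you could run something like the following from a command prompt; the example class and jar name are hypothetical (the SparkPi example that ships with Spark) and the exact paths depend on where the installer script puts Spark:

# Minimal sketch (class, jar, and paths are hypothetical; adjust to your Spark install)
spark-submit --class org.apache.spark.examples.SparkPi --master yarn-cluster spark-examples.jar 100

# Or start an interactive shell against YARN
spark-shell --master yarn-client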

