What is a task in Spark? How does the Spark worker execute the jar file?

Question

After reading some document on http://spark.apache.org/docs/0.8.0/cluster-overview.html, I got some question that I want to clarify.

Take this example from Spark:

JavaSparkContext spark = new JavaSparkContext(
  new SparkConf().setJars(new String[] {"..."}).setSparkHome....);
JavaRDD<String> file = spark.textFile("hdfs://...");

// step1: split each line into words
JavaRDD<String> words =
  file.flatMap(new FlatMapFunction<String, String>() {
    public Iterable<String> call(String s) {
      return Arrays.asList(s.split(" "));
    }
  });

// step2: map each word to a (word, 1) pair
JavaPairRDD<String, Integer> pairs =
  words.map(new PairFunction<String, String, Integer>() {
    public Tuple2<String, Integer> call(String s) {
      return new Tuple2<String, Integer>(s, 1);
    }
  });

// step3: sum the counts for each word
JavaPairRDD<String, Integer> counts =
  pairs.reduceByKey(new Function2<Integer, Integer, Integer>() {
    public Integer call(Integer a, Integer b) {
      return a + b;
    }
  });

counts.saveAsTextFile("hdfs://...");

So let's say I have a 3-node cluster, with node 1 running as the master, and the above driver program has been properly packaged into a jar (say application-test.jar). So now I'm running this code on the master node, and I believe that right after the SparkContext is created, the application-test.jar file will be copied to the worker nodes (and each worker will create a directory for that application).

So now my question: Are step1, step2 and step3 in the example tasks that get sent over to the workers? If yes, then how does the worker execute that? Like java -cp "application-test.jar" step1 and so on?

Answer

When you create the SparkContext, each worker starts an executor. This is a separate process (JVM), and it loads your jar too. The executors connect back to your driver program. Now the driver can send them commands, like flatMap, map and reduceByKey in your example. When the driver quits, the executors shut down.
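
As a rough sketch of what that setup looks like on the driver side (the master URL and jar path below are placeholders, not taken from the question), the application jar is typically registered on the SparkConf so that each executor can fetch and load it:

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class DriverSetup {
  public static void main(String[] args) {
    // Placeholder master URL and jar path; substitute your own values.
    SparkConf conf = new SparkConf()
        .setAppName("application-test")
        .setMaster("spark://node1:7077")
        .setJars(new String[] { "/path/to/application-test.jar" });

    // Constructing the context is what triggers executor startup on the
    // workers; each executor JVM fetches and loads the jar listed above.
    JavaSparkContext spark = new JavaSparkContext(conf);

    // ... build and run RDD operations here ...

    spark.stop(); // executors shut down when the driver stops
  }
}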

RDDs are sort of like big arrays that are split into partitions, and each executor can hold some of these partitions.
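
For illustration (this reuses the spark context from the question, and the partition count is arbitrary), you can ask textFile for a minimum number of partitions and then check how many partitions the resulting RDD has; the exact method name can differ between Spark versions:

// Hypothetical example: request at least 4 partitions when reading the file.
JavaRDD<String> lines = spark.textFile("hdfs://...", 4);

// Each partition is processed by a task running in some executor.
System.out.println("partitions: " + lines.partitions().size());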

A task is a command sent from the driver to an executor by serializing your Function object. The executor deserializes the command (this is possible because it has loaded your jar), and executes it on a partition.
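
To make the serialization point concrete, here is a sketch (not part of the original answer, and written against the newer Java API where the function types are interfaces) of the step-2 function as a named class instead of an anonymous inner class; an instance of it is what gets serialized and shipped to the executors, so everything it references must itself be serializable:

import org.apache.spark.api.java.function.PairFunction;
import scala.Tuple2;

// Equivalent to the anonymous PairFunction in step 2 of the question.
// The driver serializes an instance of this class into the task, and the
// executor deserializes it and calls call() once per element of a partition.
class WordToPair implements PairFunction<String, String, Integer> {
  public Tuple2<String, Integer> call(String word) {
    return new Tuple2<String, Integer>(word, 1);
  }
}

Depending on the Spark version, an instance is passed to words.map(...) (older Java API, as in the question) or words.mapToPair(...) (newer Java API).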

(This is a conceptual overview. I am glossing over some details, but I hope it is helpful.)

To answer your specific question: No, a new process is not started for each step. A new process is started on each worker when the SparkContext is constructed.
