Apache Spark: Driver (instead of just the Executors) tries to connect to Cassandra

Problem Description

I guess I'm not yet fully understanding how Spark works.

Here is my setup:

I'm running a Spark cluster in Standalone mode. I'm using 4 machines for this: One is the Master, the other three are Workers.

I have written an application that reads data from a Cassandra cluster (see https://github.com/journeymonitor/analyze/blob/master/spark/src/main/scala/SparkApp.scala#L118).

The 3-node Cassandra cluster runs on the same machines that also host the Spark Worker nodes. The Spark Master node does not run a Cassandra node:

Machine 1      Machine 2        Machine 3        Machine 4
Spark Master   Spark Worker     Spark Worker     Spark Worker
               Cassandra node   Cassandra node   Cassandra node

The reasoning behind this is that I want to optimize data locality - when running my Spark app on the cluster, each Worker only needs to talk to its local Cassandra node.

Now, when submitting my Spark app to the cluster by running spark-submit --deploy-mode client --master spark://machine-1 from Machine 1 (the Spark Master), I expect the following:


  • a Driver instance is started on the Spark Master
  • the Driver starts one Executor on each Spark Worker
  • the Driver distributes my application to each Executor
  • my application runs on each Executor, and from there, talks to Cassandra via 127.0.0.1:9042 (see the sketch below)
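For context, the Cassandra-reading part of my app is set up roughly like this (a simplified sketch, not the exact code from SparkApp.scala; the keyspace and table names are placeholders). Note that I left the connection host at the loopback address, because I expected only the co-located Executors to open connections:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import com.datastax.spark.connector._ // adds cassandraTable() to SparkContext

object SparkApp {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("analyze")
      // My assumption: every Executor talks to its local Cassandra node,
      // so the loopback address should be enough.
      .set("spark.cassandra.connection.host", "127.0.0.1")

    val sc = new SparkContext(conf)

    // "my_keyspace" and "my_table" are placeholder names.
    val rows = sc.cassandraTable("my_keyspace", "my_table")
    println(rows.count())

    sc.stop()
  }
}
```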

However, this doesn't seem to be the case. Instead, the Spark Master tries to talk to Cassandra (and fails, because there is no Cassandra node on the Machine 1 host).

What is it that I misunderstand? Does it work differently? Does the Driver in fact read the data from Cassandra and distribute it to the Executors? But then I could never read data larger than the memory of Machine 1, even if the total memory of my cluster is sufficient.

Or, does the Driver talk to Cassandra not to read data, but to find out how to partition the data, and instruct the Executors to read "their" part of the data?

If someone could enlighten me, that would be much appreciated.

Recommended Answer

The Driver program is responsible for creating the SparkContext and SQLContext and for scheduling tasks on the worker nodes. This includes creating logical and physical plans and applying optimizations. To be able to do that, it has to have access to the data source schema and possibly other information such as statistics. Implementation details vary from source to source, but generally speaking it means that the data should be accessible on all nodes, including the application master.
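For illustration, here is a sketch (my own, assuming the DataStax Spark Cassandra Connector that the question's app uses; keyspace and table names are placeholders) of why the Driver needs Cassandra connectivity: even computing the RDD's partitions, a purely Driver-side step, already makes the connector contact Cassandra:

```scala
// Runs on the Driver; reuses the sc from the sketch above.
val rdd = sc.cassandraTable("my_keyspace", "my_table")

// Building the partitions happens on the Driver: the connector queries
// Cassandra for the token ranges and splits them into Spark partitions.
// This is the step that fails if the Driver host cannot reach Cassandra.
println(s"number of partitions: ${rdd.partitions.length}")

// Each partition carries the replica hosts that own its token ranges;
// Spark uses these preferred locations to schedule read tasks node-locally.
rdd.partitions.take(3).foreach { p =>
  println(s"partition ${p.index} prefers: ${rdd.preferredLocations(p).mkString(", ")}")
}
```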

At the end of the day, your expectations are almost correct. Chunks of the data are fetched individually on each worker without going through the driver program, but the driver has to be able to connect to Cassandra to fetch the required metadata.
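In practice that means the connector's contact point must be reachable from the Driver as well. A minimal sketch of the change (my assumption of the usual fix, not spelled out in the answer; "machine-2" stands for any host that runs a Cassandra node):

```scala
val conf = new SparkConf()
  .setAppName("analyze")
  // Use a host the Driver can actually reach instead of 127.0.0.1.
  // The connector only needs this as an initial contact point; it
  // discovers the remaining Cassandra nodes from there.
  .set("spark.cassandra.connection.host", "machine-2")
```

The reads themselves are still scheduled for locality: each Executor fetches the token ranges whose replicas live on its own machine, so the data-locality goal of the cluster layout is preserved.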
