Spark to MongoDB via Mesos
Problem Description
I am trying to connect Apache Spark to MongoDB using Mesos. Here is my architecture:
MongoDB: a cluster of 2 shards, 1 config server, and 1 query server. Mesos: 1 Mesos master, 4 Mesos slaves.
Now I have installed Spark on just 1 node. There is not much information available about this setup out there. I just wanted to pose a few questions:
As I understand it, I can connect Spark to MongoDB via Mesos. In other words, I would end up using MongoDB as the storage layer. Do I really need Hadoop? Is it mandatory to pull all the data into Hadoop just for Spark to read it?
Here is the reason I am asking. The Spark install expects the HADOOP_HOME variable to be set. This seems like very tight coupling! Most of the posts on the net talk about the MongoDB-Hadoop connector. It doesn't make sense if I'm forced to move everything to Hadoop.
Does anyone have an answer?
Regards, Mario
Recommended Answer
The Spark-Mongo connector is a good idea; moreover, if you are running Spark in a Hadoop cluster, you need to set HADOOP_HOME.
Check the requirements and test it (tutorial):
Basic working knowledge of MongoDB and Apache Spark. Refer to the MongoDB documentation and Spark documentation.
A running MongoDB instance (version 2.6 or later).
Spark 1.6.x.
Scala 2.10.x if using the mongo-spark-connector_2.10 package
Scala 2.11.x if using the mongo-spark-connector_2.11 package
The new MongoDB Connector for Apache Spark provides higher performance, greater ease of use, and access to more advanced Spark functionality than the MongoDB Connector for Hadoop. The following table compares the capabilities of both connectors.
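As a minimal sketch of using that connector, the following Scala program reads a collection through the cluster's query server (mongos). It assumes mongo-spark-connector_2.10 1.x with Spark 1.6, as listed in the requirements above; the host name, database, and collection names are placeholders for your own cluster, and the code needs a running MongoDB and Spark deployment to actually execute.

```scala
// Sketch, assuming mongo-spark-connector_2.10 1.x with Spark 1.6.
// The URIs below are placeholders -- point them at your mongos query
// router, not at the individual shards.
import org.apache.spark.{SparkConf, SparkContext}
import com.mongodb.spark.MongoSpark

object MongoReadSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("mongo-read")
      // Input: collection Spark reads from, via the query server.
      .set("spark.mongodb.input.uri",
           "mongodb://mongos.example.com:27017/mydb.mycollection")
      // Output: collection Spark would write results to.
      .set("spark.mongodb.output.uri",
           "mongodb://mongos.example.com:27017/mydb.results")

    val sc = new SparkContext(conf)

    // Loads the input collection as an RDD of BSON Documents,
    // with no Hadoop or HDFS involved.
    val rdd = MongoSpark.load(sc)
    println(s"document count: ${rdd.count()}")

    sc.stop()
  }
}
```

Note that Spark reads straight from MongoDB here: Hadoop is not part of the data path, which answers the "do I really need Hadoop" question for this storage layout.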
Then you need to configure Spark with Mesos:
To use Mesos from Spark, you need a Spark binary package available in a place accessible by Mesos, and a Spark driver program configured to connect to Mesos.
Alternatively, you can also install Spark in the same location in all the Mesos slaves, and configure spark.mesos.executor.home (defaults to SPARK_HOME) to point to that location.
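The two options above can be sketched as a spark-submit invocation; this is a configuration sketch, and the Mesos master host, download URL, class, and jar names are placeholders for your own environment.

```shell
# Option 1: slaves download Spark themselves -- publish the binary
# package somewhere every Mesos slave can reach (HDFS or HTTP) and
# point spark.executor.uri at it.
./bin/spark-submit \
  --master mesos://mesos-master.example.com:5050 \
  --conf spark.executor.uri=http://files.example.com/spark-1.6.3-bin-hadoop2.6.tgz \
  --class com.example.MyApp \
  my-app.jar

# Option 2: Spark is pre-installed at the same path on every slave --
# point spark.mesos.executor.home at that path instead.
./bin/spark-submit \
  --master mesos://mesos-master.example.com:5050 \
  --conf spark.mesos.executor.home=/opt/spark \
  --class com.example.MyApp \
  my-app.jar
```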