Spark to MongoDB via Mesos

Problem Description

I am trying to connect Apache Spark to MongoDB using Mesos. Here is my architecture:

MongoDB: a MongoDB cluster of 2 shards, 1 config server, and 1 query server. Mesos: 1 Mesos master, 4 Mesos slaves.

Now I have installed Spark on just 1 node. There is not much information available on this out there. I just wanted to pose a few questions:

As per what I understand, I can connect Spark to MongoDB via Mesos. In other words, I end up using MongoDB as the storage layer. Do I really need Hadoop? Is it mandatory to pull all the data into Hadoop just for Spark to read it?

Here is the reason I am asking this. The Spark install expects the HADOOP_HOME variable to be set. This seems like very tight coupling! Most of the posts on the net talk about the MongoDB-Hadoop connector. It doesn't make sense if you're forcing me to move everything into Hadoop.

Does anyone have an answer?

Regards, Mario

Recommended Answer

The Spark-Mongo connector is a good idea; moreover, if you are executing Spark in a Hadoop cluster, you need to set HADOOP_HOME.

Check your requirements and test it (tutorial):

Basic working knowledge of MongoDB and Apache Spark. Refer to the MongoDB documentation and Spark documentation.
Running MongoDB instance (version 2.6 or later).
Spark 1.6.x.
Scala 2.10.x if using the mongo-spark-connector_2.10 package.
Scala 2.11.x if using the mongo-spark-connector_2.11 package.
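
To make that version pairing concrete, here is a minimal sketch of the dependencies in sbt form. The specific version numbers (Scala 2.10.6, Spark 1.6.3, connector 1.1.0) are illustrative picks of mine for the Spark 1.6.x / Scala 2.10 row above, not something the original answer specifies:

```scala
// build.sbt -- a minimal sketch; version numbers are illustrative
// choices for the Spark 1.6.x / Scala 2.10 combination listed above.
scalaVersion := "2.10.6"

libraryDependencies ++= Seq(
  // Spark is "provided" because the cluster supplies it at runtime.
  "org.apache.spark"  %% "spark-core"            % "1.6.3" % "provided",
  "org.apache.spark"  %% "spark-sql"             % "1.6.3" % "provided",
  // Resolves to mongo-spark-connector_2.10, matching scalaVersion.
  "org.mongodb.spark" %% "mongo-spark-connector" % "1.1.0"
)
```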

The new MongoDB Connector for Apache Spark provides higher performance, greater ease of use, and access to more advanced Spark functionality than the MongoDB Connector for Hadoop.
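
As a rough sketch of what using that connector looks like, and of why Hadoop is not needed as a storage layer: the connector reads collections straight from MongoDB into Spark. The URIs, host name, and collection names below are placeholders, not values from the original post:

```scala
import com.mongodb.spark.MongoSpark
import org.apache.spark.{SparkConf, SparkContext}

// A minimal read/write round trip. The "queryserver" host and the
// test.people collections are placeholders for a real deployment.
val conf = new SparkConf()
  .setAppName("MongoSparkExample")
  .set("spark.mongodb.input.uri",  "mongodb://queryserver:27017/test.people")
  .set("spark.mongodb.output.uri", "mongodb://queryserver:27017/test.peopleCopy")
val sc = new SparkContext(conf)

// Load the collection as an RDD of BSON Documents, directly from
// MongoDB -- no intermediate copy into HDFS is involved.
val rdd = MongoSpark.load(sc)
println(s"count = ${rdd.count()}")

// Write the same documents back out to the configured output collection.
MongoSpark.save(rdd)
```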

Then you need to configure Spark with Mesos:

Connecting Spark to Mesos

To use Mesos from Spark, you need a Spark binary package available in a place accessible by Mesos, and a Spark driver program configured to connect to Mesos.

Alternatively, you can also install Spark in the same location in all the Mesos slaves, and configure spark.mesos.executor.home (defaults to SPARK_HOME) to point to that location.
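
Putting those two options together, a driver-side configuration might look like the sketch below. The master host, port, and install path are assumptions for illustration; `spark.mesos.executor.home` and `spark.executor.uri` are the standard Spark-on-Mesos settings the quoted passage refers to:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Driver configuration for a Mesos-managed cluster. The "mesos-master"
// host and /opt/spark path below are assumed values for this sketch.
val conf = new SparkConf()
  .setAppName("SparkOnMesos")
  // Point the driver at the Mesos master (5050 is Mesos' default port).
  .setMaster("mesos://mesos-master:5050")
  // Option 1: a Spark binary package the slaves can fetch themselves:
  // .set("spark.executor.uri", "http://host/path/spark-1.6.3-bin.tgz")
  // Option 2: Spark installed at the same path on every Mesos slave:
  .set("spark.mesos.executor.home", "/opt/spark")
val sc = new SparkContext(conf)
```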
