Writing to MongoDB from Spark
Question
I am trying to write to MongoDB from Spark. For trial purposes, I am launching the Spark 2 shell (Spark version = 2.1.1.2.6.1.0-129) as follows:
spark-shell --jars /bigdata/datalake/mongo-spark-connector_2.11-2.1.1.jar,/bigdata/datalake/mongo-scala-driver_2.11-2.1.0.jar,/bigdata/datalake/mongo-java-driver-3.2.2.jar
And running the following code in it:
import com.mongodb.spark._
import org.apache.spark.sql.{SaveMode, SparkSession}
spark.conf.set("spark.mongodb.output.uri","mongodb://<IP>:27017/menas.tests")
spark.conf.set("spark.mongodb.output.collection", "tests")
val df = spark.sparkContext.parallelize( 1 to 10).toDF().withColumn("value",col("value").cast("string"))
MongoSpark.save(df.write.option("uri", "mongodb://<IP>:27017/menas.tests").mode("append"))
But it results in the following error. Basically, I want to save the contents of the dataframe to MongoDB.
Answer
spark-shell --jars /bigdata/datalake/mongo-spark-connector_2.11-2.1.1.jar,/bigdata/datalake/mongo-scala-driver_2.11-2.1.0.jar,/bigdata/datalake/mongo-java-driver-3.2.2.jar
Based on the error log, and the way spark-shell is invoked, this is because you're trying to import and use MongoDB Java driver v3.2.2, while the Spark connector v2.1.1 depends on MongoDB Java driver v3.4.2. See also mongo-spark v2.1.1 Dependencies.scala.
Instead of specifying the jars manually, you could use --packages to specify the MongoDB Spark Connector. This way the dependencies will be fetched automatically. For example, to use MongoDB Spark connector version 2.1.1:
./bin/spark-shell --packages org.mongodb.spark:mongo-spark-connector_2.11:2.1.1
This will automatically fetch a MongoDB Java driver compatible with the connector.
You should see output similar to the following:
:: loading settings :: url = jar:file:/home/ubuntu/spark-2.1.2-bin-hadoop2.6/jars/ivy-2.4.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
org.mongodb.spark#mongo-spark-connector_2.11 added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent;1.0
confs: [default]
found org.mongodb.spark#mongo-spark-connector_2.11;2.1.1 in central
found org.mongodb#mongo-java-driver;3.4.2 in central
downloading https://repo1.maven.org/maven2/org/mongodb/spark/mongo-spark-connector_2.11/2.1.1/mongo-spark-connector_2.11-2.1.1.jar ...
[SUCCESSFUL ] org.mongodb.spark#mongo-spark-connector_2.11;2.1.1!mongo-spark-connector_2.11.jar (1291ms)
downloading https://repo1.maven.org/maven2/org/mongodb/mongo-java-driver/3.4.2/mongo-java-driver-3.4.2.jar ...
[SUCCESSFUL ] org.mongodb#mongo-java-driver;3.4.2!mongo-java-driver.jar (612ms)
:: resolution report :: resolve 4336ms :: artifacts dl 1919ms
:: modules in use:
org.mongodb#mongo-java-driver;3.4.2 from central in [default]
org.mongodb.spark#mongo-spark-connector_2.11;2.1.1 from central in [default]
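Once the shell starts with the connector resolved this way, the write from the question should work unchanged. A minimal sketch of the same session, assuming a MongoDB instance is reachable at the placeholder address and reusing the menas.tests namespace from the question (this needs a running Spark shell and MongoDB server, so it is not independently runnable):

// Inside spark-shell: the `spark` session, spark.implicits._ and
// org.apache.spark.sql.functions._ are pre-imported by the shell.
import com.mongodb.spark._

// Output namespace for the connector (database.collection)
spark.conf.set("spark.mongodb.output.uri", "mongodb://<IP>:27017/menas.tests")

val df = spark.sparkContext.parallelize(1 to 10).toDF()
  .withColumn("value", col("value").cast("string"))

// MongoSpark.save takes the DataFrameWriter; the "uri" option, if set,
// overrides the spark.mongodb.output.uri configuration above.
MongoSpark.save(
  df.write
    .option("uri", "mongodb://<IP>:27017/menas.tests")
    .mode("append"))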
For more information, see also the MongoDB Spark Connector Scala Guide.