Failed to find data source: com.mongodb.spark.sql.DefaultSource


Question

I'm trying to connect Spark (PySpark) to MongoDB as follows:

from pyspark import SparkConf, SparkContext
from pyspark.sql import SQLContext, SparkSession

conf = SparkConf()
conf.set('spark.mongodb.input.uri', default_mongo_uri)
conf.set('spark.mongodb.output.uri', default_mongo_uri)
sc = SparkContext(conf=conf)
sqlContext = SQLContext(sc)
spark = SparkSession \
    .builder \
    .appName("my-app") \
    .config("spark.mongodb.input.uri", default_mongo_uri) \
    .config("spark.mongodb.output.uri", default_mongo_uri) \
    .getOrCreate()

But when I do the following:

users = spark.read.format("com.mongodb.spark.sql.DefaultSource") \
        .option("uri", '{uri}.{col}'.format(uri=mongo_uri, col='users')).load()

I get this error:

java.lang.ClassNotFoundException: Failed to find data source: com.mongodb.spark.sql.DefaultSource

I did the same thing from the pyspark shell and was able to retrieve data. This is the command I ran:

pyspark --conf "spark.mongodb.input.uri=mongodb_uri" --conf "spark.mongodb.output.uri=mongodburi" --packages org.mongodb.spark:mongo-spark-connector_2.11:2.2.2

But there we have the option to specify the package we need to use. What about standalone apps and scripts? How can I configure mongo-spark-connector there?

Any ideas?

Answer

If you are using SparkContext & SparkSession, mention the connector jar packages in SparkConf and check the following code:

    from pyspark import SparkContext, SparkConf

    # Pull in the MongoDB Spark connector before the JVM starts
    conf = SparkConf().set("spark.jars.packages", "org.mongodb.spark:mongo-spark-connector_2.11:2.3.2")
    sc = SparkContext(conf=conf)

    from pyspark.sql import SparkSession
    spark = SparkSession.builder.appName("myApp") \
        .config("spark.mongodb.input.uri", "mongodb://xxx.xxx.xxx.xxx:27017/sample1.zips") \
        .config("spark.mongodb.output.uri", "mongodb://xxx.xxx.xxx.xxx:27017/sample1.zips") \
        .getOrCreate()

    # Read from MongoDB and inspect the inferred schema
    df = spark.read.format("com.mongodb.spark.sql.DefaultSource").load()
    df.printSchema()
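
Note that spark.jars.packages has to be set before the SparkContext (and its JVM) is created, which is why it goes into SparkConf here: Spark resolves the package coordinates at context launch, so setting it on an already-running context has no effect.

For completeness, writing a DataFrame back out goes through the same data source. A minimal sketch, assuming the spark.mongodb.output.uri configured above points at the target database and collection:

    # Sketch: write back through the connector; "append" is a standard
    # DataFrameWriter mode. The destination comes from spark.mongodb.output.uri
    # set on the session above (an assumption of this example).
    df.write.format("com.mongodb.spark.sql.DefaultSource") \
        .mode("append") \
        .save()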

If you are using only SparkSession, use the following code:

    from pyspark.sql import SparkSession

    # spark.jars.packages works here too, as long as no SparkContext exists yet
    spark = SparkSession.builder.appName("myApp") \
        .config("spark.mongodb.input.uri", "mongodb://xxx.xxx.xxx.xxx:27017/sample1.zips") \
        .config("spark.mongodb.output.uri", "mongodb://xxx.xxx.xxx.xxx:27017/sample1.zips") \
        .config("spark.jars.packages", "org.mongodb.spark:mongo-spark-connector_2.11:2.3.2") \
        .getOrCreate()

    df = spark.read.format("com.mongodb.spark.sql.DefaultSource").load()
    df.printSchema()
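
Alternatively, for standalone scripts you can keep the package out of the code entirely and pass the same coordinates at submit time, mirroring the --packages flag used with the pyspark shell above. A sketch, where my_script.py is a hypothetical script containing the SparkSession code:

    # my_script.py is a placeholder for your own application script
    spark-submit --packages org.mongodb.spark:mongo-spark-connector_2.11:2.3.2 my_script.py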
