MySQL read with PySpark
Question
I have the following test code:
from pyspark import SparkContext, SQLContext

sc = SparkContext('local')
sqlContext = SQLContext(sc)
print('Created spark context!')

if __name__ == '__main__':
    df = sqlContext.read.format("jdbc").options(
        url="jdbc:mysql://localhost/mysql",
        driver="com.mysql.jdbc.Driver",
        dbtable="users",
        user="user",
        password="****",
        properties={"driver": 'com.mysql.jdbc.Driver'}
    ).load()

    print(df)
When I run it, I get the following error:
java.lang.ClassNotFoundException: com.mysql.jdbc.Driver
In Scala, this is solved by importing the mysql-connector-java .jar into the project.
However, in Python I have no idea how to tell the pyspark module to link the mysql-connector file.
I have seen this solved with examples like
spark --package=mysql-connector-java testfile.py
But I don't want that, since it forces me to run my script in a weird way. I would like an all-Python solution: either copy a file somewhere or add something to the path.
Answer
You can pass arguments to spark-submit when creating your sparkContext, before SparkConf is initialized:
import os
from pyspark import SparkConf, SparkContext

# Inject the --packages flag before the JVM starts, so spark-submit
# downloads the MySQL JDBC driver and puts it on the classpath.
SUBMIT_ARGS = "--packages mysql:mysql-connector-java:5.1.39 pyspark-shell"
os.environ["PYSPARK_SUBMIT_ARGS"] = SUBMIT_ARGS

conf = SparkConf()
sc = SparkContext(conf=conf)
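With the driver package on the classpath, the JDBC read from the question should then work with this context. A minimal sketch, reusing the placeholder URL, table, and credentials from the question:

from pyspark import SQLContext

sqlContext = SQLContext(sc)
df = sqlContext.read.format("jdbc").options(
    url="jdbc:mysql://localhost/mysql",
    driver="com.mysql.jdbc.Driver",
    dbtable="users",
    user="user",
    password="****"
).load()
df.show()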
Or you can add them to your $SPARK_HOME/conf/spark-defaults.conf.
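For the spark-defaults.conf route, the corresponding setting is the spark.jars.packages property; a one-line sketch, assuming the same connector version as above:

spark.jars.packages  mysql:mysql-connector-java:5.1.39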