Reading MySQL with PySpark
Question
I have the following test code:
from pyspark import SparkContext, SQLContext
sc = SparkContext('local')
sqlContext = SQLContext(sc)
print('Created spark context!')
if __name__ == '__main__':
    df = sqlContext.read.format("jdbc").options(
        url="jdbc:mysql://localhost/mysql",
        driver="com.mysql.jdbc.Driver",
        dbtable="users",
        user="user",
        password="****",
        properties={"driver": 'com.mysql.jdbc.Driver'}
    ).load()
    print(df)
When I run it, I get the following error:
java.lang.ClassNotFoundException: com.mysql.jdbc.Driver
In Scala, this is solved by importing the mysql-connector-java .jar into the project.
However, in Python I have no idea how to tell the pyspark module to link the mysql-connector file.
I have seen this solved with examples like
spark-submit --packages mysql:mysql-connector-java:5.1.39 testfile.py
But I don't want this, since it forces me to run my script in an unusual way. I would like an all-Python solution, or to copy a file somewhere, or to add something to the path.
Recommended answer
You can pass arguments to spark-submit through the PYSPARK_SUBMIT_ARGS environment variable when creating your SparkContext, as long as you set it before SparkConf is initialized (that is, before the JVM is launched):
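import os
from pyspark import SparkConf, SparkContext
# Ask spark-submit to download the MySQL JDBC driver from Maven Central.
# The trailing "pyspark-shell" token is required, and the variable must be
# set before SparkConf/SparkContext starts the JVM.
SUBMIT_ARGS = "--packages mysql:mysql-connector-java:5.1.39 pyspark-shell"
os.environ["PYSPARK_SUBMIT_ARGS"] = SUBMIT_ARGS
conf = SparkConf()
sc = SparkContext(conf=conf)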
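If you would rather keep the package coordinates in a Python list than in a hard-coded string, a small helper can assemble the value. This is a minimal sketch, not part of PySpark: `build_submit_args` and `set_pyspark_packages` are hypothetical names, and the only behavior it relies on is the one used above (Maven coordinates in `group:artifact:version` form, comma-separated after `--packages`, followed by the `pyspark-shell` token).

```python
import os


def build_submit_args(packages):
    """Build a PYSPARK_SUBMIT_ARGS value that pulls Maven packages.

    Hypothetical helper. `packages` is a list of Maven coordinates in
    group:artifact:version form; spark-submit takes them as a single
    comma-separated --packages value, and the string must end with the
    "pyspark-shell" token.
    """
    if not packages:
        raise ValueError("at least one Maven coordinate is required")
    for coord in packages:
        # Each coordinate must have exactly three colon-separated parts.
        if len(coord.split(":")) != 3:
            raise ValueError(f"not a group:artifact:version coordinate: {coord!r}")
    return "--packages " + ",".join(packages) + " pyspark-shell"


def set_pyspark_packages(packages):
    """Export the arguments; call this before any SparkConf/SparkContext exists."""
    args = build_submit_args(packages)
    os.environ["PYSPARK_SUBMIT_ARGS"] = args
    return args


if __name__ == "__main__":
    print(set_pyspark_packages(["mysql:mysql-connector-java:5.1.39"]))
    # prints: --packages mysql:mysql-connector-java:5.1.39 pyspark-shell
```

Call it at the top of the script, before any SparkConf or SparkContext is created: the JVM gateway reads the variable once when it starts, so changing it afterwards has no effect.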
Alternatively, you can set the equivalent property (spark.jars.packages) in your $SPARK_HOME/conf/spark-defaults.conf.
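For the spark-defaults.conf route, the property that corresponds to `--packages` is `spark.jars.packages`. The sketch below edits that file from Python; `set_default` and `add_spark_default` are hypothetical helpers, and they assume the whitespace-separated `key value` form used in Spark's own template (they do not recognize `key=value` lines).

```python
from pathlib import Path


def set_default(text, key, value):
    """Return spark-defaults.conf text with `key` set to `value`.

    Hypothetical helper. Keeps comments, blank lines and other properties;
    any existing whitespace-separated line for the same key is replaced by
    a single `key value` line appended at the end.
    """
    kept = [
        line for line in text.splitlines()
        # line.split()[:1] is [] for blank lines and ["#..."] for comments,
        # so only a real setting of `key` is dropped.
        if line.split()[:1] != [key]
    ]
    kept.append(f"{key} {value}")
    return "\n".join(kept) + "\n"


def add_spark_default(conf_path, key, value):
    """Apply set_default() to a file on disk, creating the file if needed."""
    path = Path(conf_path)
    old = path.read_text() if path.exists() else ""
    path.write_text(set_default(old, key, value))
```

For example, `add_spark_default(Path(os.environ["SPARK_HOME"]) / "conf" / "spark-defaults.conf", "spark.jars.packages", "mysql:mysql-connector-java:5.1.39")` would register the driver for every job launched from that Spark installation. Editing a shared config file affects all users of it, which is the trade-off against the per-script environment variable.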