PySpark connection to Microsoft SQL Server?
Question
I have a huge dataset in SQL Server. I want to connect to SQL Server from Python and then run queries with PySpark.
I've looked at the JDBC driver but haven't found a way to make it work; I managed it with pyodbc, but not with Spark.
Any help would be greatly appreciated.
Answer
Use the following to connect to Microsoft SQL Server:
def connect_to_sql(
    spark, jdbc_hostname, jdbc_port, database, data_table, username, password
):
    # SQL Server JDBC URLs take the database as ';databaseName=...',
    # not the MySQL-style '/<database>' suffix.
    jdbc_url = "jdbc:sqlserver://{0}:{1};databaseName={2}".format(
        jdbc_hostname, jdbc_port, database
    )
    connection_details = {
        "user": username,
        "password": password,
        "driver": "com.microsoft.sqlserver.jdbc.SQLServerDriver",
    }
    df = spark.read.jdbc(url=jdbc_url, table=data_table, properties=connection_details)
    return df
Here spark is a SparkSession object, and the rest of the arguments are self-explanatory.
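As a quick sanity check of the URL format, the string assembly can be tested on its own (a minimal sketch; the hostname, port, and database name below are made up):

```python
def build_jdbc_url(hostname, port, database):
    # Assemble a SQL Server JDBC URL; note the ';databaseName=' separator.
    return "jdbc:sqlserver://{0}:{1};databaseName={2}".format(hostname, port, database)

print(build_jdbc_url("dbhost.example.com", 1433, "sales"))
# jdbc:sqlserver://dbhost.example.com:1433;databaseName=sales
```

Note that the Microsoft JDBC driver jar must also be on Spark's classpath (for example via spark-submit's --jars or --packages option) for the "com.microsoft.sqlserver.jdbc.SQLServerDriver" class to be found.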
You can also pass pushdown queries to read.jdbc.
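For a pushdown query, instead of a bare table name you pass a parenthesized subquery with an alias as the table argument, so the filtering runs on the SQL Server side (a sketch; dbo.orders and its columns are hypothetical):

```python
def as_pushdown(query, alias):
    # Wrap a SELECT so spark.read.jdbc treats it as a derived table;
    # SQL Server requires the alias on the subquery.
    return "({0}) AS {1}".format(query, alias)

subquery = as_pushdown("SELECT id, amount FROM dbo.orders WHERE amount > 100", "orders")
print(subquery)
# (SELECT id, amount FROM dbo.orders WHERE amount > 100) AS orders
# df = connect_to_sql(spark, hostname, port, database, subquery, username, password)
```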