How to execute a stored procedure in Azure Databricks PySpark?

Problem description

I am able to execute a simple SQL statement using PySpark in Azure Databricks, but I want to execute a stored procedure instead. Below is the PySpark code I tried.

#initialize pyspark
import findspark
findspark.init(r'C:\Spark\spark-2.4.5-bin-hadoop2.7')  #raw string so backslashes are not treated as escapes
#import required modules
from pyspark import SparkConf, SparkContext
from pyspark.sql import SparkSession
from pyspark.sql import *
import pandas as pd

#Create spark configuration object
conf = SparkConf()
conf.setMaster("local").setAppName("My app")
#Create spark context and sparksession
sc = SparkContext.getOrCreate(conf=conf)
spark = SparkSession(sc)

table = "dbo.test"
#read table data into a spark dataframe
jdbcDF = spark.read.format("jdbc") \
    .option("url", f"jdbc:sqlserver://localhost:1433;databaseName=Demo;integratedSecurity=true;") \
    .option("dbtable", table) \
    .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver") \
    .load()

#show the data loaded into dataframe
#jdbcDF.show()
sqlQueries="execute testJoin"
resultDF=spark.sql(sqlQueries)
resultDF.show(resultDF.count(),False)

This doesn't work. How do I do it?

Recommended answer

Running a stored procedure through a JDBC connection from Azure Databricks is not supported as of now. But your options are:

  1. Use the pyodbc library to connect and execute your procedure. But by using this library, it means that you will be running your code on the driver node while all your workers are idle. See this article for details: https://datathirst.net/blog/2018/10/12/executing-sql-server-stored-procedures-on-databricks-pyspark

  2. Use a SQL table function rather than a procedure. In a sense, you can use anything that you can use in the FROM clause of a SQL query.

  3. Since you are in an Azure environment, using a combination of Azure Data Factory (to execute your procedure) and Azure Databricks can help you build pretty powerful pipelines.
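The pyodbc route above can be sketched as follows. This is a minimal sketch, not tested against a live server: the connection string, server, credentials, and procedure name are placeholders for your own values, and pyodbc plus a SQL Server ODBC driver must be installed on the cluster first.

```python
def exec_statement(proc_name: str) -> str:
    """Build the T-SQL statement that invokes a stored procedure."""
    return f"EXEC {proc_name}"


def run_stored_procedure(conn_str: str, proc_name: str) -> None:
    """Open a pyodbc connection and run the procedure on the driver node."""
    import pyodbc  # install on the cluster first, e.g. %pip install pyodbc

    conn = pyodbc.connect(conn_str, autocommit=True)  # autocommit so the procedure's work is not left in an open transaction
    try:
        conn.cursor().execute(exec_statement(proc_name))
    finally:
        conn.close()


# Usage (placeholder server and credentials):
# conn_str = ("DRIVER={ODBC Driver 17 for SQL Server};"
#             "SERVER=myserver.database.windows.net;DATABASE=Demo;"
#             "UID=myuser;PWD=secret")
# run_stored_procedure(conn_str, "dbo.testJoin")
```

Note that this runs entirely on the driver, so it suits procedures that do their work server-side rather than ones returning large result sets to Spark.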
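The table-function route can reuse the JDBC reader from the question, because Spark's `dbtable` option accepts any subquery that is legal in a FROM clause. A sketch, assuming `dbo.testJoinFn` is a hypothetical table-valued function you would create to replace the procedure:

```python
def table_function_subquery(fn_call: str) -> str:
    """Wrap a table-valued function call so it can be passed as the JDBC 'dbtable' option."""
    return f"(SELECT * FROM {fn_call}) AS t"


def read_table_function(spark, url: str, fn_call: str):
    """Read the function's result set into a Spark DataFrame via JDBC."""
    return (spark.read.format("jdbc")
            .option("url", url)
            .option("dbtable", table_function_subquery(fn_call))
            .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
            .load())


# Usage, with the URL from the question:
# df = read_table_function(spark,
#                          "jdbc:sqlserver://localhost:1433;databaseName=Demo;integratedSecurity=true;",
#                          "dbo.testJoinFn()")
```

Unlike the pyodbc approach, this keeps the read distributed across the cluster.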

