How to execute a stored procedure in Azure Databricks PySpark?

Question

I am able to execute a simple SQL statement using PySpark in Azure Databricks but I want to execute a stored procedure instead. Below is the PySpark code I tried.

#initialize pyspark
import findspark
findspark.init(r'C:\Spark\spark-2.4.5-bin-hadoop2.7')  #raw string so the backslashes are kept literal
#import required modules
from pyspark import SparkConf, SparkContext
from pyspark.sql import SparkSession
from pyspark.sql import *
import pandas as pd

#Create spark configuration object
conf = SparkConf()
conf.setMaster("local").setAppName("My app")
#Create spark context and sparksession
sc = SparkContext.getOrCreate(conf=conf)
spark = SparkSession(sc)

table = "dbo.test"
#read table data into a spark dataframe
jdbcDF = spark.read.format("jdbc") \
    .option("url", f"jdbc:sqlserver://localhost:1433;databaseName=Demo;integratedSecurity=true;") \
    .option("dbtable", table) \
    .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver") \
    .load()

#show the data loaded into dataframe
#jdbcDF.show()
sqlQueries="execute testJoin"
resultDF=spark.sql(sqlQueries)
resultDF.show(resultDF.count(),False)

This doesn't work. How can I execute a stored procedure instead?

Answer

Running a stored procedure through a JDBC connection from Azure Databricks is not supported as of now. But your options are:

  1. Use the pyodbc library to connect and execute your procedure. Note that using this library means your code will run only on the driver node while all your workers sit idle. See this article for details: https://datathirst.net/blog/2018/10/12/executing-sql-server-stored-procedures-on-databricks-pyspark
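A minimal sketch of the pyodbc approach. The ODBC driver name, server address, and credentials below are placeholders, and `dbo.testJoin` is the procedure name from the question; adjust all of them to your environment. Everything here executes on the driver node only.

```python
def build_conn_str(server: str, database: str, user: str, password: str) -> str:
    """Build a SQL Server ODBC connection string (driver name is an assumption)."""
    return ("DRIVER={ODBC Driver 17 for SQL Server};"
            f"SERVER={server},1433;DATABASE={database};"
            f"UID={user};PWD={password}")

def execute_procedure(conn_str: str, proc_call: str):
    """Run a stored procedure over ODBC; returns rows if the procedure produces any."""
    import pyodbc  # install it on the cluster first, e.g. %pip install pyodbc
    # autocommit=True so any writes the procedure makes are committed
    conn = pyodbc.connect(conn_str, autocommit=True)
    try:
        cursor = conn.cursor()
        cursor.execute(proc_call)
        # cursor.description is None when the procedure returns no result set
        return cursor.fetchall() if cursor.description else []
    finally:
        conn.close()

# Placeholder usage:
# conn_str = build_conn_str("myserver.database.windows.net", "Demo", "user", "password")
# rows = execute_procedure(conn_str, "EXEC dbo.testJoin")
```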

  2. Use a SQL table-valued function rather than a procedure. In a sense, you can use anything that you can use in the FROM clause of a SQL query.
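If the procedure's logic can be moved into a table-valued function (the name `dbo.testJoinFn` below is hypothetical), the Spark JDBC reader can consume it directly, because the `dbtable` option accepts any subquery that is valid in a FROM clause:

```python
def table_function_query(fn_call: str) -> str:
    """Wrap a table-valued function call as a subquery for the JDBC dbtable option."""
    return f"(SELECT * FROM {fn_call}) AS q"

def read_table_function(spark, url: str, fn_call: str):
    """Load the rows of a SQL Server table-valued function into a Spark DataFrame."""
    return (spark.read.format("jdbc")
            .option("url", url)
            .option("dbtable", table_function_query(fn_call))
            .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
            .load())

# Placeholder usage, reusing the question's connection URL:
# df = read_table_function(
#     spark,
#     "jdbc:sqlserver://localhost:1433;databaseName=Demo;integratedSecurity=true;",
#     "dbo.testJoinFn()")
```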

  3. Since you are in an Azure environment, combining Azure Data Factory (to execute your procedure) and Azure Databricks can help you build pretty powerful pipelines.
