How can I access a Python variable in Spark SQL?


Question



I have a Python variable created under %python in my Jupyter notebook file in Azure Databricks. How can I access the same variable to make comparisons under %sql? Below is an example:

%python
RunID_Goal = sqlContext.sql("SELECT CONCAT(SUBSTRING(RunID,1,6), SUBSTRING(RunID,1,6), '01_') FROM RunID_Pace").first()[0] AS RunID_Goal

%sql
SELECT Type, KPIDate, Value
FROM table
WHERE
RunID = RunID_Goal (this is the variable created under %python that I want to compare against here)

When I run this it throws an error: Error in SQL statement: AnalysisException: cannot resolve 'RunID_Goal' given input columns. I am new to Azure Databricks and Spark SQL, so any sort of help would be appreciated.

Solution

One workaround could be to use Widgets to pass parameters between cells. For example, on the Python side it could look like the following:

# generate test data
import pyspark.sql.functions as F
spark.range(100).withColumn("rnd", F.rand()).write.mode("append").saveAsTable("abc")

# set widgets
import random
vl = random.randint(0, 100)
dbutils.widgets.text("my_val", str(vl))

and then you can refer to the value from the widget inside the SQL code:

%sql
select * from abc where id = getArgument('my_val')

will give you the rows of abc whose id matches the widget value.
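If you also need the value back on the Python side, dbutils.widgets.get returns it (always as a string); a minimal sketch, reusing the names from the snippet above:

# read the widget value back in Python; widgets always return strings
current = dbutils.widgets.get("my_val")

# build the query on the Python side; int() because id is a numeric column
spark.sql(f"select * from abc where id = {int(current)}").show()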

Another way is to pass the variable via the Spark configuration. You can set the variable value like this (note that the variable should have a prefix; in this case it is c.):

spark.conf.set("c.var", "some-value")

and then from SQL refer to the variable as ${var-name}:

%sql 
select * from table where column = '${c.var}'
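Applied to the scenario from the question, a minimal sketch might look like this (the c.runid_goal key is an illustrative choice; table and column names are taken from the question):

# compute the goal RunID on the Python side (query taken from the question)
runid_goal = spark.sql(
    "SELECT CONCAT(SUBSTRING(RunID,1,6), SUBSTRING(RunID,1,6), '01_') "
    "FROM RunID_Pace"
).first()[0]

# expose it to SQL through the Spark configuration (note the c. prefix)
spark.conf.set("c.runid_goal", runid_goal)

and then in the SQL cell:

%sql
SELECT Type, KPIDate, Value
FROM table
WHERE RunID = '${c.runid_goal}'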

One advantage of this approach is that you can also use the variable for table names, etc. The disadvantage is that you need to handle the escaping yourself, e.g. wrap string values in single quotes.
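For instance, a hypothetical sketch of both cases (the c.table and c.name keys and the users table are made up for illustration):

spark.conf.set("c.table", "users")   # used as an identifier -> no quotes in SQL
spark.conf.set("c.name", "alice")    # used as a string value -> must be quoted in SQL

%sql
select * from ${c.table} where name = '${c.name}'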

