When registering a table using the %pyspark interpreter in Zeppelin, I can't access the table in %sql


Question

I am using Zeppelin 0.5.5. I found this code/sample for Python here, as I couldn't get my own to work with %pyspark: http://www.makedatauseful.com/python-spark-sql-zeppelin-tutorial/. I have a feeling his %pyspark example worked because if you use the original %spark Zeppelin tutorial, the "bank" table is already created.

This code is in the notebook.

%pyspark
from os import getcwd
# These type classes must be imported before building the schema
from pyspark.sql.types import StructType, StructField, IntegerType, StringType
# sqlContext = SQLContext(sc) # Removed with latest version I tested
zeppelinHome = getcwd()
bankText = sc.textFile(zeppelinHome + "/data/bank-full.csv")

bankSchema = StructType([
    StructField("age", IntegerType(), False),
    StructField("job", StringType(), False),
    StructField("marital", StringType(), False),
    StructField("education", StringType(), False),
    StructField("balance", IntegerType(), False)])

bank = bankText.map(lambda s: s.split(";")) \
    .filter(lambda s: s[0] != "\"age\"") \
    .map(lambda s: (int(s[0]),
                    str(s[1]).replace("\"", ""),
                    str(s[2]).replace("\"", ""),
                    str(s[3]).replace("\"", ""),
                    int(s[5])))

bankdf = sqlContext.createDataFrame(bank, bankSchema)
bankdf.registerAsTable("bank")

This code is in the same notebook, but in a different paragraph (work pad).

%sql 
SELECT count(1) FROM bank

org.apache.spark.sql.AnalysisException: no such table bank; line 1 pos 21
...

Answer

I found the cause of this issue. Prior to 0.6.0, the SQLContext variable in %pyspark is sqlc.

The defect can be found here: https://issues.apache.org/jira/browse/ZEPPELIN-134

In Pyspark, the SQLContext is currently available in the variable named sqlc. This is inconsistent with the documentation and with the variable name in Scala, which is sqlContext.

sqlContext can be used as a variable for the SQLContext, in addition to sqlc (for backward compatibility).

Relevant code: https://github.com/apache/incubator-zeppelin/blob/master/spark/src/main/resources/python/zeppelin_pyspark.py#L66

The suggested workaround is simply to do the following in your %pyspark script:

sqlContext = sqlc
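Putting the workaround together with the question's code, the registration paragraph can be written so it runs on Zeppelin 0.5.5. This is a sketch, not a tested paragraph: it assumes the Zeppelin-injected sc and sqlc variables, the tutorial's bank-full.csv column layout, and uses registerTempTable (the non-deprecated Spark 1.3+ name for registerAsTable):

```python
%pyspark
from os import getcwd
from pyspark.sql.types import StructType, StructField, IntegerType, StringType

# On Zeppelin < 0.6.0 the SQLContext is injected as `sqlc`, so alias it first
sqlContext = sqlc

bankText = sc.textFile(getcwd() + "/data/bank-full.csv")

bankSchema = StructType([
    StructField("age", IntegerType(), False),
    StructField("job", StringType(), False),
    StructField("marital", StringType(), False),
    StructField("education", StringType(), False),
    StructField("balance", IntegerType(), False)])

bank = (bankText.map(lambda s: s.split(";"))
        .filter(lambda s: s[0] != "\"age\"")        # drop the header row
        .map(lambda s: (int(s[0]),
                        s[1].replace("\"", ""),
                        s[2].replace("\"", ""),
                        s[3].replace("\"", ""),
                        int(s[5]))))                # column 5 is `balance`

bankdf = sqlContext.createDataFrame(bank, bankSchema)
bankdf.registerTempTable("bank")   # now visible to %sql in the same notebook
```

With the table registered this way, the %sql paragraph `SELECT count(1) FROM bank` should find it.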

Found here:

https://mail-archives.apache.org/mod_mbox/incubator-zeppelin-users/201506.mbox/%3CCALf24sazkTxVd3EpLKTWo7yfE4NvW032j346N+6AuB7KKZS_AQ@mail.gmail.com%3E

