Zeppelin - Cannot query with %sql a table I registered with PySpark

Views: 21 | Published: 2021/11/14 23:49:52 | Tags: apache-spark, pyspark, apache-spark-sql, apache-zeppelin


Question


I am new to Spark/Zeppelin and wanted to complete a simple exercise: convert a CSV file loaded with pandas into a Spark DataFrame, register it as a table so I can query it with SQL, and visualise the results in Zeppelin.

But I seem to be failing at the last step.

I am using Spark 1.6.1.

Here is my code:

    %pyspark
    spark_clean_df.registerTempTable("table1")
    print(spark_clean_df.dtypes)
    print(sqlContext.sql("select count(*) from table1").collect())

Here is the output:

    [('id', 'bigint'), ('name', 'string'), ('host_id', 'bigint'), ('host_name', 'string'), ('neighbourhood', 'string'), ('latitude', 'double'), ('longitude', 'double'), ('room_type', 'string'), ('price', 'bigint'), ('minimum_nights', 'bigint'), ('number_of_reviews', 'bigint'), ('last_review', 'string'), ('reviews_per_month', 'double'), ('calculated_host_listings_count', 'bigint'), ('availability_365', 'bigint')]
    [Row(_c0=4961)]

But when I try to use %sql I get this error:

    %sql
    select * from table1

    Table not found: table1; line 1 pos 14
    set zeppelin.spark.sql.stacktrace = true to see full stacktrace

Any help would be appreciated. I don't even know where to find this stack trace or how it could help me.

Thanks :)

Solution

Zeppelin can create different contexts for different interpreters. If you executed some code with the %spark interpreter and some with %pyspark, your Zeppelin may end up with two contexts, and when you use %sql it looks for the table in the other context rather than the one %pyspark used. Try restarting Zeppelin, then run the %pyspark code as the first statement and the %sql query as the second.
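To make the context explanation concrete, here is a toy model in plain Python. It is an illustration under stated assumptions, not Spark or Zeppelin code: `ToyContext`, `register_temp_table`, and `query_or_error` are hypothetical names, and each `ToyContext` simply stands in for a SQLContext that keeps its own private registry of temporary tables. Registering `table1` in one context and querying it from another reproduces the "Table not found" symptom, while two interpreter names bound to the same context object see the same table.

```python
# Toy model (NOT the Spark API): each context keeps its own temp-table
# registry, mirroring the answer's point that %sql may be looking in a
# different context from the one %pyspark registered the table in.


class TableNotFound(Exception):
    pass


class ToyContext:
    """Hypothetical stand-in for a SQLContext with a private catalog."""

    def __init__(self, name):
        self.name = name
        self._temp_tables = {}  # this context's own name -> rows catalog

    def register_temp_table(self, table_name, rows):
        # Registration only touches THIS context's catalog.
        self._temp_tables[table_name] = rows

    def count(self, table_name):
        # Lookups also only consult THIS context's catalog.
        if table_name not in self._temp_tables:
            raise TableNotFound("Table not found: " + table_name)
        return len(self._temp_tables[table_name])


def query_or_error(ctx, table_name):
    """Return the row count, or the error text a failing query would show."""
    try:
        return ctx.count(table_name)
    except TableNotFound as err:
        return str(err)


rows = [{"id": 1}, {"id": 2}, {"id": 3}]

# Case 1: %pyspark and %sql end up with two separate contexts.
pyspark_ctx = ToyContext("pyspark")
sql_ctx = ToyContext("sql")
pyspark_ctx.register_temp_table("table1", rows)
print(query_or_error(pyspark_ctx, "table1"))  # 3
print(query_or_error(sql_ctx, "table1"))      # Table not found: table1

# Case 2: both interpreters refer to the same context object.
pyspark_view = sql_view = ToyContext("shared")
pyspark_view.register_temp_table("table1", rows)
print(query_or_error(sql_view, "table1"))     # 3
```

In the toy model the fix is entirely about object identity: as long as the name used for registration and the name used for querying point at the same context, the table is visible.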

If you go to the 'Interpreter' tab, you can add the zeppelin.spark.sql.stacktrace property there and set it to true, as the error message suggests. After restarting Zeppelin, you will see the full stack trace in the place where you currently see 'Table not found'.

Actually, your question is probably answered here: When registering a table using the %pyspark interpreter in Zeppelin, I can't access the table in %sql

Try adding

    %pyspark
    sqlContext = sqlc

as the first two lines.
