Does SparkSQL support subquery?


Problem description

I am running this query in the Spark shell, but it gives me an error:

sqlContext.sql(
  "select sal from samplecsv where sal < (select MAX(sal) from samplecsv)"
).collect().foreach(println)

Error:

java.lang.RuntimeException: [1.47] failure: ``)'' expected but identifier MAX found

select sal from samplecsv where sal < (select MAX(sal) from samplecsv)
                                              ^
        at scala.sys.package$.error(package.scala:27)

Can anybody explain this to me? Thanks.

Recommended answer

Planned features:

  • SPARK-23945 (Column.isin() should accept a single-column DataFrame as input).
  • SPARK-18455 (General support for correlated subquery processing).

Spark 2.0+

Spark SQL should support both correlated and uncorrelated subqueries. See SubquerySuite for details. Some examples include:

select * from l where exists (select * from r where l.a = r.c)
select * from l where not exists (select * from r where l.a = r.c)

select * from l where l.a in (select c from r)
select * from l where a not in (select c from r)
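
For instance, a minimal sketch of running these from Scala in 2.0+ (the SparkSession setup and the sample data behind the l and r views are assumptions made up for illustration):

import org.apache.spark.sql.SparkSession

// Hypothetical setup: register two small datasets as the views `l`
// and `r` used in the examples above (join columns `a` and `c`).
val spark = SparkSession.builder().appName("subqueries").getOrCreate()
import spark.implicits._

Seq((1, "x"), (2, "y"), (3, "z")).toDF("a", "b").createOrReplaceTempView("l")
Seq((1, "p"), (3, "q")).toDF("c", "d").createOrReplaceTempView("r")

// Uncorrelated IN subquery (works in 2.0+)
spark.sql("select * from l where l.a in (select c from r)").show()

// Correlated EXISTS subquery (also works in 2.0+)
spark.sql("select * from l where exists (select * from r where l.a = r.c)").show()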

Unfortunately, as of now (Spark 2.0) it is impossible to express the same logic using the DataFrame DSL.
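
The closest DSL workaround is an explicit semi/anti join rather than a subquery. A sketch, reusing the SparkSession and the hypothetical l and r views from the previous snippet:

val lDf = spark.table("l")
val rDf = spark.table("r")

// IN (select c from r)      ~ left semi join
lDf.join(rDf, lDf("a") === rDf("c"), "leftsemi").show()

// NOT IN (select c from r)  ~ left anti join; note that NOT IN has
// special NULL semantics that a plain anti join does not reproduce
lDf.join(rDf, lDf("a") === rDf("c"), "leftanti").show()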

Spark < 2.0

Spark supports subqueries in the FROM clause (same as Hive <= 0.12):

SELECT col FROM (SELECT * FROM t1 WHERE bar) t2
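
For example, a minimal sketch of doing this from the 1.x shell, where sqlContext is predefined (the table t1 and its columns col and bar are hypothetical names matching the snippet above):

// Register a small DataFrame as `t1`, then run the FROM-clause
// subquery shown above (Spark 1.3+ API).
import sqlContext.implicits._

Seq((1, true), (2, false)).toDF("col", "bar").registerTempTable("t1")

sqlContext.sql(
  "SELECT col FROM (SELECT * FROM t1 WHERE bar) t2"
).collect().foreach(println)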

It simply doesn't support subqueries in the WHERE clause. Generally speaking, arbitrary subqueries (in particular correlated subqueries) cannot be expressed with Spark without promoting them to a Cartesian join.

Since subquery performance is usually a significant issue in a typical relational system, and every subquery can be expressed using a JOIN, there is no loss of function here.
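
Applied to the query from the question, a sketch of such a join rewrite (the aliases s, m and max_sal are made-up names; the inner query yields a single row, so the non-equi join effectively compares each sal against the scalar maximum):

// The asker's `sal < (select MAX(sal) ...)` filter, expressed as a
// join against a one-row aggregate so it runs on Spark < 2.0.
sqlContext.sql(
  """SELECT s.sal
    |FROM samplecsv s
    |JOIN (SELECT MAX(sal) AS max_sal FROM samplecsv) m
    |ON s.sal < m.max_sal""".stripMargin
).collect().foreach(println)

As noted above, Spark plans a non-equi join like this as a Cartesian-style join, so it can be expensive on large tables.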
