Does SparkSQL support subquery?
Question
I am running this query in the Spark shell, but it gives me an error:
sqlContext.sql(
"select sal from samplecsv where sal < (select MAX(sal) from samplecsv)"
).collect().foreach(println)
Error:
java.lang.RuntimeException: [1.47] failure: ``)'' expected but identifier MAX found
select sal from samplecsv where sal < (select MAX(sal) from samplecsv) ^ at scala.sys.package$.error(package.scala:27)
Can anybody explain this? Thanks.
Answer
Planned features:
- SPARK-23945 (Column.isin() should accept a single-column DataFrame as input).
- SPARK-18455 (General support for correlated subquery processing).
Spark 2.0+
Spark SQL should support both correlated and uncorrelated subqueries. See SubquerySuite for details. Some examples include:
select * from l where exists (select * from r where l.a = r.c)
select * from l where not exists (select * from r where l.a = r.c)
select * from l where l.a in (select c from r)
select * from l where a not in (select c from r)
Unfortunately, as of now (Spark 2.0), it is impossible to express the same logic using the DataFrame DSL.
Spark < 2.0
Spark supports subqueries in the FROM clause (same as Hive <= 0.12).
SELECT col FROM (SELECT * FROM t1 WHERE bar) t2
It simply doesn't support subqueries in the WHERE clause. Generally speaking, arbitrary subqueries (in particular correlated subqueries) cannot be expressed in Spark without promoting them to a Cartesian join.
Since subquery performance is usually a significant issue in a typical relational system, and every subquery can be expressed using JOIN, there is no loss of function here.