What does "Correlated scalar subqueries must be Aggregated" mean?
Question
I use Spark 2.0.
I'd like to execute the following SQL query:
val sqlText = """
select
    f.ID as TID,
    f.BldgID as TBldgID,
    f.LeaseID as TLeaseID,
    f.Period as TPeriod,
    coalesce(
        (select f.ChargeAmt
         from Fact_CMCharges f
         where f.BldgID = Fact_CMCharges.BldgID
         limit 1),
        0) as TChargeAmt1,
    f.ChargeAmt as TChargeAmt2,
    l.EFFDATE as TBreakDate
from
    Fact_CMCharges f
join
    CMRECC l on l.BLDGID = f.BldgID and l.LEASID = f.LeaseID and l.INCCAT = f.IncomeCat and date_format(l.EFFDATE,'D') <> 1 and f.Period = EFFDateInt(l.EFFDATE)
where
    f.ActualProjected = 'Lease'
except(
    select * from TT1 t2 left semi join Fact_CMCharges f2 on t2.TID = f2.ID)
"""
val query = spark.sql(sqlText)
query.show()
The inner statement in coalesce seems to give the following error:
pyspark.sql.utils.AnalysisException: u'Correlated scalar subqueries must be Aggregated: GlobalLimit 1
+- LocalLimit 1
What's wrong with the query?
Answer
You have to make sure that your subquery, by definition (and not by the data), returns only a single row. Otherwise the Spark Analyzer complains while parsing the SQL statement.
So when Catalyst can't be 100% sure, just by looking at the SQL statement (without looking at your data), that the subquery returns only a single row, this exception is thrown.
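A minimal illustration of the rule (table and column names here are hypothetical, just for the sketch): even a `limit 1` subquery is rejected, because the guarantee comes from a limit on the data flow rather than from the shape of the expression, while an aggregated subquery is provably single-row:

```sql
-- Rejected by the Analyzer: it cannot prove a single row from the plan alone
select a.id,
       (select b.amount from b where b.id = a.id limit 1) as amt
from a;

-- Accepted: an aggregate returns exactly one row by definition
select a.id,
       (select max(b.amount) from b where b.id = a.id) as amt
from a;
```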
If you are sure that your subquery only yields a single row, you can wrap it in one of the following standard aggregate functions, so the Spark Analyzer is happy:
first
avg
max
min
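Applied to the query above, the correlated subquery could be rewritten as a sketch like the following, using `max` (any of the aggregates listed would do). Note that the inner alias is also renamed to `f2`, so that the correlation condition actually refers to the outer `f` instead of being shadowed by the inner alias:

```sql
coalesce(
    (select max(f2.ChargeAmt)
     from Fact_CMCharges f2
     where f2.BldgID = f.BldgID),
    0) as TChargeAmt1
```

With the aggregate in place, the `limit 1` is no longer needed: the aggregate itself guarantees a single row by definition.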