What does "Correlated scalar subqueries must be Aggregated" mean?


Problem description

I use Spark 2.0.

I'd like to execute the following SQL query:

val sqlText = """
select
  f.ID as TID,
  f.BldgID as TBldgID,
  f.LeaseID as TLeaseID,
  f.Period as TPeriod,
  coalesce(
    (select
       f.ChargeAmt
     from
       Fact_CMCharges f
     where
       f.BldgID = Fact_CMCharges.BldgID
     limit 1),
     0) as TChargeAmt1,
  f.ChargeAmt as TChargeAmt2,
  l.EFFDATE as TBreakDate
from
  Fact_CMCharges f
join
  CMRECC l on l.BLDGID = f.BldgID and l.LEASID = f.LeaseID and l.INCCAT = f.IncomeCat and date_format(l.EFFDATE,'D')<>1 and f.Period=EFFDateInt(l.EFFDATE) 
where
  f.ActualProjected = 'Lease'
except(
  select * from TT1 t2 left semi join Fact_CMCharges f2 on t2.TID=f2.ID) 
"""
val query = spark.sql(sqlText)
query.show()

It seems that the inner statement in coalesce gives the following error:

pyspark.sql.utils.AnalysisException: u'Correlated scalar subqueries must be Aggregated: GlobalLimit 1\n+- LocalLimit 1\n

What's wrong with the query?

Recommended answer

You have to make sure that your sub-query by definition (and not by data) only returns a single row. Otherwise the Spark Analyzer complains while analyzing the SQL statement.

So when Catalyst can't be 100% sure, just by looking at the SQL statement (without looking at your data), that the sub-query returns only a single row, this exception is thrown.

If you are sure that your subquery only returns a single row, you can wrap it in one of the following standard aggregation functions, so the Spark Analyzer is happy:

  • first
  • avg
  • max
  • min
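Applied to the query above, the coalesce subquery could be rewritten along these lines (a sketch only: the inner alias is renamed to `i` here to avoid shadowing the outer `f`, and `first(i.ChargeAmt)` is assumed to match the intent of the original `limit 1`, which picked an arbitrary matching row):

```sql
coalesce(
  (select first(i.ChargeAmt)
   from Fact_CMCharges i
   where i.BldgID = f.BldgID),
  0) as TChargeAmt1
```

Since the subquery now aggregates, the Analyzer can prove it returns exactly one row regardless of the data. Note that, like `limit 1` without an `order by`, `first` over an unordered set does not guarantee which matching row's value you get.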

