group by col_1,cube(col_2,col_3) producing error in pyspark


Question

I have the following view (dfview) in pyspark -

+----+-----+-------+
|roll|grade|subject|
+----+-----+-------+
|   1|    A|  Maths|
|   1|    A|   Chem|
|   1|    B|    Phy|
|   2|    A|  Maths|
|   2|    B|   Chem|
|   2|    B|    Phy|
+----+-----+-------+
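
For reference, here is a minimal sketch that recreates an equivalent view (the session setup and variable names are assumed; the data and column names come from the output above):

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("cube-demo").getOrCreate()

# Rows copied from the table shown above
data = [(1, "A", "Maths"), (1, "A", "Chem"), (1, "B", "Phy"),
        (2, "A", "Maths"), (2, "B", "Chem"), (2, "B", "Phy")]

df = spark.createDataFrame(data, ["roll", "grade", "subject"])
df.createOrReplaceTempView("dfview")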

Below is the query I ran with spark.sql -

spark.sql('''
          select
            grouping(roll),
            grouping(grade),
            grouping(subject),
            count(*)
          from
            dfview
          group by roll,cube(grade,subject)
''').show()

I am getting the following error -

AnalysisException: grouping() can only be used with GroupingSets/Cube/Rollup;
'Aggregate [roll#19400, cube(grade#19401, subject#19402)], [grouping(roll#19400) AS grouping(roll)#19420, grouping(grade#19401) AS grouping(grade)#19421, grouping(subject#19402) AS grouping(subject)#19422, count(1) AS count(1)#19423L]
+- SubqueryAlias dfview
   +- LogicalRDD [roll#19400, grade#19401, subject#19402], false

However, I tried a similar form of query in Oracle 19c and it executed properly. Does pyspark not support this form of group by? I am using pyspark 3.1.2.

Answer

In Spark SQL, use:

GROUP BY grade, subject WITH CUBE

You can use the grouping() function only on columns that appear in the GROUP BY clause.
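
Applying that to the query above, a minimal sketch of the corrected call (note that grouping(roll) is dropped, since roll is no longer part of the GROUP BY clause):

spark.sql('''
          select
            grouping(grade),
            grouping(subject),
            count(*)
          from
            dfview
          group by grade, subject with cube
''').show()

If the intent was to keep roll as a plain grouping column while cubing only grade and subject (as the Oracle query does), an assumed workaround is to spell out the equivalent GROUPING SETS explicitly; all three columns then appear in the GROUP BY clause, so grouping() is allowed on each of them:

spark.sql('''
          select
            grouping(roll),
            grouping(grade),
            grouping(subject),
            count(*)
          from
            dfview
          group by roll, grade, subject
          grouping sets ((roll, grade, subject), (roll, grade), (roll, subject), (roll))
''').show()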

