group by col_1,cube(col_2,col_3) producing error in pyspark
Problem description
I have the following view (dfview) in pyspark -
+----+-----+-------+
|roll|grade|subject|
+----+-----+-------+
| 1| A| Maths|
| 1| A| Chem|
| 1| B| Phy|
| 2| A| Maths|
| 2| B| Chem|
| 2| B| Phy|
+----+-----+-------+
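For reference, here is a minimal setup sketch, assuming a local SparkSession; the original post does not show how dfview was created, so the data below is simply transcribed from the table above.

from pyspark.sql import SparkSession

# Assumed setup: build the sample rows from the table above and register
# them as a temporary view named dfview.
spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame(
    [(1, "A", "Maths"), (1, "A", "Chem"), (1, "B", "Phy"),
     (2, "A", "Maths"), (2, "B", "Chem"), (2, "B", "Phy")],
    ["roll", "grade", "subject"],
)
df.createOrReplaceTempView("dfview")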
I ran the following in spark.sql -
spark.sql('''
select
    grouping(roll),
    grouping(grade),
    grouping(subject),
    count(*)
from
    dfview
group by roll, cube(grade, subject)
''').show()
I am getting the following error -
AnalysisException: grouping() can only be used with GroupingSets/Cube/Rollup;
'Aggregate [roll#19400, cube(grade#19401, subject#19402)], [grouping(roll#19400) AS grouping(roll)#19420, grouping(grade#19401) AS grouping(grade)#19421, grouping(subject#19402) AS grouping(subject)#19422, count(1) AS count(1)#19423L]
+- SubqueryAlias dfview
   +- LogicalRDD [roll#19400, grade#19401, subject#19402], false
However, I tried a similar form of the query in Oracle 19c and it executed properly. Does pyspark not support this form of group by? I am using pyspark 3.1.2.
Recommended answer
GROUP BY grade, subject WITH CUBE
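Applied to the view from the question, that suggestion would look roughly like the sketch below (my rendering of the one-line answer, not code from the original post). Note that roll is left out of the grouping() calls, since only the cubed columns may appear there:

spark.sql('''
select
    grouping(grade),
    grouping(subject),
    count(*)
from
    dfview
group by grade, subject with cube
''').show()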
You can use the grouping() function only for the columns that you used in the GROUP BY clause.
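If the intent of the original query has to be preserved, with roll always grouped and only grade and subject cubed, one possible workaround on pyspark 3.1 (my suggestion, not part of the original answer) is to expand the partial cube into explicit GROUPING SETS that each contain roll, so that every column passed to grouping() belongs to the grouping analytics clause:

spark.sql('''
select
    roll,
    grouping(grade),
    grouping(subject),
    count(*)
from
    dfview
group by roll, grade, subject
grouping sets ((roll, grade, subject), (roll, grade), (roll, subject), (roll))
''').show()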