Selecting multiple columns with GROUP BY in spark.sql

Question

Is there any way to use GROUP BY in Spark SQL while selecting multiple columns? The code I am using:

val df = spark.read.json("//path")
df.createOrReplaceTempView("GETBYID")

Grouping as follows works:

val sqlDF = spark.sql(
  "SELECT count(customerId) FROM GETBYID GROUP BY customerId")

But when I try:

val sqlDF = spark.sql(
  "SELECT count(customerId), customerId, userId FROM GETBYID GROUP BY customerId")

Spark throws an error:

org.apache.spark.sql.AnalysisException: expression 'getbyid.userId' is neither present in the group by, nor is it an aggregate function. Add to group by or wrap in first() (or first_value) if you don't care which value you get.;

Is there any way to do this?

Answer

Yes, it's possible, and the error message you attached already lists the options: every non-aggregated column in the SELECT list must either appear in the GROUP BY clause or be wrapped in an aggregate function. You can either add userId to the GROUP BY clause:

val sqlDF = spark.sql(
  "SELECT count(customerId), customerId, userId FROM GETBYID GROUP BY customerId, userId")
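The effect of this first option can be sketched with plain Scala collections instead of Spark. This is only a model of the query's semantics, not Spark code: the `Record` type, the sample rows and the `countByCustomerAndUser` helper are hypothetical. Grouping by the pair (customerId, userId) produces one output row per distinct pair.

```scala
// Hypothetical row type standing in for the JSON records behind GETBYID.
final case class Record(customerId: String, userId: String)

// Models: SELECT count(customerId), customerId, userId
//         FROM GETBYID GROUP BY customerId, userId
// Returns one (count, customerId, userId) triple per distinct key pair.
def countByCustomerAndUser(rows: Seq[Record]): Seq[(Long, String, String)] =
  rows
    .groupBy(r => (r.customerId, r.userId)) // composite grouping key
    .toSeq
    .map { case ((cid, uid), group) => (group.size.toLong, cid, uid) }
    .sortBy { case (_, cid, uid) => (cid, uid) } // sorted only for a stable printout

val sample = Seq(
  Record("c1", "u1"), Record("c1", "u1"), Record("c1", "u2"), Record("c2", "u3")
)

countByCustomerAndUser(sample).foreach(println)
// prints:
// (2,c1,u1)
// (1,c1,u2)
// (1,c2,u3)
```

Note that customer `c1` now appears twice, once per userId, so the count becomes a count per customer/user combination rather than per customer. This option fits only when that per-pair count is what you want.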

Or wrap userId in first():

val sqlDF = spark.sql(
  "SELECT count(customerId), customerId, first(userId) FROM GETBYID GROUP BY customerId")
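The `first()` variant keeps one row per customerId and picks a single userId from each group. The sketch below models it with plain Scala collections; `Record` and `countWithFirstUser` are hypothetical names, not Spark APIs. In this model "first" means the first row of the group in input order. Spark's `first` makes no such promise: its documentation describes it as non-deterministic, because the order of rows may change after a shuffle, so use it only when any userId from the group is acceptable.

```scala
// Hypothetical row type standing in for the JSON records behind GETBYID.
final case class Record(customerId: String, userId: String)

// Models: SELECT count(customerId), customerId, first(userId)
//         FROM GETBYID GROUP BY customerId
// Scala's groupBy keeps input order within each group, so group.head is the
// first row seen for that customer. Spark does not guarantee this ordering.
def countWithFirstUser(rows: Seq[Record]): Seq[(Long, String, String)] =
  rows
    .groupBy(_.customerId)
    .toSeq
    .map { case (cid, group) => (group.size.toLong, cid, group.head.userId) }
    .sortBy(_._2) // sorted by customerId only for a stable printout

val sample = Seq(Record("c1", "u1"), Record("c1", "u2"), Record("c2", "u3"))

countWithFirstUser(sample).foreach(println)
// prints:
// (2,c1,u1)
// (1,c2,u3)
```

Unlike the previous option, the count here stays a count per customer; the userId column just carries one representative value.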
