Select multiple elements with group by in spark.sql

Views: 60 · Published: 2021/11/14 22:47:10 · Tags: scala apache-spark apache-spark-sql bigdata

Question

Is there a way to use group by in Spark SQL while selecting multiple columns? This is the code I am using:

val df = spark.read.json("//path")
df.createOrReplaceTempView("GETBYID")

Grouping like this works:

val sqlDF = spark.sql(
  "SELECT count(customerId) FROM GETBYID group by customerId");

But when I try:

val sqlDF = spark.sql(
  "SELECT count(customerId),customerId,userId FROM GETBYID group by customerId");

Spark throws an error:

org.apache.spark.sql.AnalysisException: expression 'getbyid.userId' is neither present in the group by, nor is it an aggregate function. Add to group by or wrap in first() (or first_value) if you don't care which value you get.;

Is there any way to do this?

Recommended answer

Yes, it's possible, and the error message you attached describes both options. Every selected column must either appear in the group by clause or be wrapped in an aggregate function. You can add userId to the group by, which produces one row per (customerId, userId) pair:

val sqlDF = spark.sql("SELECT count(customerId),customerId,userId FROM GETBYID group by customerId, userId");

Or wrap it in first(), which keeps one row per customerId and returns an arbitrary userId from each group:

val sqlDF = spark.sql("SELECT count(customerId),customerId,first(userId) FROM GETBYID group by customerId");
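To see how the two queries differ, here is a minimal sketch that runs them on a tiny in-memory dataset instead of the JSON file. The sample rows, the local SparkSession settings, and the column aliases (cnt, anyUserId, userIds) are assumptions made for illustration and are not part of the original question. The third query, using collect_list, is an additional option not mentioned in the answer above, shown in case you want every userId per customer rather than an arbitrary one.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: a local session for experimentation; in spark-shell `spark` already exists.
val spark = SparkSession.builder()
  .appName("group-by-sketch")
  .master("local[*]")
  .getOrCreate()
import spark.implicits._

// Hypothetical sample data standing in for the JSON file from the question.
val df = Seq(("c1", "u1"), ("c1", "u2"), ("c2", "u3")).toDF("customerId", "userId")
df.createOrReplaceTempView("GETBYID")

// Option 1: group by both columns, giving one row per (customerId, userId) pair.
val byBoth = spark.sql(
  "SELECT count(customerId) AS cnt, customerId, userId " +
  "FROM GETBYID GROUP BY customerId, userId")

// Option 2: group by customerId only; first() picks an arbitrary userId per group.
val byCustomer = spark.sql(
  "SELECT count(customerId) AS cnt, customerId, first(userId) AS anyUserId " +
  "FROM GETBYID GROUP BY customerId")

// Extra option (not in the original answer): keep all userIds of a group as an array.
val withAll = spark.sql(
  "SELECT count(customerId) AS cnt, customerId, collect_list(userId) AS userIds " +
  "FROM GETBYID GROUP BY customerId")

byBoth.show()
byCustomer.show()
withAll.show()
```

With this sample data the first query returns three rows, each with a count of 1, while the other two return two rows, one per customer. Which userId first() returns for c1, and the order of elements inside userIds, should not be relied on, because row order within a group is not guaranteed.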
