How to use a constant value in a UDF of Spark SQL (DataFrame)

Views: 49 | Published: 2021/11/14 21:53:20 | Tags: scala, apache-spark, apache-spark-sql

Problem description

I have a DataFrame that includes a timestamp column. To aggregate by time (minute, hour, or day), I tried the following:

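import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.udf

// Round a millisecond timestamp (stored as a string) down to the start of its hour.
val toSegment = udf((timestamp: String) => {
  val asLong = timestamp.toLong
  asLong - asLong % 3600000 // period = 1 hour
})

val df: DataFrame = ??? // placeholder for the DataFrame with the "timestamp" column
df.groupBy(toSegment($"timestamp")).count() // $"..." requires import spark.implicits._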

This works fine.

My question is how to generalize the UDF toSegment so that the period becomes a parameter:

val toSegmentGeneralized = udf((timestamp: String, period: Int) => {
  val asLong = timestamp.toLong
  asLong - asLong % period
})

I tried the following, but it does not work:

df.groupBy(toSegmentGeneralized($"timestamp", $"3600000")).count()

It seems to look for a column named 3600000.

A possible solution is to use a constant column, but I could not find how to create one.

Recommended answer

You can use org.apache.spark.sql.functions.lit() to create the constant column. $"3600000" is resolved as a reference to a column with that name, whereas lit(3600000) wraps the literal value in a Column:

import org.apache.spark.sql.functions._

df.groupBy(toSegmentGeneralized($"timestamp", lit(3600000))).count()
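
For reference, here is a minimal end-to-end sketch of the lit() approach, together with one possible alternative that captures the period in a closure instead of passing it as a column. The object name SegmentExample, the helper functions segmentStart and toSegmentWithPeriod, the local SparkSession settings, and the sample timestamps are all assumptions made for illustration; they are not part of the original question or answer.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.UserDefinedFunction
import org.apache.spark.sql.functions.{lit, udf}

object SegmentExample {
  // Pure bucketing logic (hypothetical helper): round a timestamp down
  // to the start of the period that contains it.
  def segmentStart(timestamp: String, period: Long): Long = {
    val asLong = timestamp.toLong
    asLong - asLong % period
  }

  // Variant 1: two-argument UDF; the period arrives as a column,
  // so a constant has to be wrapped with lit().
  val toSegmentGeneralized: UserDefinedFunction =
    udf((timestamp: String, period: Int) => segmentStart(timestamp, period.toLong))

  // Variant 2 (hypothetical helper): build a one-argument UDF that
  // captures the period in its closure, so no constant column is needed.
  def toSegmentWithPeriod(period: Long): UserDefinedFunction =
    udf((timestamp: String) => segmentStart(timestamp, period))

  def main(args: Array[String]): Unit = {
    // Local session for illustration only.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("segment-example")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample data: epoch milliseconds stored as strings.
    val df = Seq("3600000", "3600001", "7199999", "7200000").toDF("timestamp")

    // Constant passed as a literal column.
    df.groupBy(toSegmentGeneralized($"timestamp", lit(3600000)).as("segment"))
      .count()
      .show()

    // Constant captured when the UDF is created.
    df.groupBy(toSegmentWithPeriod(3600000L)($"timestamp").as("segment"))
      .count()
      .show()

    spark.stop()
  }
}
```

With this sample data, both queries should place the first three timestamps in the segment starting at 3600000 and the last one in the segment starting at 7200000. The closure variant can be convenient when the period is fixed per query, while lit() keeps a single UDF that also accepts a real column as the period.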
