Row aggregations in Scala

Question

I am looking for a way to add a new column to a data frame in Scala that holds the min/max of the values in col1, col2, ..., col10 for each row.

I know I can do it with a UDF, but maybe there is an easier way.

Thanks!

Recommended answer

This is a port of a Python answer by user6910411, using the built-in greatest and least functions instead of a UDF:

import org.apache.spark.sql.functions._
// toDF and $"..." need spark.implicits._ (imported automatically in spark-shell)

val df = Seq(
  (1, 3, 0, 9, "a", "b", "c")
).toDF("col1", "col2", "col3", "col4", "col5", "col6", "Col7")

val cols =  Seq("col1", "col2", "col3", "col4")

val rowMax = greatest(
  cols map col: _*
).alias("max")

val rowMin = least(
  cols map col: _*
).alias("min")

df.select($"*", rowMin, rowMax).show

// +----+----+----+----+----+----+----+---+---+
// |col1|col2|col3|col4|col5|col6|Col7|min|max|
// +----+----+----+----+----+----+----+---+---+
// |   1|   3|   0|   9|   a|   b|   c|  0|  9|
// +----+----+----+----+----+----+----+---+---+
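
The question asks about col1 through col10, so the same idea can be sketched with the column list generated programmatically rather than typed out. This is a minimal standalone sketch, not part of the original answer: the object name RowMinMax, the local SparkSession settings, and the sample row are assumptions made for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, greatest, least}

// Hypothetical standalone app; in spark-shell the session already exists as `spark`.
object RowMinMax {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")          // assumption: local run for illustration
      .appName("row-min-max")
      .getOrCreate()
    import spark.implicits._

    // Column names col1 .. col10, generated instead of written by hand.
    val names = (1 to 10).map(i => s"col$i")

    // One illustrative row of ten integers.
    val df = Seq(
      (1, 3, 0, 9, 4, 7, 2, 8, 5, 6)
    ).toDF(names: _*)

    val cols = names.map(col)

    // least/greatest compare across columns within each row.
    val result = df
      .withColumn("min", least(cols: _*))
      .withColumn("max", greatest(cols: _*))

    // For the sample row, min is 0 and max is 9.
    result.show()

    spark.stop()
  }
}
```

Both greatest and least skip null values and return null only when every input in the row is null, which is worth keeping in mind if the columns are nullable.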
