Merge Multiple Records in a Dataframe based on a key in scala spark
Question
I have a dataframe whose records are identified by a key, but the same key can appear in multiple records. My goal is to merge all the records that share a key, as follows.
Let's suppose my input dataframe looks something like this:
key | value1 | value2 | value3
-------------------------------
a | 1 | null | null
a | null | 2 | null
a | null | null | 3
and I want the output after merging on 'a' to look like this:
key | value1 | value2 | value3
-------------------------------
a | 1 | 2 | 3
One thing I am sure of: each record for the key 'a' carries exactly one of the three values, so there are no conflicts to resolve.
Thanks
Recommended Answer
If you know there is only one non-null record per group (or you don't care which one you'll get), you can use first:
import org.apache.spark.sql.functions.{first, last}
import spark.implicits._  // assumes a SparkSession named spark is in scope (e.g. spark-shell)

val df = Seq(
  ("a", Some(1), None, None), ("a", None, Some(2), None),
  ("a", None, None, Some(3))
).toDF("key", "value1", "value2", "value3")

df.groupBy("key").agg(
  first("value1", ignoreNulls = true) as "value1",
  first("value2", ignoreNulls = true) as "value2",
  first("value3", ignoreNulls = true) as "value3"
).show
// +---+------+------+------+
// |key|value1|value2|value3|
// +---+------+------+------+
// | a| 1| 2| 3|
// +---+------+------+------+
or last:
df.groupBy("key").agg(
  last("value1", ignoreNulls = true) as "value1",
  last("value2", ignoreNulls = true) as "value2",
  last("value3", ignoreNulls = true) as "value3"
).show
// +---+------+------+------+
// |key|value1|value2|value3|
// +---+------+------+------+
// | a| 1| 2| 3|
// +---+------+------+------+
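If the number of value columns grows, the same aggregation can be built programmatically instead of listing each column by hand. The following is a sketch, not part of the original answer; it assumes a locally built SparkSession and that every non-key column should be collapsed with first(ignoreNulls = true):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.first

val spark = SparkSession.builder.master("local[1]").appName("merge-by-key").getOrCreate()
import spark.implicits._

val df = Seq(
  ("a", Some(1), None, None), ("a", None, Some(2), None),
  ("a", None, None, Some(3))
).toDF("key", "value1", "value2", "value3")

// Build one first(..., ignoreNulls = true) aggregate per non-key column,
// so the merge does not need to name value1/value2/value3 explicitly.
val valueCols = df.columns.filterNot(_ == "key")
val aggs = valueCols.map(c => first(c, ignoreNulls = true).as(c))

val merged = df.groupBy("key").agg(aggs.head, aggs.tail: _*)
merged.show  // same single merged row as the hand-written version above
```

Because agg takes a head column plus varargs, the head/tail split is the usual idiom for passing a dynamically built list of aggregates.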