How to append an element to an array column of a Spark Dataframe?


Problem description

Suppose I have the following DataFrame:

scala> val df1 = Seq("a", "b").toDF("id").withColumn("nums", array(lit(1)))
df1: org.apache.spark.sql.DataFrame = [id: string, nums: array<int>]

scala> df1.show()
+---+----+
| id|nums|
+---+----+
|  a| [1]|
|  b| [1]|
+---+----+

And I want to add elements to the array in the nums column, so that I get something like the following:

+---+-------+
| id|nums   |
+---+-------+
|  a| [1,5] |
|  b| [1,5] |
+---+-------+

Is there a way to do this using the .withColumn() method of the DataFrame? E.g.

val df2 = df1.withColumn("nums", append(col("nums"), lit(5))) 

I've looked through the API documentation for Spark, but can't find anything that would allow me to do this. I could probably use split and concat_ws to hack something together, but I would prefer a more elegant solution if one is possible. Thanks.
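For illustration, a minimal sketch of that split/concat_ws workaround (assuming the array holds only integers; the name dfHack is just illustrative) might look like the following. The string round-trip and the extra casts are exactly why a cleaner solution would be preferable:

import org.apache.spark.sql.functions.{col, concat, concat_ws, lit, split}

// Sketch only: serialize the array to a comma-separated string, append the
// new element as text, then split and cast back to an integer array.
// concat_ws accepts strings or arrays of strings, hence the casts.
val dfHack = df1.withColumn(
  "nums",
  split(
    concat(concat_ws(",", col("nums").cast("array<string>")), lit(",5")),
    ","
  ).cast("array<int>")
)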

Recommended answer

import org.apache.spark.sql.functions.{lit, array, array_union}
import spark.implicits._  // needed for toDF and the $ column syntax outside spark-shell

val df1 = Seq("a", "b").toDF("id").withColumn("nums", array(lit(1)))
// array_union merges the existing array with a literal single-element array
val df2 = df1.withColumn("nums", array_union($"nums", lit(Array(5))))
df2.show

+---+------+
| id|  nums|
+---+------+
|  a|[1, 5]|
|  b|[1, 5]|
+---+------+

array_union() was added in the Spark 2.4.0 release (November 2, 2018), 7 months after you asked the question :) See https://spark.apache.org/news/index.html
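For Spark versions earlier than 2.4, a minimal sketch of an alternative (not part of the original answer; the name appendElem is illustrative) is a UDF that appends the element. Note that array_union also removes duplicate values, whereas the UDF below keeps them:

import org.apache.spark.sql.functions.{col, lit, udf}

// Pre-2.4 sketch: append a single element with a UDF.
// Unlike array_union, this keeps duplicates if the value is already present.
val appendElem = udf((xs: Seq[Int], x: Int) => xs :+ x)
val df2Udf = df1.withColumn("nums", appendElem(col("nums"), lit(5)))
df2Udf.show()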
