Apache Spark: how to append a new column from a list/array to a Spark dataframe

Views: 217 Published: 2020/9/3 23:40:31 Tags: scala apache-spark dataframe apache-spark-sql

Problem description

I am using the Apache Spark 2.0 Dataframe/Dataset API and want to add a new column to my dataframe from a list of values. The list has the same number of values as the dataframe has rows.

val list = List(4,5,10,7,2)
val df   = List("a","b","c","d","e").toDF("row1")

I would like to do something like this:

val appendedDF = df.withColumn("row2", somefunc(list))
appendedDF.show()
// +----+----+
// |row1|row2|
// +----+----+
// |   a|   4|
// |   b|   5|
// |   c|  10|
// |   d|   7|
// |   e|   2|
// +----+----+

I would be grateful for any ideas. In reality, my dataframe contains more columns.

Recommended answer

You can do it like this:

import org.apache.spark.sql.Row
import org.apache.spark.sql.types._    

// create rdd from the list
val rdd = sc.parallelize(List(4,5,10,7,2))
// rdd: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[31] at parallelize at <console>:28

// zip the data frame with rdd
val rdd_new = df.rdd.zip(rdd).map(r => Row.fromSeq(r._1.toSeq ++ Seq(r._2)))
// rdd_new: org.apache.spark.rdd.RDD[org.apache.spark.sql.Row] = MapPartitionsRDD[33] at map at <console>:32

// create a new data frame from the rdd_new with modified schema
spark.createDataFrame(rdd_new, df.schema.add("new_col", IntegerType)).show
+----+-------+
|row1|new_col|
+----+-------+
|   a|      4|
|   b|      5|
|   c|     10|
|   d|      7|
|   e|      2|
+----+-------+
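
One caveat with the zip-based answer above: RDD.zip assumes both RDDs have the same number of partitions and the same number of elements in each partition, so it can fail when the dataframe and the parallelized list are partitioned differently. A possible alternative, sketched below, attaches an explicit index to both sides and joins on it. This is only a sketch under assumptions: the local SparkSession settings, the helper column name idx, and the names indexedDf and indexedList are illustrative and not part of the original answer, and it assumes the current row order of the dataframe is the order the list should follow.

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.LongType

// Local session for illustration only; in spark-shell `spark` already exists
// and getOrCreate() simply returns it.
val spark = SparkSession.builder()
  .master("local[2]")
  .appName("append-column-sketch")
  .getOrCreate()
import spark.implicits._

val list = List(4, 5, 10, 7, 2)
val df   = List("a", "b", "c", "d", "e").toDF("row1")

// Attach a positional index to every row of the dataframe.
// "idx" is a hypothetical helper column name.
val indexedDf = spark.createDataFrame(
  df.rdd.zipWithIndex.map { case (row, idx) => Row.fromSeq(row.toSeq :+ idx) },
  df.schema.add("idx", LongType, false)
)

// Turn the list into a two-column dataframe keyed by the same index.
val indexedList = list.zipWithIndex
  .map { case (value, i) => (i.toLong, value) }
  .toDF("idx", "row2")

// Join on the index, restore the original order, and drop the helper column.
val appendedDF = indexedDf
  .join(indexedList, Seq("idx"))
  .orderBy("idx")
  .drop("idx")

appendedDF.show()
```

The join introduces a shuffle, so on a large dataframe this is likely slower than zip; the trade-off is that it does not depend on how either side happens to be partitioned.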
