How to extract an element from an array in PySpark

Views: 998 Published: 2020/9/4 0:01:07 Tags: python apache-spark pyspark rdd


Question

I have a data frame of the following type:

col1|col2|col3|col4
xxxx|yyyy|zzzz|[1111],[2222]

I want my output to be of the following type:

col1|col2|col3|col4|col5
xxxx|yyyy|zzzz|1111|2222

My col4 is an array, and I want to split its elements into separate columns. What needs to be done?

I have seen many answers that use flatMap, but they add rows. I just want each element placed in another column of the same row.

The following is my actual schema:

root
 |-- PRIVATE_IP: string (nullable = true)
 |-- PRIVATE_PORT: integer (nullable = true)
 |-- DESTINATION_IP: string (nullable = true)
 |-- DESTINATION_PORT: integer (nullable = true)
 |-- collect_set(TIMESTAMP): array (nullable = true)
 |    |-- element: string (containsNull = true)

Also, could someone please explain how to do this with both DataFrames and RDDs?

Recommended answer

Create sample data:

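from pyspark.sql import Row, SparkSession

# In the pyspark shell or a notebook, spark and sc usually already exist
spark = SparkSession.builder.getOrCreate()
sc = spark.sparkContext

# One row whose col4 holds an array
x = [Row(col1="xx", col2="yy", col3="zz", col4=[123, 234])]
rdd = sc.parallelize(x)
df = spark.createDataFrame(rdd)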
df.show()
#+----+----+----+----------+
#|col1|col2|col3|      col4|
#+----+----+----+----------+
#|  xx|  yy|  zz|[123, 234]|
#+----+----+----+----------+

Use getItem to extract elements from the array column as shown here. In your actual case, replace col4 with collect_set(TIMESTAMP):

df = df.withColumn("col5", df["col4"].getItem(1)).withColumn("col4", df["col4"].getItem(0))
df.show()
#+----+----+----+----+----+
#|col1|col2|col3|col4|col5|
#+----+----+----+----+----+
#|  xx|  yy|  zz| 123| 234|
#+----+----+----+----+----+
