How to unwrap a nested Struct column into multiple columns?
Problem description
I'm trying to expand a DataFrame column with a nested struct type (see below) into multiple columns. The Struct schema I'm working with looks something like {"foo": 3, "bar": {"baz": 2}}.
Ideally, I'd like to expand the above into two columns ("foo" and "bar.baz"). However, when I tried using .select("data.*") (where data is the Struct column), I only got the columns foo and bar, and bar is still a struct.
Is there a way to expand the Struct at both levels?
Recommended answer
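You can select the nested field data.bar.baz directly by its dotted path and alias it as bar.baz (and likewise data.foo as foo):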
df.show()
+-------+
| data|
+-------+
|[3,[2]]|
+-------+
df.printSchema()
root
|-- data: struct (nullable = false)
| |-- foo: long (nullable = true)
| |-- bar: struct (nullable = false)
| | |-- baz: long (nullable = true)
In pyspark:
import pyspark.sql.functions as F
# Select each nested leaf by its dotted path and alias it as a top-level column.
df.select(F.col("data.foo").alias("foo"),
          F.col("data.bar.baz").alias("bar.baz")).show()
+---+-------+
|foo|bar.baz|
+---+-------+
| 3| 2|
+---+-------+
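Listing every field by hand gets tedious for deeper structs. One option is to walk the schema and collect every leaf path automatically. The sketch below is a hypothetical helper (leaf_paths is not part of PySpark). It assumes the struct's schema has been turned into a plain dict with PySpark's DataType.jsonValue(), where a nested struct appears as a dict with "type": "struct" and atomic types appear as strings such as "long". It also assumes field names contain no dots, and it leaves arrays and maps as single columns.

```python
from typing import Any, Dict, List


def leaf_paths(struct_json: Dict[str, Any], prefix: str = "") -> List[str]:
    """Return the dotted path of every non-struct leaf in a struct schema.

    Hypothetical helper: `struct_json` is assumed to be the dict produced by
    PySpark's `StructType.jsonValue()`. Field names are assumed not to
    contain dots (such names would need backtick quoting in F.col).
    """
    paths: List[str] = []
    for field in struct_json["fields"]:
        path = prefix + field["name"]
        ftype = field["type"]
        # Nested structs are dicts with "type": "struct"; recurse into them.
        if isinstance(ftype, dict) and ftype.get("type") == "struct":
            paths.extend(leaf_paths(ftype, path + "."))
        else:
            # Atomic types ("long", "string", ...) and other complex types
            # (arrays, maps) are kept as a single column.
            paths.append(path)
    return paths


# Hand-written equivalent of the `data` column's schema from the question.
data_schema = {
    "type": "struct",
    "fields": [
        {"name": "foo", "type": "long", "nullable": True, "metadata": {}},
        {
            "name": "bar",
            "type": {
                "type": "struct",
                "fields": [
                    {"name": "baz", "type": "long", "nullable": True, "metadata": {}},
                ],
            },
            "nullable": False,
            "metadata": {},
        },
    ],
}

print(leaf_paths(data_schema))  # ['foo', 'bar.baz']
```

With a live DataFrame, the paths could be fed back into select, roughly as df.select([F.col("data." + p).alias(p) for p in leaf_paths(df.schema["data"].dataType.jsonValue())]), which should give the same foo and bar.baz columns as above. Because an alias such as bar.baz contains a dot, later references to it need backticks, e.g. F.col("`bar.baz`"); aliasing with p.replace(".", "_") instead avoids that.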