How to unwrap nested Struct column into multiple columns?
Problem description
I'm trying to expand a DataFrame column with a nested struct type (see below) into multiple columns. The Struct schema I'm working with looks something like {"foo": 3, "bar": {"baz": 2}}.
Ideally, I'd like to expand the above into two columns ("foo" and "bar.baz"). However, when I tried .select("data.*") (where data is the Struct column), I only got the columns foo and bar, where bar is still a struct.
Is there a way to expand the Struct at both levels?
Recommended answer
You can select data.bar.baz directly and alias it as bar.baz:
df.show()
+-------+
|   data|
+-------+
|[3,[2]]|
+-------+
df.printSchema()
root
 |-- data: struct (nullable = false)
 |    |-- foo: long (nullable = true)
 |    |-- bar: struct (nullable = false)
 |    |    |-- baz: long (nullable = true)
In pyspark:
import pyspark.sql.functions as F
df.select(F.col("data.foo").alias("foo"), F.col("data.bar.baz").alias("bar.baz")).show()
+---+-------+
|foo|bar.baz|
+---+-------+
|  3|      2|
+---+-------+
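Listing every nested field by hand gets tedious for deeper structs. As a rough sketch (not part of the original answer), the dotted paths can be collected by walking the schema recursively. The helper name leaf_paths is hypothetical, and it assumes the schema is supplied in the dict form that df.schema.jsonValue() returns; here that dict is written out by hand for the schema in the question so the example runs without a Spark session. It also assumes field names contain no dots themselves.

```python
def leaf_paths(struct_json, prefix=""):
    """Return dotted paths to every non-struct leaf of a struct schema.

    `struct_json` is assumed to be the dict form of a Spark StructType,
    i.e. the shape produced by `df.schema.jsonValue()`.
    """
    paths = []
    for field in struct_json["fields"]:
        path = prefix + field["name"]
        ftype = field["type"]
        # Nested structs appear as dicts with "type": "struct";
        # primitive types appear as plain strings such as "long".
        if isinstance(ftype, dict) and ftype.get("type") == "struct":
            paths.extend(leaf_paths(ftype, path + "."))
        else:
            paths.append(path)
    return paths


# Hand-written stand-in for df.schema.jsonValue() on the question's DataFrame.
schema = {
    "type": "struct",
    "fields": [
        {"name": "data", "nullable": False, "metadata": {}, "type": {
            "type": "struct",
            "fields": [
                {"name": "foo", "type": "long", "nullable": True, "metadata": {}},
                {"name": "bar", "nullable": False, "metadata": {}, "type": {
                    "type": "struct",
                    "fields": [
                        {"name": "baz", "type": "long", "nullable": True, "metadata": {}},
                    ],
                }},
            ],
        }},
    ],
}

paths = leaf_paths(schema)
print(paths)  # ['data.foo', 'data.bar.baz']
```

With a live DataFrame, the same list could then be passed to select, for example df.select([F.col(p).alias(p.replace(".", "_")) for p in leaf_paths(df.schema.jsonValue())]). Aliasing with underscores rather than dots is a design choice here: a column literally named bar.baz has to be wrapped in backticks whenever it is referenced later.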