How to unwrap nested Struct column into multiple columns?

Question

I'm trying to expand a DataFrame column with a nested struct type (see below) into multiple columns. The struct I'm working with looks something like {"foo": 3, "bar": {"baz": 2}}.

Ideally, I'd like to expand the above into two columns ("foo" and "bar.baz"). However, when I tried .select("data.*") (where data is the struct column), I only got the columns foo and bar, and bar was still a struct.

Is there a way to expand the struct at both levels?

Recommended answer

You can select data.bar.baz directly and alias it as bar.baz:

df.show()
+-------+
|   data|
+-------+
|[3,[2]]|
+-------+

df.printSchema()
root
 |-- data: struct (nullable = false)
 |    |-- foo: long (nullable = true)
 |    |-- bar: struct (nullable = false)
 |    |    |-- baz: long (nullable = true)

In PySpark:

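import pyspark.sql.functions as F

# Address each nested field by its full dotted path, then alias it
# to the top-level column name you want.
df.select(
    F.col("data.foo").alias("foo"),
    F.col("data.bar.baz").alias("bar.baz"),
).show()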
+---+-------+
|foo|bar.baz|
+---+-------+
|  3|      2|
+---+-------+
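
When the struct has many fields or more levels of nesting, listing every path by hand gets tedious. The following is a minimal sketch, not part of the original answer, of one way to collect the dotted paths automatically. The helper names leaf_paths and flatten_structs are hypothetical. The sketch assumes the schema dictionary has the shape returned by df.schema.jsonValue(), that field names contain no dots, and that only struct fields should be expanded (arrays and maps are left as they are).

```python
def leaf_paths(schema_json, prefix=""):
    """Return the dotted path of every non-struct field in a schema dict.

    schema_json is assumed to look like the result of df.schema.jsonValue():
    {"type": "struct", "fields": [{"name": ..., "type": ..., ...}, ...]}
    """
    paths = []
    for field in schema_json["fields"]:
        name = prefix + field["name"]
        ftype = field["type"]
        # Nested structs appear as dicts with type "struct"; simple types are strings.
        if isinstance(ftype, dict) and ftype.get("type") == "struct":
            paths.extend(leaf_paths(ftype, name + "."))
        else:
            paths.append(name)
    return paths


def flatten_structs(df, sep="_"):
    """Hypothetical helper: select every leaf field as a top-level column."""
    # Imported lazily so leaf_paths can be used without pyspark installed.
    from pyspark.sql import functions as F

    paths = leaf_paths(df.schema.jsonValue())
    # Replace dots in the alias to avoid backtick-quoting the new columns later.
    return df.select([F.col(p).alias(p.replace(".", sep)) for p in paths])


# Schema dict matching the printSchema() output shown in the answer.
example_schema = {
    "type": "struct",
    "fields": [
        {"name": "data", "nullable": False, "metadata": {}, "type": {
            "type": "struct",
            "fields": [
                {"name": "foo", "type": "long", "nullable": True, "metadata": {}},
                {"name": "bar", "nullable": False, "metadata": {}, "type": {
                    "type": "struct",
                    "fields": [
                        {"name": "baz", "type": "long", "nullable": True, "metadata": {}},
                    ],
                }},
            ],
        }},
    ],
}

print(leaf_paths(example_schema))  # ['data.foo', 'data.bar.baz']
```

With the DataFrame from the answer, flatten_structs(df) would select data.foo and data.bar.baz under the aliases data_foo and data_bar_baz. The underscore separator is a design choice of this sketch: a column name that contains a dot, such as bar.baz, has to be wrapped in backticks whenever it is referenced later.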
