How to join nested columns in Spark with usingColumns

Views: 14 | Posted: 2021/11/14 22:46:13 | Tags: apache-spark, join, apache-spark-sql


Problem description

I have two DataFrames that I would like to join.

DF1:

root
 |-- myStruct: struct (nullable = true)
 |    |-- id: string (nullable = true)
 |    |-- region: long (nullable = true)
 |-- first_name: string (nullable = true)

DF2:

root
 |-- id: string (nullable = true)
 |-- region: long (nullable = true)
 |-- second_name: string (nullable = true)

My join statement is:

df1.join(df2, Seq("id", "region"), "leftouter")

but it fails with:

USING column `id` cannot be resolved on the left side of the join. The left-side columns: myStruct, first_name

I am running Spark 2.2 with Scala.

Recommended answer

You can use dot notation to select a field from a struct column. To select id from df1, use myStruct.id; to select region, use myStruct.region.
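A minimal sketch of the dot notation, assuming a local SparkSession and made-up sample rows. The object name `StructFieldSelect` and both helper names are hypothetical and do not come from the question:

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, struct}

object StructFieldSelect {
  // Builds a one-row frame shaped like DF1 (the values are made up for illustration).
  def sampleDf1(spark: SparkSession): DataFrame = {
    import spark.implicits._
    Seq(("1", 10L, "Alice"))
      .toDF("id", "region", "first_name")
      .select(struct($"id", $"region").as("myStruct"), $"first_name")
  }

  // Pulls the nested fields up to top-level columns using dot notation.
  def selectStructFields(df: DataFrame): DataFrame =
    df.select(col("myStruct.id"), col("myStruct.region"), col("first_name"))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("struct-field-select")
      .getOrCreate()
    selectStructFields(sampleDf1(spark)).show()
    spark.stop()
  }
}
```

A field selected this way becomes a top-level column named after the field itself (`id`, `region`), not after the full path.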

Because the join columns do not have the same names on both sides, you cannot pass them as usingColumns. Instead, compare them explicitly with ===:

df1.join(df2, df1("myStruct.id") === df2("id") && df1("myStruct.region") === df2("region"), "leftouter")
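To try the join end to end, here is a self-contained sketch, assuming a local SparkSession. The sample rows, the object name `NestedJoinExample` and the helper names are invented for illustration; only the join expression itself comes from the answer:

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.struct

object NestedJoinExample {
  // Builds two small frames shaped like DF1 and DF2 (made-up values).
  def sampleFrames(spark: SparkSession): (DataFrame, DataFrame) = {
    import spark.implicits._
    val df1 = Seq(("1", 10L, "Alice"), ("2", 20L, "Bob"))
      .toDF("id", "region", "first_name")
      .select(struct($"id", $"region").as("myStruct"), $"first_name")
    val df2 = Seq(("1", 10L, "Smith")).toDF("id", "region", "second_name")
    (df1, df2)
  }

  // Left outer join: nested fields on the left against top-level columns on the right.
  def joinOnNested(left: DataFrame, right: DataFrame): DataFrame =
    left.join(
      right,
      left("myStruct.id") === right("id") &&
        left("myStruct.region") === right("region"),
      "leftouter")

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("nested-join")
      .getOrCreate()
    val (df1, df2) = sampleFrames(spark)
    val joined = joinOnNested(df1, df2)
    joined.printSchema()
    joined.show()
    spark.stop()
  }
}
```

With this sample data, the row for "Bob" has no match in df2, so its `id`, `region` and `second_name` columns come back as null.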

After the join you should have the following schema:

root
 |-- myStruct: struct (nullable = true)
 |    |-- id: string (nullable = true)
 |    |-- region: long (nullable = false)
 |-- first_name: string (nullable = true)
 |-- id: string (nullable = true)
 |-- region: integer (nullable = true)
 |-- second_name: string (nullable = true)

After the join, you can drop the columns you do not need, or select only the ones you do.
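Two ways this cleanup might look, sketched under the same assumptions (local SparkSession, made-up rows, hypothetical names such as `PostJoinCleanup`). The second function is an alternative that is not part of the original answer: it first copies the nested fields to top-level columns so that the `usingColumns` form of `join` from the question can be used.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, struct}

object PostJoinCleanup {
  // Builds two small frames shaped like DF1 and DF2 (made-up values).
  def sampleFrames(spark: SparkSession): (DataFrame, DataFrame) = {
    import spark.implicits._
    val df1 = Seq(("1", 10L, "Alice"), ("2", 20L, "Bob"))
      .toDF("id", "region", "first_name")
      .select(struct($"id", $"region").as("myStruct"), $"first_name")
    val df2 = Seq(("1", 10L, "Smith")).toDF("id", "region", "second_name")
    (df1, df2)
  }

  // Option A: join with ===, then drop df2's key columns,
  // which only repeat the values already held in myStruct.
  def joinThenDrop(df1: DataFrame, df2: DataFrame): DataFrame =
    df1.join(
        df2,
        df1("myStruct.id") === df2("id") &&
          df1("myStruct.region") === df2("region"),
        "leftouter")
      .drop("id", "region")

  // Option B (assumption, not from the answer): promote the nested fields
  // to top-level columns so both sides share the names id and region.
  def joinUsingPromoted(df1: DataFrame, df2: DataFrame): DataFrame =
    df1.withColumn("id", col("myStruct.id"))
      .withColumn("region", col("myStruct.region"))
      .join(df2, Seq("id", "region"), "leftouter")
      .drop("myStruct")

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("post-join-cleanup")
      .getOrCreate()
    val (df1, df2) = sampleFrames(spark)
    joinThenDrop(df1, df2).show()
    joinUsingPromoted(df1, df2).show()
    spark.stop()
  }
}
```

Option A keeps the struct intact, while Option B ends up with a flat frame in which each join key appears once. Which is preferable depends on whether later steps still need `myStruct`.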

I hope the answer is helpful.
