How to join nested columns in Spark with usingColumns

Views: 677 Published: 2019/9/19 16:30:59 Tags: apache-spark join apache-spark-sql


Question

I have two DataFrames that I would like to join.

DF1:

root
 |-- myStruct: struct (nullable = true)
 |    |-- id: string (nullable = true)
 |    |-- region: long (nullable = true)
 |-- first_name: string (nullable = true)

DF2:

root
 |-- id: string (nullable = true)
 |-- region: long (nullable = true)
 |-- second_name: string (nullable = true)

My join statement is

df1.join(df2, Seq("id", "region"), "leftouter")

but it fails with

USING column `id` cannot be resolved on the left side of the join. The left-side columns: myStruct, first_name

I am running Spark 2.2 on Scala.

Recommended answer

You can use dot notation to select a field from a struct column. To select id from df1, use myStruct.id, and to select region, use myStruct.region.
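As a minimal sketch of the dot notation described above, the snippet below builds a one-row DataFrame shaped like DF1 and selects the two nested fields. The sample values, the local SparkSession settings, and the variable name `ids` are assumptions made for illustration; only the column layout comes from the question.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, struct}

// Local session for illustration only (assumed settings).
val spark = SparkSession.builder().master("local[*]").appName("nested-select").getOrCreate()
import spark.implicits._

// Hypothetical row; the struct is assembled so the layout matches DF1.
val df1 = Seq(("a1", 10L, "Ann"))
  .toDF("id", "region", "first_name")
  .select(struct($"id", $"region").as("myStruct"), $"first_name")

// Dot notation reaches inside the struct column.
val ids = df1.select(col("myStruct.id"), col("myStruct.region"))
ids.printSchema()
```

The selected columns take the names of the nested fields, so `ids` has top-level columns id and region.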

The usingColumns form of join needs the same top-level column names on both sides, which is not the case here. Since the column names differ, compare them explicitly with === instead:

df1.join(df2, df1("myStruct.id") === df2("id") && df1("myStruct.region") === df2("region"), "leftouter")
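The join above can be tried end to end with a small sketch like the following. The sample rows and the local SparkSession settings are assumptions for illustration; the join expression itself is the one from the answer.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.struct

// Local session for illustration only (assumed settings).
val spark = SparkSession.builder().master("local[*]").appName("nested-join").getOrCreate()
import spark.implicits._

// Hypothetical sample rows; only the column layout follows the question.
val df1 = Seq(("a1", 10L, "Ann"), ("b2", 20L, "Bob"))
  .toDF("id", "region", "first_name")
  .select(struct($"id", $"region").as("myStruct"), $"first_name")

val df2 = Seq(("a1", 10L, "Smith")).toDF("id", "region", "second_name")

// Explicit join condition: nested fields on the left, top-level columns on the right.
val joined = df1.join(
  df2,
  df1("myStruct.id") === df2("id") && df1("myStruct.region") === df2("region"),
  "leftouter"
)

joined.show()
```

Because this is a left outer join, every row of df1 is kept; rows without a match in df2 get nulls in id, region, and second_name.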

The joined DataFrame should have the following schema:

root
 |-- myStruct: struct (nullable = true)
 |    |-- id: string (nullable = true)
 |    |-- region: long (nullable = true)
 |-- first_name: string (nullable = true)
 |-- id: string (nullable = true)
 |-- region: long (nullable = true)
 |-- second_name: string (nullable = true)

After the join, you can drop the columns you do not need, or select only the ones you do.
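Both cleanup options can be sketched as follows. The sample rows, the local SparkSession settings, and the names `trimmed` and `flat` are assumptions for illustration; which columns count as unnecessary depends on your use case.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.struct

// Local session for illustration only (assumed settings).
val spark = SparkSession.builder().master("local[*]").appName("nested-join-cleanup").getOrCreate()
import spark.implicits._

// Hypothetical sample rows; only the column layout follows the question.
val df1 = Seq(("a1", 10L, "Ann"), ("b2", 20L, "Bob"))
  .toDF("id", "region", "first_name")
  .select(struct($"id", $"region").as("myStruct"), $"first_name")
val df2 = Seq(("a1", 10L, "Smith")).toDF("id", "region", "second_name")

val joined = df1.join(
  df2,
  df1("myStruct.id") === df2("id") && df1("myStruct.region") === df2("region"),
  "leftouter"
)

// Option 1: drop the right-side key columns, which duplicate the struct fields.
val trimmed = joined.drop("id", "region")

// Option 2: select only what is needed, flattening the struct along the way.
val flat = joined.select(
  df1("myStruct.id"), df1("myStruct.region"), $"first_name", $"second_name"
)

trimmed.printSchema()
flat.printSchema()
```

Taking the keys from df1 rather than df2 in the second option keeps them non-null for rows that had no match on the right side.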

I hope the answer is helpful.
