根据不同的列类型联接两个表 [英] Join two tables based on different column type

查看：135 发布时间：2019/9/19 16:53:32 sql scala dataframe join hive

本文介绍了根据不同的列类型联接两个表的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

我有两个具有以下模式的表作为示例:

I have two tables with the following schema as example:

   scala> df1.printSchema
   root
    |-- id: string (nullable = true)

   AND

   scala> df2.printSchema
   root
    |-- col1: string (nullable = true)
    |-- col2: array (nullable = true)
    |    |-- element: string (containsNull = true)

我想在df2中获取所有col1，其中col2数组中的元素等于df1中的id.诸如df3之类的内容输出:

I want to get all col1 in df2 where an element in col2 array is equal to id in df1. Something such as df3 is output:

   scala> df3.printSchema
   root
    |-- c1: array (nullable = true)
    |    |-- element: string (containsNull = true)
    |-- c2: string (nullable = true)

其中df3.c2基本上是df1.id，而df3.c1是满足上述相等性的所有df2.col1的数组.

where df3.c2 is basically df1.id and df3.c1 is array of all df2.col1 that satisfy the mentioned equality.

任何SQL(配置单元)或Scala解决方案都非常有用.

any SQL (hive) or Scala solution is very helpful.

推荐答案

在Hive中:

select collect_set(df2.col1) as col1, df1.id as col2
 from df1
inner join 
(
select --explode col2 array 
      col1, s.c2 as col2 
 from df2 lateral view explode(col2) s as c2 
) df2 on df1.id = df2.col2
group by df1.id;

这篇关于根据不同的列类型联接两个表的文章就介绍到这了，希望我们推荐的答案对大家有所帮助，也希望大家多多支持IT屋！

查看全文

根据不同的列类型联接两个表 [英] Join two tables based on different column type

问题描述

推荐答案

相关文章

其他开发最新文章

热门教程

热门工具

登录关闭

根据不同的列类型联接两个表 [英] Join two tables based on different column type

问题描述

推荐答案

相关文章

其他开发最新文章

热门教程

热门工具

登录 关闭

登录关闭