Reference columns in dataframes throwing ambiguous error when joining two dataframes where one dataframe has an array of reference keys


Problem description

I have two dataframes as follows:

dataframeOne

+--------+-----+
|subject |marks|
+--------+-----+
|Maths   |89   |
|English |90   |
|Religion|80   |
+--------+-----+

dataframeTwo

+-----+--------------------------+
|name |subject                   |
+-----+--------------------------+
|Liza |[Maths]                   |
|Inter|[Religion, English]       |
|Ovin |[Maths, Religion, English]|
+-----+--------------------------+

Expected output

+-----+------------+
|name |marks       |
+-----+------------+
|Liza |[89]        |
|Inter|[80, 90]    |
|Ovin |[89, 80, 90]|
+-----+------------+
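
For reference, here is a minimal sketch that reproduces the two input dataframes (column names are taken from the tables above; a running SparkSession named spark is an assumption):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("marks").master("local[*]").getOrCreate()
import spark.implicits._

// dataframeOne: one mark per subject
val dataframeOne = Seq(
  ("Maths", 89),
  ("English", 90),
  ("Religion", 80)
).toDF("subject", "marks")

// dataframeTwo: each student holds an array of subjects
val dataframeTwo = Seq(
  ("Liza", Seq("Maths")),
  ("Inter", Seq("Religion", "English")),
  ("Ovin", Seq("Maths", "Religion", "English"))
).toDF("name", "subject")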

To get the above output I need to join dataframeOne and dataframeTwo. But the subject column in dataframeTwo holds arrays, while in dataframeOne it holds a single string. I tried the code below and got the error that follows:

val newDataframe = dataframeTwo.withColumn("myMarks", struct('marks))
val studentMarksDataframe = dataframeOne.join(newDataframe, array_contains(subject, subject)).agg(collect_list('myMarks))

The error:

Exception in thread "main" org.apache.spark.sql.AnalysisException: Reference 'subject' is ambiguous, could be: subject, subject.
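
The clash comes from both inputs exposing a column literally named subject, so an unqualified reference cannot be resolved. One way to make every reference explicit is to alias each frame first — a minimal sketch, assuming the dataframes defined above:

import org.apache.spark.sql.functions.{array_contains, col}

// Give each frame an alias so every column reference names its side
val one = dataframeOne.as("one")
val two = dataframeTwo.as("two")

val joined = one.join(two, array_contains(col("two.subject"), col("one.subject")))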

How can I solve the above issue?

Solution

You can try qualifying each subject column with the dataframe it comes from, so Spark knows exactly which column each reference means:

import org.apache.spark.sql.functions.{array_contains, collect_list}

val studentMarksDataframe = dataframeOne.join(
    dataframeTwo,
    // Qualify each "subject" with its source dataframe to avoid the ambiguity
    array_contains(dataframeTwo("subject"), dataframeOne("subject"))
).groupBy("name").agg(collect_list("marks"))
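
If the qualified join feels awkward, an equivalent route is to explode the array first and join on plain string columns — a sketch, assuming the dataframes defined in the question:

import org.apache.spark.sql.functions.{explode, collect_list, col}

// Flatten the subject arrays into one row per (name, subject) pair
val exploded = dataframeTwo.select(col("name"), explode(col("subject")).as("one_subject"))

// A plain string-to-string join; the differing column names remove the clash
val studentMarksDataframe = exploded
  .join(dataframeOne, exploded("one_subject") === dataframeOne("subject"))
  .groupBy("name")
  .agg(collect_list("marks").as("marks"))

Note that collect_list makes no ordering guarantee, so the marks may not come back in the order the subjects were listed.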
