Reference columns in dataframes throwing ambiguous error when joining two dataframes where one dataframe has an array of reference keys
Problem description
I have two dataframes as follows:
dataframeOne
+----------+-------+
| subject  | marks |
+----------+-------+
| Maths    | 89    |
| English  | 90    |
| Religion | 80    |
+----------+-------+
dataframeTwo
+-------+----------------------------+
| name  | subject                    |
+-------+----------------------------+
| Liza  | [Maths]                    |
| Inter | [Religion, English]        |
| Ovin  | [Maths, Religion, English] |
+-------+----------------------------+
Expected output
+-------+--------------+
| name  | marks        |
+-------+--------------+
| Liza  | [89]         |
| Inter | [80, 90]     |
| Ovin  | [89, 80, 90] |
+-------+--------------+
To get the above output I need to join dataframeOne and dataframeTwo. But in dataframeTwo the subject column holds arrays, while in dataframeOne it holds a single string value. I tried the code below and got the error that follows:
val newDataframe = dataframeTwo.withColumn("myMarks", struct('marks))
val studentMarksDataframe = dataframeOne.join(newDataframe, array_contains(subject, subject)).agg(collect_list('myMarks))
Error

Exception in thread "main" org.apache.spark.sql.AnalysisException: Reference 'subject' is ambiguous, could be: subject, subject
How can I solve the above issue?
Answer
You can try:
val studentMarksDataframe = dataframeOne.join(
  dataframeTwo,
  array_contains(dataframeTwo("subject"), dataframeOne("subject"))
).groupBy("name").agg(collect_list('marks))

Qualifying each column with its own dataframe, as in dataframeTwo("subject") and dataframeOne("subject"), removes the ambiguity, and groupBy("name") with collect_list gathers each student's marks into an array. Note that collect_list makes no ordering guarantee, so the marks may appear in a different order than shown above.
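The join-and-collect logic can be illustrated with plain Scala collections, using the sample data from the question (a minimal sketch, no Spark required; contains mirrors array_contains and the grouping mirrors collect_list):

```scala
// Sample data mirroring the two dataframes in the question.
val dataframeOne = Seq(("Maths", 89), ("English", 90), ("Religion", 80))
val dataframeTwo = Seq(
  ("Liza",  Seq("Maths")),
  ("Inter", Seq("Religion", "English")),
  ("Ovin",  Seq("Maths", "Religion", "English"))
)

// For each student, keep the (subject, marks) rows whose subject appears
// in the student's subject array -- the collection analogue of
// array_contains -- and collect the marks, as collect_list does.
val studentMarks = dataframeTwo.map { case (name, subjects) =>
  val marks = dataframeOne.collect {
    case (subject, mark) if subjects.contains(subject) => mark
  }
  (name, marks)
}

studentMarks.foreach(println)
```

As with collect_list in Spark, the order of marks within each list here simply follows iteration order and should not be relied upon.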