Conditional Join in Spark DataFrame
Problem Description
I am trying to join two DataFrames with a condition.
I have two DataFrames, A and B.
A contains the columns id, m_cd and c_cd. B contains the columns m_cd, c_cd and record.
The conditions are:
- if m_cd is null, join A's c_cd with B
- if m_cd is not null, join A's m_cd with B
We can use "when" and "otherwise()" in the withColumn() method of a DataFrame, so is there any way to do this for the case of a join between DataFrames?
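For reference, a minimal, self-contained sketch (not from the original post; the data is made up for illustration) of the when/otherwise pattern inside withColumn that the question refers to:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.when

val spark = SparkSession.builder().master("local[*]").appName("whenOtherwise").getOrCreate()
import spark.implicits._

// Hypothetical data using the same column names as in the question.
val a = Seq((1, Some(10), 20), (2, None: Option[Int], 30)).toDF("id", "m_cd", "c_cd")

// Derive a column that falls back to c_cd when m_cd is null.
val withCd = a.withColumn("cd_used", when($"m_cd".isNull, $"c_cd").otherwise($"m_cd"))
withCd.show()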
I have already done this using Union, but wanted to know if there is any other option available.
Recommended Answer
You can use "when" / "otherwise" in the join condition:
import org.apache.spark.sql.functions.when
import spark.implicits._

case class Foo(m_cd: Option[Int], c_cd: Option[Int])

val dfA = spark.createDataset(Seq(
  Foo(Some(1), Some(2)),
  Foo(Some(2), Some(3)),
  Foo(None: Option[Int], Some(4))
))

val dfB = spark.createDataset(Seq(
  Foo(Some(1), Some(5)),
  Foo(Some(2), Some(6)),
  Foo(Some(10), Some(4))
))

// Join on c_cd when a.m_cd is null, otherwise join on m_cd.
val joinCondition = when($"a.m_cd".isNull, $"a.c_cd" === $"b.c_cd")
  .otherwise($"a.m_cd" === $"b.m_cd")

dfA.as("a").join(dfB.as("b"), joinCondition).show
It might still be more readable to use the union, though.
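For comparison, a sketch of what the union-based approach mentioned in the question might look like, reusing dfA and dfB from the snippet above (this is an assumption, not the asker's actual code):

// Assumed union-based approach: join the null-m_cd rows on c_cd,
// the remaining rows on m_cd, then union the two results.
val nullPart = dfA.filter($"m_cd".isNull)
  .join(dfB, dfA("c_cd") === dfB("c_cd"))

val notNullPart = dfA.filter($"m_cd".isNotNull)
  .join(dfB, dfA("m_cd") === dfB("m_cd"))

val joined = nullPart.union(notNullPart)
joined.show()

The single conditional join expresses the intent in one statement, while the union version makes the two cases explicit at the cost of scanning A twice.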