在2个数据框列上进行scala-外部联接不会显示存在空值的行 [英] scala- Outer join on 2 dataframe columns doesnt show rows where there are null values

查看:71
本文介绍了在2个数据框列上进行scala-外部联接不会显示存在空值的行的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我这样加入2个数据帧:
val joinCols = Array( first_name, last_name)
val df_subset_joined = df1_subset.as( a)。join(df2_subset.as( b),joinCols, full_outer)
df_subset_joined.show()

Im joining 2 dataframes like so: val joinCols = Array("first_name", "last_name") val df_subset_joined = df1_subset.as("a").join(df2_subset.as("b"), joinCols, "full_outer") df_subset_joined.show()

这是上面代码的结果:

Dataframe of differences between 2 dataframes
+----------+---------+-------------+-------------+
|first_name|last_name|loyalty_score|loyalty_score|
+----------+---------+-------------+-------------+
|     will |    smith|           67|           67|
|   george |  clooney|           67|           67|
|   george |  clooney|           67|           88|
|    blake |   lively|           66|         null|
|    celena|    gomez|         null|            2|
|       eva|    green|           44|           56|
|      null|     null|             |         null|
|     jason|    momoa|           34|           34|
|        ed|  sheeran|           88|         null|
|    lionel|    messi|           88|           88|
|      kyle|   jenner|         null|           56|
|      tom |   cruise|           66|           34|
|      tom |   cruise|           66|           99|
|      brad|     pitt|           99|           78|
|      ryan| reynolds|           45|         null|
+----------+---------+-------------+-------------+

如您所见,存在带有空值的列。

As you can see there are columns with null values.

接下来我运行以下代码:

I run the following code next:

val filter_str = s"a.$col"+" != "+s"b.$col"
val df_subset_filtered = df_subset_joined.filter(filter_str)
df_subset_filtered.show()

我得到了foll数据帧:

I get the foll dataframe:

Below is the dataframe of differences between DF1 and DF1 based on the comparison between:
a.loyalty_score != b.loyalty_score
+----------+---------+-------------+-------------+
|first_name|last_name|loyalty_score|loyalty_score|
+----------+---------+-------------+-------------+
|      tom |   cruise|           66|           99|
|      tom |   cruise|           66|           34|
|       eva|    green|           44|           56|
|      brad|     pitt|           99|           78|
|   george |  clooney|           67|           88|
+----------+---------+-------------+-------------+

为什么我看不到在1列中有空值和实际值的行值另一个。不应该满足这个值!= null

Why dont I see the rows where there are null values in 1 column and a actual value in another. Shouldnt this satisfy value != null

如何使我的过滤器语句使空值出现在最终数据框中

How can I make my filter statement make the null values appear in the final dataframe

推荐答案

一列中没有 null 而没有 non的任何行的原因-null 另一个是比较返回 FALSE

The reason you don't get any rows where there is null in one column and non-null in the other is that the comparison returns FALSE.

要避免这种情况,请使用空安全比较运算符 < => ,与 not 结合使用。

To avoid this, use the null-safe comparison operator <=>, in conjunction with not.

val filter_str = "not(" + s"a.$col"+" <=> "+s"b.$col)" 
val df_subset_filtered = df_subset_joined.filter(filter_str)
df_subset_filtered.show()

从文档中


expr1< => expr2-对于非空操作数,返回与EQUAL(=)运算符相同的结果,但如果两者均为null,则返回true一世f其中一个为空。

expr1 <=> expr2 - Returns same result as the EQUAL(=) operator for non-null operands, but returns true if both are null, false if one of the them is null.

参数:

expr1,expr2-两个表达式必须相同type或可以强制转换为通用类型,并且必须是可在相等比较中使用的类型。不支持地图类型。对于诸如数组/结构之类的复杂类型,字段的数据类型必须是可排序的。
示例:

expr1, expr2 - the two expressions must be same type or can be casted to a common type, and must be a type that can be used in equality comparison. Map type is not supported. For complex types such array/struct, the data types of fields must be orderable. Examples:

SELECT 2< => 2;
true

SELECT 2 <=> 2; true

选择1< =>‘1’;
true

SELECT 1 <=> '1'; true

SELECT true< => NULL;
false

SELECT true <=> NULL; false

选择NULL< => NULL;
true

SELECT NULL <=> NULL; true

这篇关于在2个数据框列上进行scala-外部联接不会显示存在空值的行的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆