Difference between === null and isNull in Spark DataFrame

Question

I am a bit confused about the difference between

 df.filter(col("c1") === null) and df.filter(col("c1").isNull) 

On the same DataFrame I am getting counts with === null but zero counts with isNull. Please help me understand the difference. Thanks

Recommended answer

First and foremost, don't use null in your Scala code unless you really have to for compatibility reasons.

Regarding your question, this is plain SQL. col("c1") === null is interpreted as c1 = NULL and, because NULL marks an undefined value, the result is undefined (NULL) for any operand, including NULL itself. A filter keeps only the rows for which the predicate is true, so a NULL predicate never matches.

spark.sql("SELECT NULL = NULL").show

+-------------+
|(NULL = NULL)|
+-------------+
|         null|
+-------------+

spark.sql("SELECT NULL != NULL").show

+-------------------+
|(NOT (NULL = NULL))|
+-------------------+
|               null|
+-------------------+

spark.sql("SELECT TRUE != NULL").show

+------------------------------------+
|(NOT (true = CAST(NULL AS BOOLEAN)))|
+------------------------------------+
|                                null|
+------------------------------------+

spark.sql("SELECT TRUE = NULL").show

+------------------------------+
|(true = CAST(NULL AS BOOLEAN))|
+------------------------------+
|                          null|
+------------------------------+
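
The behavior shown in the tables above can be modeled in plain Scala without Spark. The following is only an illustrative sketch: the object ThreeValuedLogic and its helpers sqlEq, keepRow and isNull are hypothetical names, not part of any Spark API, and None stands in for SQL NULL.

```scala
// Hypothetical model of SQL three-valued logic; None stands in for NULL.
object ThreeValuedLogic {
  type SqlBool = Option[Boolean]

  // SQL `=`: the result is NULL (None) as soon as either operand is NULL.
  def sqlEq[A](a: Option[A], b: Option[A]): SqlBool =
    for { x <- a; y <- b } yield x == y

  // A WHERE clause or filter keeps a row only when the predicate is exactly TRUE.
  def keepRow(p: SqlBool): Boolean = p.contains(true)

  // SQL `IS NULL`: always TRUE or FALSE, never NULL.
  def isNull[A](a: Option[A]): Boolean = a.isEmpty

  def main(args: Array[String]): Unit = {
    val c1: Seq[Option[Int]] = Seq(Some(1), None, Some(3))

    // Analogue of filter(c1 = NULL): every comparison yields None, so no row is kept.
    println(c1.count(v => keepRow(sqlEq(v, Option.empty[Int])))) // prints 0

    // Analogue of filter(c1 IS NULL): exactly the one NULL row is kept.
    println(c1.count(v => isNull(v))) // prints 1
  }
}
```

Under this model a comparison against NULL can never select a row, which is why a dedicated IS NULL test is needed.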

The only valid methods to check for NULL are:

  • IS NULL:

spark.sql("SELECT NULL IS NULL").show

+--------------+
|(NULL IS NULL)|
+--------------+
|          true|
+--------------+

spark.sql("SELECT TRUE IS NULL").show

+--------------+
|(true IS NULL)|
+--------------+
|         false|
+--------------+

  • IS NOT NULL:

    spark.sql("SELECT NULL IS NOT NULL").show
    

    +------------------+
    |(NULL IS NOT NULL)|
    +------------------+
    |             false|
    +------------------+
    

    spark.sql("SELECT TRUE IS NOT NULL").show
    

    +------------------+
    |(true IS NOT NULL)|
    +------------------+
    |              true|
    +------------------+
    

    These are implemented in the DataFrame DSL as Column.isNull and Column.isNotNull respectively.
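
Applied to the filters from the question, the difference could be checked with a small sketch like the one below. It assumes Spark SQL is on the classpath and a local session is acceptable; the object NullFilterSketch, the helper counts and the three-row sample data are made up for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

// Illustrative sketch only; names and sample data are hypothetical.
object NullFilterSketch {
  // Returns the row counts for (=== null, isNull, isNotNull) on a tiny sample.
  def counts(spark: SparkSession): (Long, Long, Long) = {
    import spark.implicits._

    // One nullable integer column with a single NULL row.
    val df = Seq[Option[Int]](Some(1), None, Some(3)).toDF("c1")

    val eqNull    = df.filter(col("c1") === null).count() // predicate is NULL for every row
    val isNull    = df.filter(col("c1").isNull).count()   // matches the NULL row
    val isNotNull = df.filter(col("c1").isNotNull).count() // matches the other two rows
    (eqNull, isNull, isNotNull)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("null-filter-sketch")
      .getOrCreate()
    println(counts(spark)) // prints (0,1,2)
    spark.stop()
  }
}
```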

    Note:

    For NULL-safe comparisons use IS DISTINCT FROM / IS NOT DISTINCT FROM:

    spark.sql("SELECT NULL IS NOT DISTINCT FROM NULL").show
    

    +---------------+
    |(NULL <=> NULL)|
    +---------------+
    |           true|
    +---------------+
    

    spark.sql("SELECT NULL IS NOT DISTINCT FROM TRUE").show
    

    +--------------------------------+
    |(CAST(NULL AS BOOLEAN) <=> true)|
    +--------------------------------+
    |                           false|
    +--------------------------------+
    

    or not(_ <=> _) / <=>:

    spark.sql("SELECT NULL AS col1, NULL AS col2").select($"col1" <=> $"col2").show
    

    +---------------+
    |(col1 <=> col2)|
    +---------------+
    |           true|
    +---------------+
    

    spark.sql("SELECT NULL AS col1, TRUE AS col2").select($"col1" <=> $"col2").show
    

    +---------------+
    |(col1 <=> col2)|
    +---------------+
    |          false|
    +---------------+
    

    in SQL and the DataFrame DSL respectively (IS [NOT] DISTINCT FROM is the SQL form; not(_ <=> _) / <=> is the DSL form).
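
To make the contrast with ordinary equality concrete, here is a minimal plain-Scala model of NULL-safe comparison. It is a sketch under the assumption that None represents NULL; NullSafeEquality, sqlEq, nullSafeEq and isDistinctFrom are hypothetical names and do not correspond to Spark internals.

```scala
// Hypothetical model contrasting SQL `=` with NULL-safe equality; None stands in for NULL.
object NullSafeEquality {
  // Ordinary SQL `=`: NULL (None) whenever either side is NULL.
  def sqlEq[A](a: Option[A], b: Option[A]): Option[Boolean] =
    for { x <- a; y <- b } yield x == y

  // `<=>` / IS NOT DISTINCT FROM: NULL equals NULL, and NULL differs from any value.
  // The result is always a definite Boolean, never NULL.
  def nullSafeEq[A](a: Option[A], b: Option[A]): Boolean = a == b

  // IS DISTINCT FROM / not(_ <=> _): the negation of the NULL-safe comparison.
  def isDistinctFrom[A](a: Option[A], b: Option[A]): Boolean = !nullSafeEq(a, b)

  def main(args: Array[String]): Unit = {
    val nul = Option.empty[Boolean]
    println(sqlEq(nul, nul))               // prints None  (NULL = NULL is NULL)
    println(nullSafeEq(nul, nul))          // prints true  (NULL <=> NULL)
    println(nullSafeEq(nul, Some(true)))   // prints false (NULL <=> true)
    println(isDistinctFrom(nul, Some(true))) // prints true
  }
}
```

Because the NULL-safe form always yields true or false, it can be used directly in a filter or join condition where NULL keys should match each other.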

    Related:

    Including null values in an Apache Spark Join
