Difference between === null and isNull in Spark DataFrame
Question
When we use
df.filter(col("c1") === null) and df.filter(col("c1").isNull)
on the same DataFrame, I get counts with === null but zero counts with isNull. Please help me understand the difference. Thanks.
Recommended answer
First and foremost, don't use null in your Scala code unless you really have to for compatibility reasons.
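A minimal sketch of the usual Scala alternative, assuming the common idiom of Option in place of null: a value that may be null (for instance one returned by a Java API) is wrapped with Option(...), which yields None for null and Some(value) otherwise. AvoidNull and lookup are hypothetical names chosen for illustration.

```scala
// Hypothetical example: prefer Option over null in Scala code.
object AvoidNull {
  // A Java collection stands in for any API that may return null.
  private val javaMap = new java.util.HashMap[String, String]()
  javaMap.put("c1", "value")

  // Option.apply converts null to None and any other value to Some(value).
  def lookup(key: String): Option[String] = Option(javaMap.get(key))

  def main(args: Array[String]): Unit = {
    println(lookup("c1"))                       // Some(value)
    println(lookup("missing"))                  // None
    println(lookup("missing").getOrElse("n/a")) // n/a
  }
}
```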
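Regarding your question: it is plain SQL. col("c1") === null is interpreted as c1 = NULL and, because NULL marks an undefined value, the result is undefined (NULL) for any value of c1, including NULL itself. Since filter keeps only the rows for which the predicate evaluates to true, df.filter(col("c1") === null) returns no rows at all.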
spark.sql("SELECT NULL = NULL").show
+-------------+
|(NULL = NULL)|
+-------------+
|         null|
+-------------+
spark.sql("SELECT NULL != NULL").show
+-------------------+
|(NOT (NULL = NULL))|
+-------------------+
|               null|
+-------------------+
spark.sql("SELECT TRUE != NULL").show
+------------------------------------+
|(NOT (true = CAST(NULL AS BOOLEAN)))|
+------------------------------------+
|                                null|
+------------------------------------+
spark.sql("SELECT TRUE = NULL").show
+------------------------------+
|(true = CAST(NULL AS BOOLEAN))|
+------------------------------+
|                          null|
+------------------------------+
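The NULL propagation shown above can be modelled in plain Scala. This is a hypothetical sketch of SQL three-valued logic, not Spark's implementation: a nullable value is an Option[A], a predicate result is an Option[Boolean] with None standing for NULL, and a filter keeps a row only when the predicate is Some(true). ThreeValuedLogic, sqlEq, sqlNotEq and filterRows are illustrative names.

```scala
// Hypothetical plain-Scala model of SQL three-valued logic (not Spark code).
// None plays the role of SQL NULL, both for values and for predicate results.
object ThreeValuedLogic {
  // SQL `a = b`: NULL if either side is NULL, otherwise an ordinary comparison.
  def sqlEq[A](a: Option[A], b: Option[A]): Option[Boolean] =
    for (x <- a; y <- b) yield x == y

  // SQL `a != b` is NOT (a = b), and NOT NULL is still NULL.
  def sqlNotEq[A](a: Option[A], b: Option[A]): Option[Boolean] =
    sqlEq(a, b).map(!_)

  // Like WHERE / DataFrame.filter: keep a row only if the predicate is TRUE.
  // Both FALSE and NULL drop the row.
  def filterRows[A](rows: Seq[Option[A]])(pred: Option[A] => Option[Boolean]): Seq[Option[A]] =
    rows.filter(row => pred(row).contains(true))

  def main(args: Array[String]): Unit = {
    val c1: Seq[Option[String]] = Seq(Some("a"), None, Some("b"), None)
    println(sqlEq[String](None, None))    // None, like SELECT NULL = NULL
    println(sqlNotEq[String](None, None)) // None, like SELECT NULL != NULL
    // Model of df.filter(col("c1") === null): c1 = NULL is NULL for every row.
    println(filterRows(c1)(v => sqlEq(v, None)).size) // 0
  }
}
```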
The only valid methods to check for NULL are:
- IS NULL:
spark.sql("SELECT NULL IS NULL").show
+--------------+
|(NULL IS NULL)|
+--------------+
|          true|
+--------------+
spark.sql("SELECT TRUE IS NULL").show
+--------------+
|(true IS NULL)|
+--------------+
|         false|
+--------------+
- IS NOT NULL:
spark.sql("SELECT NULL IS NOT NULL").show
+------------------+
|(NULL IS NOT NULL)|
+------------------+
|             false|
+------------------+
spark.sql("SELECT TRUE IS NOT NULL").show
+------------------+
|(true IS NOT NULL)|
+------------------+
|              true|
+------------------+
These are implemented in the DataFrame DSL as Column.isNull and Column.isNotNull, respectively.
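Unlike =, IS NULL and IS NOT NULL always return TRUE or FALSE, never NULL, so filtering on them keeps exactly the rows you would expect. A hypothetical plain-Scala sketch of that behaviour, assuming the same modelling where None stands for SQL NULL (NullChecks, isNull, isNotNull and filterRows are illustrative names, not Spark API):

```scala
// Hypothetical plain-Scala model (not Spark code) of IS NULL / IS NOT NULL.
// A nullable value is Option[A]; None stands for SQL NULL.
object NullChecks {
  // IS NULL never yields NULL: the result is always TRUE or FALSE.
  def isNull[A](v: Option[A]): Option[Boolean] = Some(v.isEmpty)

  // IS NOT NULL is the exact complement, again never NULL.
  def isNotNull[A](v: Option[A]): Option[Boolean] = Some(v.isDefined)

  // Keep a row only when the predicate is TRUE, as WHERE / filter do.
  def filterRows[A](rows: Seq[Option[A]])(pred: Option[A] => Option[Boolean]): Seq[Option[A]] =
    rows.filter(row => pred(row).contains(true))

  def main(args: Array[String]): Unit = {
    val c1: Seq[Option[String]] = Seq(Some("a"), None, Some("b"), None)
    // Model of df.filter(col("c1").isNull): the two NULL rows survive.
    println(filterRows(c1)(v => isNull(v)).size)    // 2
    // Model of df.filter(col("c1").isNotNull): the two non-NULL rows survive.
    println(filterRows(c1)(v => isNotNull(v)).size) // 2
  }
}
```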
Note:
For NULL-safe comparisons, use IS DISTINCT FROM / IS NOT DISTINCT FROM:
spark.sql("SELECT NULL IS NOT DISTINCT FROM NULL").show
+---------------+
|(NULL <=> NULL)|
+---------------+
|           true|
+---------------+
spark.sql("SELECT NULL IS NOT DISTINCT FROM TRUE").show
+--------------------------------+
|(CAST(NULL AS BOOLEAN) <=> true)|
+--------------------------------+
|                           false|
+--------------------------------+
or not(_ <=> _) / <=>:
spark.sql("SELECT NULL AS col1, NULL AS col2").select($"col1" <=> $"col2").show
+---------------+
|(col1 <=> col2)|
+---------------+
|           true|
+---------------+
spark.sql("SELECT NULL AS col1, TRUE AS col2").select($"col1" <=> $"col2").show
+---------------+
|(col1 <=> col2)|
+---------------+
|          false|
+---------------+
in SQL and the DataFrame DSL, respectively.
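The semantics of <=> (IS NOT DISTINCT FROM) and its negation (IS DISTINCT FROM) can be sketched the same way. This is a hypothetical plain-Scala model, not Spark's implementation; NullSafe, nullSafeEq and isDistinctFrom are illustrative names, and None again stands for NULL.

```scala
// Hypothetical plain-Scala model (not Spark code) of NULL-safe equality.
object NullSafe {
  // a <=> b / a IS NOT DISTINCT FROM b: two NULLs are equal, NULL vs. a value
  // is FALSE, and the result itself is never NULL.
  def nullSafeEq[A](a: Option[A], b: Option[A]): Boolean = (a, b) match {
    case (None, None)       => true
    case (Some(x), Some(y)) => x == y
    case _                  => false
  }

  // NOT (a <=> b) / a IS DISTINCT FROM b.
  def isDistinctFrom[A](a: Option[A], b: Option[A]): Boolean = !nullSafeEq(a, b)

  def main(args: Array[String]): Unit = {
    println(nullSafeEq[Boolean](None, None))           // true, like NULL <=> NULL
    println(nullSafeEq[Boolean](None, Some(true)))     // false, like NULL <=> TRUE
    println(isDistinctFrom[Boolean](None, Some(true))) // true
  }
}
```

Because the result is a plain Boolean rather than a possibly-NULL value, a filter on <=> never silently drops rows the way = does. In the Scala DSL, Column.eqNullSafe is an alias of <=>.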