How to filter nullable Array-Elements in Spark 1.6 UDF


Question

Consider the following DataFrame:

root
 |-- values: array (nullable = true)
 |    |-- element: double (containsNull = true)

with the content:

+-----------+
|     values|
+-----------+
|[1.0, null]|
+-----------+

Now I want to pass this values column to a UDF:

import org.apache.spark.sql.functions.udf

val inspect = udf((data: Seq[Double]) => {
  data.foreach(println)           // println takes Any, so the element is printed as-is
  println()
  data.foreach(d => println(d))   // d is typed as Double, forcing a cast first
  println()
  data.foreach(d => println(d == null))
  ""
})

df.withColumn("dummy", inspect($"values")).show()  // an action is needed to actually run the UDF

I'm really confused by the output of the above println statements:

1.0
null

1.0
0.0

false
false

My questions:

  1. Why is foreach(println) not giving the same output as foreach(d => println(d))?
  2. How can the Double be null in the first println statement? I thought Scala's Double cannot be null.
  3. How can I filter null values in my Seq other than filtering 0.0, which isn't really safe? Should I use Seq[java.lang.Double] as the type for my input in the UDF and then filter nulls? (This works, but I'm unsure if that is the way to go.)

Note that I'm aware of this question, but my question is specific to array-type columns.

Answer

Why is foreach(println) not giving the same output as foreach(d => println(d))?

In a context where Any is expected, the cast is skipped completely. This is explained in detail in If an Int can't be null, what does null.asInstanceOf[Int] mean?

How can the Double be null in the first println statement? I thought Scala's Double cannot be null.

The internal binary representation doesn't use Scala types at all. Once array data is decoded, it is represented as an Array[Any], and elements are coerced to the declared type with a simple asInstanceOf.
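
Taken together, these two facts explain the confusing output. Below is a minimal sketch of the behavior in plain Scala, outside Spark, assuming nothing beyond the standard library: printing a null through an Any-typed parameter skips the cast, while forcing it into a primitive Double unboxes it to 0.0.

object NullCastDemo extends App {
  // Simulate a decoded array element: the runtime value is null, typed as Any.
  val element: Any = null

  // println accepts Any, so no conversion to Double happens -- prints "null".
  println(element)

  // Casting null to a primitive Double unboxes it to the default -- prints "0.0".
  val d = element.asInstanceOf[Double]
  println(d)

  // The comparison runs on the already-unboxed primitive -- prints "false".
  println(d == null)
}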

Should I use Seq[java.lang.Double] as the type for my input in the UDF and then filter nulls?

In general, if the values are nullable, you should use an external type that is nullable as well, or an Option. Unfortunately, only the first option is applicable for UDFs.
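
A minimal sketch of that approach (the UDF name filterNulls and the output column name cleaned are illustrative, not from the original question): declaring the elements as the boxed java.lang.Double lets the nulls arrive intact, so they can be filtered explicitly before unboxing.

import org.apache.spark.sql.functions.udf

// Declare the elements as the nullable boxed type so nulls survive
// deserialization, drop them, then unbox back to primitive doubles.
val filterNulls = udf((data: Seq[java.lang.Double]) =>
  data.filter(_ != null).map(_.doubleValue)
)

df.withColumn("cleaned", filterNulls($"values")).show()

For the example row [1.0, null], the cleaned column would then contain [1.0].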
