How can I replace numbers by nulls in a DataFrame?
Question
It might be strange, but I was wondering how to replace any number in a whole DataFrame's Column with null, using Scala.
Imagine I have a nullable DoubleType column named col. There, I want to replace every number outside the range (1.0 ~ 10.0) with null.
I tried the following code, unsatisfactorily:
val xf = df.na.replace("col", Map(0.0 -> null.asInstanceOf[Double]).toMap)
But, as you may realize, in Scala a null cast to Double becomes represented as 0.0, and this is not what I want. Besides, I can't see any way to do it with a range of values. Is there any way to achieve this?
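The cast behavior can be checked without Spark at all: Scala's primitive Double cannot hold null, so `null.asInstanceOf[Double]` unboxes to the default value 0.0, which is why the Map above ends up mapping 0.0 to 0.0. A minimal plain-Scala sketch:

```scala
// Plain Scala, no Spark needed: casting null to the primitive Double
// triggers unboxing, and unboxing null yields the default value 0.0.
object NullCastDemo extends App {
  val d: Double = null.asInstanceOf[Double]
  println(d)        // prints 0.0
  assert(d == 0.0)  // the "null" silently became zero
}
```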
Answer
How about a when clause instead?
import org.apache.spark.sql.functions.when

val df = sc.parallelize(
  (1L, 0.0) :: (2L, 3.6) :: (3L, 12.0) :: (4L, 5.0) :: Nil
).toDF("id", "val")
df.withColumn("val", when($"val".between(1.0, 10.0), $"val")).show
// +---+----+
// | id| val|
// +---+----+
// | 1|null|
// | 2| 3.6|
// | 3|null|
// | 4| 5.0|
// +---+----+
Any value which doesn't satisfy the predicate (here val BETWEEN 1.0 AND 10.0) will be replaced with NULL.