How can I replace numbers with nulls in a DataFrame?

Question

It might be strange, but I was wondering how, using Scala, to replace any number in a whole DataFrame column with null.

Imagine I have a nullable DoubleType column named col. In it, I want to replace every number outside the range 1.0 to 10.0 with null.

I tried the following code, without satisfactory results:

val xf = df.na.replace("col", Map(0.0 -> null.asInstanceOf[Double]).toMap)

But, as you know, in Scala casting null to Double gives 0.0, which is not what I want. Besides, I can't see any way to do this for a range of values. Is there any way to achieve this?
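A minimal sketch in plain Scala (no Spark involved) of why the cast in the attempt above cannot produce a null: Double is a primitive value type, so casting null to it unboxes to the default 0.0, whereas a boxed java.lang.Double or an Option[Double] can represent a missing value. The object and value names are illustrative only.

```scala
object NullCastDemo {
  // Casting null to the primitive Double unboxes it to the default value 0.0.
  val primitive: Double = null.asInstanceOf[Double]

  // A boxed java.lang.Double can genuinely hold null.
  val boxed: java.lang.Double = null

  // Idiomatic Scala models "no value" with Option rather than null.
  val missing: Option[Double] = None

  def main(args: Array[String]): Unit = {
    println(primitive)       // 0.0
    println(boxed == null)   // true
    println(missing.isEmpty) // true
  }
}
```

This is why the Map passed to na.replace ends up mapping 0.0 to 0.0 rather than to null.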

Recommended answer

How about a when clause instead?

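import org.apache.spark.sql.functions.when
// Needed for toDF and the $"col" syntax (Spark 2+; on Spark 1.x use sqlContext.implicits._).
// spark-shell imports this automatically.
import spark.implicits._

val df = sc.parallelize(
  (1L, 0.0) :: (2L, 3.6) :: (3L, 12.0) :: (4L, 5.0) :: Nil
).toDF("id", "val")

// when(...) without otherwise(...) yields null wherever the predicate is not true.
df.withColumn("val", when($"val".between(1.0, 10.0), $"val")).show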

// +---+----+
// | id| val|
// +---+----+
// |  1|null|
// |  2| 3.6|
// |  3|null|
// |  4| 5.0|
// +---+----+

Because there is no otherwise clause, any value that doesn't satisfy the predicate (here val BETWEEN 1.0 AND 10.0, inclusive at both ends) is replaced with NULL.
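As a rough mental model (plain Scala, not Spark's implementation), the same rule can be written over Option[Double], with None standing in for SQL NULL. A NULL input also stays NULL, because the BETWEEN predicate evaluates to NULL rather than true for it. The helper name nullOutside is hypothetical.

```scala
object NullOutsideRange {
  // Hypothetical helper mirroring when($"val".between(lo, hi), $"val"):
  // keep the value when lo <= v <= hi, otherwise yield None (SQL NULL).
  // A None input stays None, just as a NULL input never satisfies the predicate.
  def nullOutside(v: Option[Double], lo: Double, hi: Double): Option[Double] =
    v.filter(x => x >= lo && x <= hi)

  def main(args: Array[String]): Unit = {
    val vals: Seq[Option[Double]] =
      Seq(Some(0.0), Some(3.6), Some(12.0), Some(5.0), None)
    // Prints: List(None, Some(3.6), None, Some(5.0), None)
    println(vals.map(nullOutside(_, 1.0, 10.0)))
  }
}
```

The first four inputs match the example DataFrame, and the result lines up with the val column in the output shown above.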

See also: Create new DataFrame with empty/null field values