na.fill in Spark DataFrame Scala

Views: 109 | Published: 2020/10/17 0:28:19 | Tags: scala, apache-spark, dataframe

Question

I am using Spark/Scala and I want to fill the nulls in my DataFrame with default values chosen by column type.

i.e. String columns -> "string", numeric columns -> 111, Boolean columns -> false, etc.

Currently the DataFrameNaFunctions API (df.na) provides na.fill
fill(valueMap: Map[String, Any]), used like this:

df.na.fill(Map(
    "A" -> "unknown",
    "B" -> 1.0
))

This requires knowing the column names and also the types of the columns.

fill(value: String, cols: Seq[String])

This overload only covers String/Double types, not even Boolean.

Is there a smart way to do this?

Recommended answer

Take a look at dtypes: Array[(String, String)]. You can use the output of this method to generate a Map for fill, e.g.:

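// Build a column -> default map from dtypes.
// collect skips columns whose type has no case here, instead of throwing a MatchError.
val typeMap: Map[String, Any] = df.dtypes.collect {
    case (name, "IntegerType") => name -> 0
    case (name, "StringType")  => name -> ""
    case (name, "DoubleType")  => name -> 0.0
}.toMap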
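
To tie this back to the defaults asked for in the question ("string", 111, false), here is a minimal sketch that extends the same idea to a few more type names. The helper names defaultFor and fillMapFor are hypothetical, not part of Spark. The sample array only stands in for df.dtypes so the snippet runs without a Spark session. I am assuming the map form of na.fill accepts Boolean, Long and Float values in your Spark version, so check the DataFrameNaFunctions docs for the version you use.

```scala
// Sketch only: `defaultFor` and `fillMapFor` are hypothetical helper names, not Spark API.
// The type names are the strings that df.dtypes reports, e.g. ("age", "IntegerType").

// Pick a default for one type name; None means "leave this column's nulls alone".
def defaultFor(typeName: String): Option[Any] = typeName match {
  case "StringType"  => Some("string")
  case "BooleanType" => Some(false)
  case "IntegerType" => Some(111)
  case "LongType"    => Some(111L)
  case "FloatType"   => Some(111.0f)
  case "DoubleType"  => Some(111.0)
  case _             => None // e.g. dates, timestamps, decimals, nested types
}

// Turn the output of df.dtypes into a map suitable for df.na.fill.
def fillMapFor(dtypes: Array[(String, String)]): Map[String, Any] =
  dtypes.flatMap { case (name, typeName) =>
    defaultFor(typeName).map(value => name -> value)
  }.toMap

// Stand-in for df.dtypes so the sketch runs without a Spark session.
val sampleDtypes = Array(
  "name"    -> "StringType",
  "age"     -> "IntegerType",
  "active"  -> "BooleanType",
  "created" -> "TimestampType"
)

val fillMap = fillMapFor(sampleDtypes)
// With a real DataFrame: val filled = df.na.fill(fillMapFor(df.dtypes))
```

Returning an Option per type keeps unsupported columns out of the map, so their nulls are left untouched rather than causing a failure at fill time.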
