How to get the max per column when nulls should be replaced with 0?

Question

How do I get the MAX of ID from the DataFrame below?

// Filter on READ and ACT first, then keep only ID; =!= replaces !==, deprecated since Spark 2.0
val df_n = df.filter($"READ" === "" && $"ACT" =!= "").select($"ID")

I need the maximum of ID, and wherever ID is NULL it should be replaced with 0.

Answer

How about the following?

scala> val df = Seq("0", null, "5", null, null, "-8").toDF("id")
df: org.apache.spark.sql.DataFrame = [id: string]

scala> df.printSchema
root
 |-- id: string (nullable = true)

scala> df.withColumn("idAsLong", $"id" cast "long").printSchema
root
 |-- id: string (nullable = true)
 |-- idAsLong: long (nullable = true)


scala> val testDF = df.withColumn("idAsLong", $"id" cast "long")
testDF: org.apache.spark.sql.DataFrame = [id: string, idAsLong: bigint]

scala> testDF.show
+----+--------+
|  id|idAsLong|
+----+--------+
|   0|       0|
|null|    null|
|   5|       5|
|null|    null|
|null|    null|
|  -8|      -8|
+----+--------+

Solution

scala> testDF.agg(max("idAsLong")).show
+-------------+
|max(idAsLong)|
+-------------+
|            5|
+-------------+
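Spark's max, like SQL MAX, ignores nulls, which is why the three null rows do not affect the result above. A plain-Scala model of that rule makes it explicit. This is only an illustration, not Spark code: Option stands in for a nullable column, and the object and method names are hypothetical.

```scala
object SqlMaxModel {
  // Models an aggregate MAX over a nullable column, with None standing in for NULL.
  // Nulls are dropped before comparing, so they never take part in the comparison.
  def sqlMax(values: Seq[Option[Long]]): Option[Long] =
    values.flatten.reduceOption(_ max _)
}
```

For the idAsLong values above, `SqlMaxModel.sqlMax(Seq(Some(0L), None, Some(5L), None, None, Some(-8L)))` returns `Some(5L)`, matching the Spark result.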

Using na Operator

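What if the column holds only negative values and nulls, and the nulls should count as 0? Because max skips nulls, it would return the largest negative value instead of 0. Replace the nulls first with the na operator (DataFrameNaFunctions) on the Dataset: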

val withNulls = Seq("-1", "-5", null, null, "-333", null)
  .toDF("id")
  .withColumn("asInt", $"id" cast "int")  // <-- column of type int with nulls

scala> withNulls.na.fill(Map("asInt" -> 0)).agg(max("asInt")).show
+----------+
|max(asInt)|
+----------+
|         0|
+----------+
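Filling first also covers a column in which every value is null: max alone has nothing to compare and returns null, while na.fill turns the nulls into 0 before aggregating. The sketch below checks both cases; the all-null sample data, the local SparkSession setup and the name AllNullSketch are assumptions made so it can run standalone.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.max

object AllNullSketch {
  // Returns (is max NULL without filling?, max after na.fill) for an all-null int column.
  def run(): (Boolean, Int) = {
    val spark = SparkSession.builder().master("local[*]").appName("all-null-max").getOrCreate()
    import spark.implicits._
    try {
      // Hypothetical data: every id is null, so asInt is entirely null
      val allNulls = Seq[String](null, null, null)
        .toDF("id")
        .withColumn("asInt", $"id" cast "int")
      // max skips nulls; with nothing left to compare it returns NULL
      val rawIsNull = allNulls.agg(max("asInt")).first().isNullAt(0)
      // Filling first turns every null into 0, so max is 0
      val filled = allNulls.na.fill(Map("asInt" -> 0)).agg(max("asInt")).first().getInt(0)
      (rawIsNull, filled)
    } finally spark.stop()
  }
}
```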

Without na replacing the nulls, it does not work: max simply ignores them and returns -1 instead of 0.

scala> withNulls.agg(max("asInt")).show
+----------+
|max(asInt)|
+----------+
|        -1|
+----------+
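Applied to the DataFrame from the question, the replacement can also happen inside a single aggregation with coalesce instead of na.fill, which leaves the ID column itself untouched. This is a sketch under assumptions: ID, READ and ACT are taken to be string columns, and the sample rows and the names MaxIdSketch, maxIdOrZero and demo are hypothetical. An outer coalesce makes the result 0 even when no rows survive the filter.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{coalesce, col, lit, max}

object MaxIdSketch {
  // Filter as in the question, treat a NULL ID as 0, and take the maximum.
  // The outer coalesce also returns 0 when the filter leaves no rows (max would be NULL).
  def maxIdOrZero(df: DataFrame): Long =
    df.filter(col("READ") === "" && col("ACT") =!= "")
      .agg(coalesce(max(coalesce(col("ID").cast("long"), lit(0L))), lit(0L)))
      .first()
      .getLong(0)

  // Hypothetical sample data with string columns ID, READ and ACT.
  def demo(): Long = {
    val spark = SparkSession.builder().master("local[*]").appName("max-id-sketch").getOrCreate()
    import spark.implicits._
    try {
      val df = Seq(("3", "", "x"), (null, "", "y"), ("7", "r", "z")).toDF("ID", "READ", "ACT")
      // The third row is filtered out (READ is not empty), so its ID 7 is ignored;
      // the remaining IDs are 3 and NULL -> 0, giving a max of 3.
      maxIdOrZero(df)
    } finally spark.stop()
  }
}
```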
