How to max per column with nulls that should be replaced (with 0)?
Question
How to get the MAX in the below dataframe?
val df_n = df.filter($"READ" === "" && $"ACT" =!= "").select($"ID") // =!= replaces the deprecated !==
I have to find the MAX of ID, and if ID is NULL, I have to replace it with 0.
Recommended answer
How about the following?
scala> val df = Seq("0", null, "5", null, null, "-8").toDF("id")
df: org.apache.spark.sql.DataFrame = [id: string]
scala> df.printSchema
root
|-- id: string (nullable = true)
scala> df.withColumn("idAsLong", $"id" cast "long").printSchema
root
|-- id: string (nullable = true)
|-- idAsLong: long (nullable = true)
scala> val testDF = df.withColumn("idAsLong", $"id" cast "long")
testDF: org.apache.spark.sql.DataFrame = [id: string, idAsLong: bigint]
scala> testDF.show
+----+--------+
|  id|idAsLong|
+----+--------+
|   0|       0|
|null|    null|
|   5|       5|
|null|    null|
|null|    null|
|  -8|      -8|
+----+--------+
Solution
scala> testDF.agg(max("idAsLong")).show
+-------------+
|max(idAsLong)|
+-------------+
|            5|
+-------------+
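If the maximum is needed as a plain Scala value rather than a displayed table, one possible sketch is shown below. It also covers the case where every ID is NULL, in which max itself returns null. The all-null sample data and the names allNulls and maxId are hypothetical, and the snippet assumes Spark 2.x or later on the classpath (in spark-shell, paste it with :paste).

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.max

// Assumption: a local session; in spark-shell, getOrCreate reuses the existing one.
val spark = SparkSession.builder().master("local[*]").appName("max-as-scalar").getOrCreate()
import spark.implicits._

// Hypothetical data: a column that contains nothing but nulls.
val allNulls = Seq[String](null, null)
  .toDF("id")
  .withColumn("idAsLong", $"id".cast("long"))

// agg without groupBy always yields exactly one row; here it holds null.
val row = allNulls.agg(max("idAsLong")).head()

// Fall back to 0 when the aggregate itself is null.
val maxId: Long = if (row.isNullAt(0)) 0L else row.getLong(0)
println(maxId)
```

Checking isNullAt before getLong avoids reading a null as a primitive value.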
Using na Operator
What if you had only negative values and null, so that null (once replaced with 0) should be the maximum value? Use the na operator on Dataset.
val withNulls = Seq("-1", "-5", null, null, "-333", null)
.toDF("id")
.withColumn("asInt", $"id" cast "int") // <-- column of type int with nulls
scala> withNulls.na.fill(Map("asInt" -> 0)).agg(max("asInt")).show
+----------+
|max(asInt)|
+----------+
|         0|
+----------+
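As an alternative sketch (not part of the original answer), the replacement can also happen inside the aggregate with coalesce, which leaves the DataFrame itself unchanged. The name maxWithZero is made up for illustration, and the snippet assumes Spark 2.x or later on the classpath (in spark-shell, paste it with :paste).

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{coalesce, lit, max}

// Assumption: a local session; in spark-shell, getOrCreate reuses the existing one.
val spark = SparkSession.builder().master("local[*]").appName("max-with-coalesce").getOrCreate()
import spark.implicits._

val withNulls = Seq("-1", "-5", null, null, "-333", null)
  .toDF("id")
  .withColumn("asInt", $"id".cast("int"))

// coalesce substitutes 0 for each null before max sees the value.
val maxWithZero: Int = withNulls
  .agg(max(coalesce($"asInt", lit(0))).as("maxAsInt"))
  .head()
  .getInt(0)
println(maxWithZero)
```

For this data the result matches na.fill followed by max, because each null counts as 0 in both variants.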
Without na to replace null, it simply won't work: max ignores null values, so the result is the largest non-null value instead of 0.
scala> withNulls.agg(max("asInt")).show
+----------+
|max(asInt)|
+----------+
|        -1|
+----------+