Count empty values in dataframe column in Spark (Scala)
Question
I'm trying to count empty values in a DataFrame column like this:
df.filter((df(colname) === null) || (df(colname) === "")).count()
Here colname holds the name of the column. This works fine if the column type is string, but if the column type is integer and there are some nulls, this code always returns 0. Why is this so, and how can I change it to make it work?
Answer
As mentioned in the question, df.filter((df(colname) === null) || (df(colname) === "")).count() works for String data types, but testing shows that nulls are not handled: under SQL three-valued logic, comparing a column to null with === yields null rather than true, so those rows are dropped by the filter. Use .isNull instead.
@Psidom's answer handles both null and empty strings, but does not handle NaN.
Adding a check for .isNaN should handle all three cases:
df.filter(df(colName).isNull || df(colName) === "" || df(colName).isNaN).count()
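To see the three checks in action, here is a minimal, self-contained sketch. The local SparkSession setup, the sample data, and the countEmpty helper are illustrative assumptions, not part of the original post; only the filter expression comes from the answer above.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object CountEmptyValues {
  def main(args: Array[String]): Unit = {
    // Local session for demonstration purposes only.
    val spark = SparkSession.builder()
      .appName("count-empty-values")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical helper wrapping the filter from the answer:
    // counts rows where the column is null, an empty string, or NaN.
    def countEmpty(df: DataFrame, colName: String): Long =
      df.filter(df(colName).isNull || df(colName) === "" || df(colName).isNaN)
        .count()

    // String column containing a null and an empty string.
    val strings = Seq(Some("a"), None, Some(""), Some("b")).toDF("s")
    // Numeric column containing a null and a NaN.
    val doubles = Seq(Some(1.0), None, Some(Double.NaN)).toDF("d")

    println(countEmpty(strings, "s")) // null and "" both counted
    println(countEmpty(doubles, "d")) // null and NaN both counted

    spark.stop()
  }
}
```

Note that .isNull (a null test) is not the same as === null (a comparison that evaluates to null), which is exactly why the code in the question returned 0 for integer columns.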