Count empty values in dataframe column in Spark (Scala)


Question

I'm trying to count empty values in a column of a DataFrame like this:

df.filter((df(colname) === null) || (df(colname) === "")).count()

colname holds the name of the column. This works fine if the column type is string, but if the column type is integer and there are some nulls, this code always returns 0. Why is this so? How can I change it to make it work?
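The behavior above can be reproduced with a minimal sketch. This assumes a spark-shell style session where `spark` is already in scope; the column name `num` and the sample data are hypothetical:

```scala
import spark.implicits._

// Hypothetical integer column with two nulls
val df = Seq(Some(1), None, Some(3), None).toDF("num")

// === compares with SQL three-valued semantics: comparing anything to a
// null literal yields null, never true, so the filter keeps no rows
val wrong = df.filter(df("num") === null).count()

// isNull is the proper null test and matches the two null rows
val right = df.filter(df("num").isNull).count()
```

This is why the question's filter returns 0 on an integer column: the `=== null` branch never evaluates to true, and the `=== ""` branch cannot match a numeric value either.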

Answer

As mentioned in the question, df.filter((df(colname) === null) || (df(colname) === "")).count() works for String data types, but testing shows that nulls are not handled.

@Psidom's answer handles both null and empty, but does not handle NaN.

Adding a check for .isNaN should handle all three cases:

df.filter(df(colName).isNull || df(colName) === "" || df(colName).isNaN).count()
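A quick sketch of the combined filter on a numeric column, again assuming a spark-shell session with `spark` in scope (the column `x` and the data are hypothetical):

```scala
import spark.implicits._

// Hypothetical Double column containing a regular value, a null, and a NaN
val df = Seq(Some(1.0), None, Some(Double.NaN)).toDF("x")

// Matches the null row and the NaN row; on a numeric column the === "" test
// matches nothing, because "" cast to double is null
val empties = df.filter(df("x").isNull || df("x") === "" || df("x").isNaN).count()
```

Note that .isNaN is only meaningful for float/double columns; for string or integer columns the isNull and === "" checks are sufficient.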
