Check for empty row within spark dataframe?


Problem description

I am running over several CSV files and trying to do some checks. For one file I am getting a NullPointerException, and I suspect that it contains some empty rows.

So I am running the following check, and for some reason it gives me an OK output:

from pyspark.sql import functions as sf
from pyspark.sql.types import BooleanType

# True only when every column in the row is None
check_empty = lambda row: not any([False if k is None else True for k in row])
check_empty_udf = sf.udf(check_empty, BooleanType())
df.filter(check_empty_udf(sf.struct([col for col in df.columns]))).show()
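
As a side note, the same all-null check can be written with built-in column expressions instead of a Python UDF, which avoids the UDF serialization overhead. A minimal sketch, assuming the same df as above:

from functools import reduce
from pyspark.sql import functions as sf

# Keep only rows in which every column is null, without a Python UDF
all_null = reduce(lambda a, b: a & b, [sf.col(c).isNull() for c in df.columns])
df.filter(all_null).show()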

Am I missing something within the filter function, or is it simply not possible to extract empty rows from a dataframe this way?

Recommended answer

You could use df.dropna() to drop the empty rows and then compare the counts.

Something like:

# dropna() with its defaults drops every row that contains any null value
df_clean = df.dropna()
num_empty_rows = df.count() - df_clean.count()
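
One caveat: dropna() defaults to how='any', so the count above includes every row with at least one null column. If "empty row" means a row in which all columns are null, pass how='all' instead. A minimal sketch, again assuming the same df:

# how='all' drops only rows in which every column is null
df_no_empty = df.dropna(how="all")
num_fully_empty_rows = df.count() - df_no_empty.count()
print(num_fully_empty_rows)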
