Check for empty rows within a Spark DataFrame?
Question
I am running some checks over several CSV files, and for one of them I get a NullPointerException; I suspect that file contains some empty rows.
So I ran the following check, and for some reason it reports OK (i.e. it finds no empty rows):
import pyspark.sql.functions as sf
from pyspark.sql.types import BooleanType

# True only when every field in the row is None
check_empty = lambda row: not any(k is not None for k in row)
check_empty_udf = sf.udf(check_empty, BooleanType())
df.filter(check_empty_udf(sf.struct([col for col in df.columns]))).show()
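One likely reason the check reports OK: CSV readers often load missing fields as empty strings rather than nulls, so an all-None test never matches. A minimal plain-Python sketch of the predicate above (no Spark needed; the row tuples are hypothetical):

```python
def check_empty(row):
    # True only when every field in the row is None --
    # the same logic as the UDF in the question
    return not any(k is not None for k in row)

print(check_empty((None, None, None)))  # → True  (all-None row)
print(check_empty(("", "", "")))        # → False (empty strings, not nulls)
print(check_empty(("a", None, "b")))    # → False (partially filled row)
```

If the "empty" rows in the file arrive as empty strings, this predicate will never flag them, which is consistent with the OK output.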
Am I missing something in the filter function, or is it simply not possible to extract empty rows from a DataFrame this way?
Answer
You could use df.dropna() to drop empty rows and then compare the counts.
Something like:
df_clean = df.dropna()  # default how='any': drops rows containing any null
num_empty_rows = df.count() - df_clean.count()
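One caveat worth noting: `dropna()` defaults to `how='any'`, which drops rows with at least one null, while `how='all'` drops only rows where every field is null, which matches "empty row" more closely. Since spinning up Spark here is impractical, here is a plain-Python sketch of the two counting rules over hypothetical row tuples:

```python
# Sketch of the counting logic behind the answer, using plain tuples
# in place of a Spark DataFrame.
rows = [
    ("a", "b"),    # fully populated
    ("a", None),   # partially null
    (None, None),  # fully empty
]

# how='any' (the default): keep rows with no nulls at all
kept_any = [r for r in rows if not any(v is None for v in r)]
# how='all': keep rows that have at least one non-null field
kept_all = [r for r in rows if not all(v is None for v in r)]

print(len(rows) - len(kept_any))  # → 2 (rows containing any null)
print(len(rows) - len(kept_all))  # → 1 (fully empty rows)
```

So `df.count() - df.dropna(how='all').count()` is the closer analogue of "number of empty rows".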