PySpark DataFrame: how to drop rows with nulls in all columns?
Question
For a DataFrame like this:
+----+----+----+
|  ID|TYPE|CODE|
+----+----+----+
|   1|   B|  X1|
|null|null|null|
|null|   B|  X1|
+----+----+----+
I want the result to be:
+----+----+----+
|  ID|TYPE|CODE|
+----+----+----+
|   1|   B|  X1|
|null|   B|  X1|
+----+----+----+
I prefer a general method that still works when df.columns is very long.
Thanks!
Answer
One option is to use functools.reduce to combine the per-column conditions:
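from functools import reduce
# AND together df[c].isNull() for every column, negate with ~, and keep rows with at least one non-null value
df.filter(~reduce(lambda x, y: x & y, [df[c].isNull() for c in df.columns])).show()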
+----+----+----+
|  ID|TYPE|CODE|
+----+----+----+
|   1|   B|  X1|
|null|   B|  X1|
+----+----+----+
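To see what the filter does row by row, here is a minimal plain-Python sketch that needs no Spark. The list of dicts named rows is a hypothetical stand-in for the DataFrame above, with None playing the role of Spark's null; the helper all_null is likewise illustrative. It applies the same reduce-over-columns idea, using Python's `and` in place of Spark's `&`.

```python
from functools import reduce

# Hypothetical in-memory stand-in for the DataFrame: None plays the role of null.
columns = ["ID", "TYPE", "CODE"]
rows = [
    {"ID": 1, "TYPE": "B", "CODE": "X1"},
    {"ID": None, "TYPE": None, "CODE": None},
    {"ID": None, "TYPE": "B", "CODE": "X1"},
]


def all_null(row, cols):
    """Fold the per-column 'is null' checks with AND, like the reduce in the answer.

    (all(row[c] is None for c in cols) would be the idiomatic pure-Python
    equivalent; reduce is used here only to mirror the PySpark version.
    Like the PySpark version, reduce raises TypeError if cols is empty.)
    """
    return reduce(lambda x, y: x and y, [row[c] is None for c in cols])


# Keep a row unless every column is null (the ~ in the PySpark filter).
kept = [row for row in rows if not all_null(row, columns)]

for row in kept:
    print(row)
```

In PySpark itself the operators must be `&`, `|` and `~` rather than `and`, `or` and `not`, because Python does not let a class overload those keywords. PySpark also has a built-in for this exact case: `df.dropna(how="all")` (also available as `df.na.drop(how="all")`) drops only the rows whose values are all missing. Note that it also treats NaN in numeric columns as missing, whereas `isNull()` does not.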
Here, the reduce call builds the following condition:
~reduce(lambda x, y: x & y, [df[c].isNull() for c in df.columns])
# Column<b'(NOT (((ID IS NULL) AND (TYPE IS NULL)) AND (CODE IS NULL)))'>
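The nesting in that printed condition comes from reduce folding from the left: f(f(c0, c1), c2). The following string-only sketch (no Spark involved; it just renders each condition as text in the style of the printed Column) reproduces the same shape, so it also shows that a longer df.columns simply yields a longer left-nested AND chain. The exact text Spark prints for a Column may differ between versions.

```python
from functools import reduce

columns = ["ID", "TYPE", "CODE"]

# Render each per-column condition as text, in the style of Spark's Column repr.
conditions = [f"({c} IS NULL)" for c in columns]

# reduce folds from the left, f(f(c0, c1), c2), so the ANDs nest to the left.
combined = reduce(lambda x, y: f"({x} AND {y})", conditions)

# The ~ in the PySpark code corresponds to the outer NOT.
negated = f"(NOT {combined})"

print(negated)
```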