Pyspark dataframe how to drop rows with nulls in all columns?
Question
I have a dataframe that looks like this:
+----+----+----+
| ID|TYPE|CODE|
+----+----+----+
| 1| B| X1|
|null|null|null|
|null| B| X1|
+----+----+----+
I want it to become:
+----+----+----+
| ID|TYPE|CODE|
+----+----+----+
| 1| B| X1|
|null| B| X1|
+----+----+----+
I'd prefer a general method, so that it still applies when df.columns is very long. Thanks!
Answer
One option is to use functools.reduce to construct the filter condition:
from functools import reduce
df.filter(~reduce(lambda x, y: x & y, [df[c].isNull() for c in df.columns])).show()
+----+----+----+
| ID|TYPE|CODE|
+----+----+----+
| 1| B| X1|
|null| B| X1|
+----+----+----+
where reduce produces a query like:
~reduce(lambda x, y: x & y, [df[c].isNull() for c in df.columns])
# Column<b'(NOT (((ID IS NULL) AND (TYPE IS NULL)) AND (CODE IS NULL)))'>
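To see how the fold builds up the combined condition, here is the same reduce pattern with plain Python booleans and dicts standing in for Spark columns and rows — a minimal sketch, no Spark session needed:

```python
from functools import reduce

# Toy data mirroring the dataframe above; None plays the role of null.
rows = [
    {"ID": 1, "TYPE": "B", "CODE": "X1"},
    {"ID": None, "TYPE": None, "CODE": None},
    {"ID": None, "TYPE": "B", "CODE": "X1"},
]
columns = ["ID", "TYPE", "CODE"]

def all_null(row):
    # reduce AND-folds the per-column null checks, exactly as the
    # Spark version folds the df[c].isNull() Column expressions.
    return reduce(lambda x, y: x & y, [row[c] is None for c in columns])

# Keep rows where NOT all columns are null (the ~ in the Spark filter).
kept = [r for r in rows if not all_null(r)]
print(len(kept))  # 2 — only the all-null row is dropped
```

Note that for this specific task PySpark also has a built-in shortcut, `df.na.drop(how="all")` (equivalently `df.dropna(how="all")`), which drops rows only when every column is null; the reduce approach is still useful when you need a custom combination of per-column conditions.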