How to return rows with Null values in pyspark dataframe?
Question
I am trying to get the rows with null values from a PySpark dataframe. In pandas, I can achieve this using isnull() on the dataframe:
df = df[df.isnull().any(axis=1)]
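As a minimal sketch of the pandas behaviour being described (the column names `A`/`B` and the sample values are made up for illustration):

```python
import pandas as pd

# Hypothetical sample data; None becomes NaN inside the DataFrame
df = pd.DataFrame({"A": [0.4, None, 9.7, None],
                   "B": [0.3, 0.11, None, None]})

# Keep only the rows where ANY column is null
rows_with_nulls = df[df.isnull().any(axis=1)]
print(rows_with_nulls)
```

`isnull()` returns a boolean frame of the same shape, and `.any(axis=1)` collapses it to one boolean per row, which is then used as a mask.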
But in the case of PySpark, when I run the command below it raises an AttributeError:
df.filter(df.isNull())
AttributeError: 'DataFrame' object has no attribute 'isNull'.
How can I get the rows with null values without checking each column individually?
Answer
You can filter the rows with where, reduce, and a generator expression. For example, given the following dataframe:
df = sc.parallelize([
(0.4, 0.3),
(None, 0.11),
(9.7, None),
(None, None)
]).toDF(["A", "B"])
df.show()
+----+----+
| A| B|
+----+----+
| 0.4| 0.3|
|null|0.11|
| 9.7|null|
|null|null|
+----+----+
Filtering the rows with some null value can be achieved with:
import pyspark.sql.functions as f
from functools import reduce

# OR together one isNull() condition per column:
# keeps every row in which at least one column is null
df.where(reduce(lambda x, y: x | y, (f.col(x).isNull() for x in df.columns))).show()
Which gives:
+----+----+
| A| B|
+----+----+
|null|0.11|
| 9.7|null|
|null|null|
+----+----+
In the condition you have to specify how the per-column checks are combined: any (or, `|`), all (and, `&`), etc.
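The any/all distinction can be seen in plain Python, without a Spark runtime. This is a toy analogue of the Spark expressions above, using the same sample rows, where `|` and `&` combine the per-column checks just as they combine Spark Column conditions:

```python
from functools import reduce

# Plain-Python stand-in for the sample dataframe
rows = [
    {"A": 0.4, "B": 0.3},
    {"A": None, "B": 0.11},
    {"A": 9.7, "B": None},
    {"A": None, "B": None},
]
cols = ["A", "B"]

# "any column is null": OR-combine the per-column checks (| in Spark)
any_null = [r for r in rows
            if reduce(lambda x, y: x | y, (r[c] is None for c in cols))]

# "all columns are null": AND-combine instead (& in Spark)
all_null = [r for r in rows
            if reduce(lambda x, y: x & y, (r[c] is None for c in cols))]

print(len(any_null))  # 3 rows have at least one null
print(len(all_null))  # 1 row is entirely null
```

Swapping the combining operator is the only change needed to go from "drop rows with any null" to "drop rows that are entirely null".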