Why does testing `NaN == NaN` not work for dropping from a pandas DataFrame?
Question
Please explain how NaNs are treated in pandas, because the following logic seems "broken" to me. I tried various ways (shown below) to drop the empty values.
My DataFrame, which I load from a CSV file using `pd.read_csv`, has a column `comments`, which is empty most of the time.
The column `marked_results.comments` looks like this. All the rest of the column is NaN, so pandas loads empty entries as NaNs. So far, so good:
0 VP
1 VP
2 VP
3 TEST
4 NaN
5 NaN
....
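To reproduce that behaviour without the original file, here is a minimal sketch that feeds `pd.read_csv` a small in-memory CSV. The CSV text and its `id` column are hypothetical stand-ins for the asker's data; the point is only that empty fields are parsed as NaN by default.

```python
import io

import pandas as pd

# Hypothetical CSV contents: rows 4 and 5 leave the comments field empty.
csv_text = """id,comments
0,VP
1,VP
2,VP
3,TEST
4,
5,
"""

marked_results = pd.read_csv(io.StringIO(csv_text))

# Empty fields are parsed as NaN (a float), not as empty strings.
print(marked_results.comments)
print(marked_results.comments.isnull().sum())  # 2 missing entries
```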
Now I try to drop those entries, and only this works:
- `marked_results.comments.isnull()`
None of these work:
- `marked_results.comments.dropna()` only gives the same column; nothing gets dropped. Confusing.
- `marked_results.comments == NaN` only gives a series of all `False`s. Nothing was NaN... confusing.
- Likewise `marked_results.comments == nan`.
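The all-`False` result follows from how NaN is defined rather than from anything pandas-specific. A minimal sketch, assuming a small stand-in Series built with `np.nan` in place of the CSV data:

```python
import numpy as np
import pandas as pd

# Stand-in for marked_results.comments (assumed data, same shape as above).
comments = pd.Series(["VP", "VP", "VP", "TEST", np.nan, np.nan], name="comments")

# IEEE 754: NaN compares unequal to everything, including itself.
print(np.nan == np.nan)  # False
print(np.nan != np.nan)  # True

# So an element-wise == against NaN never matches, even on the NaN rows.
print((comments == np.nan).any())  # False

# The dedicated missing-value test does find them.
print(comments.isnull().tolist())  # [False, False, False, False, True, True]
```

The same applies to a bare `nan` name, since it refers to the same kind of float NaN value.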
I also tried:
comments_values = marked_results.comments.unique()
# array(['VP', 'TEST', nan], dtype=object)
# Ah, gotcha! So now I've tried:
marked_results.comments == comments_values[2]
# but still all the results are False!!!
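Pulling the NaN out of `unique()` does not change anything, because that value is still a NaN and still unequal to itself. A short sketch, again on an assumed stand-in Series, showing that `pd.isna` is the way to locate it among the unique values:

```python
import numpy as np
import pandas as pd

# Stand-in for marked_results.comments (assumed data).
comments = pd.Series(["VP", "VP", "VP", "TEST", np.nan, np.nan], name="comments")

comments_values = comments.unique()  # order of first appearance: VP, TEST, nan
missing = comments_values[2]

# Being "the same" NaN does not help: NaN is never == to itself.
print(missing == missing)  # False

# pd.isna recognises it, both as a scalar and element-wise on the array.
print(pd.isna(missing))  # True
print(pd.isna(comments_values))  # [False False  True]
```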
Recommended answer
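You should use `isnull` and `notnull` to test for NaN (these are more robust with pandas dtypes than NumPy's `np.isnan`, which raises a `TypeError` on object columns such as this one); see "values considered missing" in the docs. Testing with `==` cannot work: NaN compares unequal to every value, including itself, so `marked_results.comments == NaN` is `False` on every row.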
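A minimal sketch of those tests on an assumed stand-in frame `df`. It also shows that `isna`/`notna` are aliases of `isnull`/`notnull`, that `None` counts as missing in an object column, and that the `notnull` mask can filter rows directly:

```python
import numpy as np
import pandas as pd

# Assumed stand-in for the question's data.
df = pd.DataFrame({"comments": ["VP", "VP", "VP", "TEST", np.nan, np.nan]})

# isnull()/notnull() return boolean masks; isna()/notna() are aliases.
missing_mask = df.comments.isnull()
present_mask = df.comments.notnull()
print(missing_mask.equals(df.comments.isna()))  # True

# Both None and NaN count as missing in an object column.
print(pd.Series(["VP", None, np.nan]).isnull().tolist())  # [False, True, True]

# Boolean indexing with the "present" mask keeps only rows that have a comment.
with_comments = df[present_mask]
print(with_comments.index.tolist())  # [0, 1, 2, 3]
```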
Using the Series method `dropna` on a column returns a new Series without the NaNs. It won't affect the original DataFrame, but it does what you want:
In [11]: df
Out[11]:
comments
0 VP
1 VP
2 VP
3 TEST
4 NaN
5 NaN
In [12]: df.comments.dropna()
Out[12]:
0 VP
1 VP
2 VP
3 TEST
Name: comments, dtype: object
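One plausible reason the question saw "nothing gets dropped" is inspecting the original column after calling `dropna`. This sketch, on an assumed stand-in frame, shows that the returned Series is shorter while the column inside `df` is unchanged until the result is assigned somewhere:

```python
import numpy as np
import pandas as pd

# Assumed stand-in for the question's data.
df = pd.DataFrame({"comments": ["VP", "VP", "VP", "TEST", np.nan, np.nan]})

# dropna() returns a new Series; df itself is left untouched.
cleaned = df.comments.dropna()
print(len(cleaned))  # 4
print(len(df.comments))  # 6, the column inside df is unchanged
print(df.comments.isnull().sum())  # 2, the NaNs are still in df
```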
The DataFrame method `dropna` has a `subset` argument (to drop rows that have NaNs in specific columns):
In [13]: df.dropna(subset=['comments'])
Out[13]:
comments
0 VP
1 VP
2 VP
3 TEST
In [14]: df = df.dropna(subset=['comments'])
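The `subset` argument matters once the frame has more than one column. In this sketch the `score` column is hypothetical, added only to contrast a plain `dropna()` with `dropna(subset=['comments'])` and with the equivalent boolean-mask form:

```python
import numpy as np
import pandas as pd

# Hypothetical second column "score" with a gap in a row that does have a comment.
df = pd.DataFrame({
    "comments": ["VP", "VP", "VP", "TEST", np.nan, np.nan],
    "score": [1.0, np.nan, 3.0, 4.0, 5.0, np.nan],
})

# Without subset, a NaN in any column drops the row (row 1 is lost too).
print(df.dropna().index.tolist())  # [0, 2, 3]

# With subset, only NaNs in "comments" matter.
print(df.dropna(subset=["comments"]).index.tolist())  # [0, 1, 2, 3]

# Equivalent boolean-mask form.
print(df[df.comments.notnull()].index.tolist())  # [0, 1, 2, 3]
```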