Why does testing `NaN == NaN` not work for dropping rows from a pandas DataFrame?


Problem description


Please explain how NaNs are treated in pandas, because the following logic seems "broken" to me. I tried various ways (shown below) to drop the empty values.


My DataFrame, which I load from a CSV file using read_csv, has a column comments that is empty most of the time.


The column marked_results.comments looks like this; all the rest of the column is NaN, so pandas loads empty entries as NaN. So far so good:
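As a minimal sketch of that loading step, assuming a small inline CSV stands in for the real file (the `id` column and the values are hypothetical, chosen to mirror the output below), `read_csv` turns the empty `comments` fields into NaN by default:

```python
import io

import pandas as pd

# Hypothetical stand-in for the CSV file: the last two rows have an
# empty "comments" field.
csv_text = "id,comments\n0,VP\n1,VP\n2,VP\n3,TEST\n4,\n5,\n"

marked_results = pd.read_csv(io.StringIO(csv_text))

# With default settings, read_csv treats empty fields as missing (NaN).
print(marked_results["comments"])
print(marked_results["comments"].isnull().sum())  # 2
```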

0       VP
1       VP
2       VP
3     TEST
4      NaN
5      NaN
....


Now I try to drop those entries, but only this works:

  • marked_results.comments.isnull()

All of these fail:

  • marked_results.comments.dropna() only gives the same column; nothing gets dropped. Confusing.
  • marked_results.comments == NaN only gives a series of all False. Nothing was NaN... confusing.
  • Likewise marked_results.comments == nan.

I also tried:

comments_values = marked_results.comments.unique()

array(['VP', 'TEST', nan], dtype=object)

# Aha, got you! So now I've tried:
marked_results.comments == comments_values[2]
# but still all the results are False!!!

Recommended answer


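You should use isnull and notnull to test for NaN (these are more robust with pandas dtypes than the NumPy equivalents); see "Values considered missing" in the docs. Testing with == cannot work: under IEEE 754 floating-point semantics, NaN compares unequal to every value, including itself, so comparing a column against NaN (even the nan taken from unique()) is always False.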
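A minimal sketch of why the == tests fail and what isnull/notnull do instead, using a Series rebuilt by hand from the values shown in the question (the variable name `comments` is illustrative):

```python
import math

import numpy as np
import pandas as pd

comments = pd.Series(["VP", "VP", "VP", "TEST", np.nan, np.nan], name="comments")

# IEEE 754: NaN is unequal to every value, itself included.
print(np.nan == np.nan)    # False
print(math.isnan(np.nan))  # True

# So an element-wise comparison against NaN never matches anything...
print((comments == np.nan).any())  # False

# ...while isnull/notnull test for missingness directly.
print(comments.isnull().tolist())             # [False, False, False, False, True, True]
print(comments[comments.notnull()].tolist())  # ['VP', 'VP', 'VP', 'TEST']

# pd.isnull also works element-wise on the array returned by unique().
print(pd.isnull(comments.unique()).tolist())  # [False, False, True]
```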


Calling the Series method dropna on a column does what you want, but it returns a new Series and won't affect the original DataFrame:

In [11]: df
Out[11]:
  comments
0       VP
1       VP
2       VP
3     TEST
4      NaN
5      NaN

In [12]: df.comments.dropna()
Out[12]:
0      VP
1      VP
2      VP
3    TEST
Name: comments, dtype: object
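To make the "won't affect the original DataFrame" point concrete, here is a small sketch assuming the same six-row frame as the session above; the name `cleaned` is hypothetical:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"comments": ["VP", "VP", "VP", "TEST", np.nan, np.nan]})

# Series.dropna returns a new Series; df itself is left untouched.
cleaned = df["comments"].dropna()
print(len(cleaned))  # 4
print(len(df))       # 6 (the original frame still has all rows)

# The NaNs remain in df until the result is assigned back or df is filtered.
print(df["comments"].isnull().sum())  # 2
```

This is why checking marked_results after calling dropna on the column shows nothing dropped: the result has to be kept, as in In [14] below.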


The DataFrame method dropna has a subset argument (to drop rows that have NaN in specific columns):

In [13]: df.dropna(subset=['comments'])
Out[13]:
  comments
0       VP
1       VP
2       VP
3     TEST

In [14]: df = df.dropna(subset=['comments'])
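A sketch of what subset changes, assuming a hypothetical extra column `score` with its own missing value (not part of the original data). Only the listed columns are checked, and a notnull boolean mask selects the same rows:

```python
import numpy as np
import pandas as pd

# Hypothetical "score" column with a NaN in row 1.
df = pd.DataFrame(
    {
        "comments": ["VP", "VP", "VP", "TEST", np.nan, np.nan],
        "score": [1.0, np.nan, 3.0, 4.0, 5.0, 6.0],
    }
)

by_subset = df.dropna(subset=["comments"])  # keeps row 1 despite its NaN score
by_mask = df[df["comments"].notnull()]      # same rows via a boolean mask
everything = df.dropna()                    # default: drop rows with NaN in any column

print(by_subset.index.tolist())   # [0, 1, 2, 3]
print(by_mask.equals(by_subset))  # True
print(everything.index.tolist())  # [0, 2, 3]
```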

