Pandas - Remove rows based on combinations of NaN values

Question

I have a data frame that looks something like this:

NUM   A      B        C      D        E        F
p1    NaN    -1.183   NaN    NaN      NaN      1.829711
p5    NaN    NaN      NaN    NaN      1.267   -1.552721
p9    1.138  NaN      NaN    -1.179   NaN      1.227306

There is always a non-NaN value in column F and in at least one of the other columns A-E.

I want to create a sub-table containing only those rows that have certain combinations of non-NaN values across columns. There are a number of these desired combinations, including doublets and triplets. Here are three examples of the combinations I want to pull:


  1. Rows that contain non-NaN values in columns A & B

  2. Rows that contain non-NaN values in C & D

  3. Rows that contain non-NaN values in A & B & C

I already know about the np.isfinite and pd.notnull commands from this question, but I do not know how to apply them to combinations of columns.

Also, once I have a list of commands for removing rows that do not match one of my desired combinations, I do not know how to tell Pandas to remove rows ONLY if they do not match any of the desired combinations.

Recommended answer

Selecting a subset of a dataframe often requires logical operations on Boolean arrays (either numpy arrays or pandas series). The 'and', 'or' and 'not' operators do not work for this:

In [79]: df[pd.notnull(df['A']) and pd.notnull(df['F'])]

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

In Python, the 'and', 'or' and 'not' operators treat non-Boolean objects as True unless they represent "empty" objects such as [], int(0), float(0) or None. Using these same operators for array-wise Boolean operations in Pandas would therefore be confusing, since some people would expect them to simply evaluate to True. Pandas raises the ValueError shown above rather than guess.
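
The truthiness rules described above can be checked directly. The following is a minimal sketch, not part of the original answer; the names mask_a and mask_f are hypothetical stand-ins for Boolean masks such as pd.notnull(df['A']).

```python
import pandas as pd

# Plain Python objects have a well-defined truth value.
empty_values = [bool([]), bool(0), bool(0.0), bool(None)]   # all False
non_empty_values = [bool([0]), bool("x")]                    # all True

# `and` returns one of its operands; it never combines element by element.
# The first list is non-empty (truthy), so the second list is returned as-is.
list_result = [True, False] and [False, False]

# Hypothetical Boolean masks (assumed names, for illustration only).
mask_a = pd.Series([True, False, True])
mask_f = pd.Series([True, True, False])

# A Series with several elements has no single truth value, so `and` fails.
try:
    mask_a and mask_f
    raised = False
except ValueError as exc:
    raised = True
    print("ValueError:", exc)
```

The list case shows why silently accepting 'and' would be misleading: the result is simply the second operand, not an element-wise conjunction.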

Instead, we should use the element-wise operators &, | and ~ for this.

In [69]: df[pd.notnull(df['A']) & pd.notnull(df['F'])]
Out[69]:
  NUM      A   B   C      D   E         F
2  p9  1.138 NaN NaN -1.179 NaN  1.227306
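
For reference, here is a self-contained sketch of all three operators applied to the sample rows from the question. The frame is rebuilt by hand from the table above, and the variable names (has_a, has_b, has_d and so on) are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame({
    "NUM": ["p1", "p5", "p9"],
    "A": [np.nan, np.nan, 1.138],
    "B": [-1.183, np.nan, np.nan],
    "C": [np.nan, np.nan, np.nan],
    "D": [np.nan, np.nan, -1.179],
    "E": [np.nan, 1.267, np.nan],
    "F": [1.829711, -1.552721, 1.227306],
})

# One Boolean Series per column: True where the value is not NaN.
has_a = df["A"].notnull()
has_b = df["B"].notnull()
has_d = df["D"].notnull()

both = df[has_a & has_d]          # rows with values in A AND D
either = df[has_a | has_b]        # rows with a value in A OR B
neither = df[~(has_a | has_b)]    # rows with a value in neither A nor B

# & and | bind more tightly than comparisons, so comparisons need parentheses.
positive = df[(df["A"] > 0) & (df["F"] > 0)]
```

Without the parentheses in the last line, Python would try to evaluate 0 & df["F"] first, which is not the intended comparison.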

A shorter, but less flexible, alternative is to use any(), all() or empty.

In [78]: df[pd.notnull(df[['A', 'F']]).all(axis=1)]
Out[78]:
  NUM      A   B   C      D   E         F
2  p9  1.138 NaN NaN -1.179 NaN  1.227306
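
The all(axis=1) form extends naturally to the several combinations asked about in the question. The sketch below is one possible way to do it, not the answer author's code: it builds one mask per combination and keeps a row if any mask is True. The helper name filter_by_combinations and the rows x1 and x2 are hypothetical; the extra rows are made up because none of the three sample rows matches the combinations listed in the question.

```python
import numpy as np
import pandas as pd

nan = np.nan
# First three rows come from the question; x1 and x2 are invented for illustration.
df = pd.DataFrame(
    [
        ["p1", nan, -1.183, nan, nan, nan, 1.829711],
        ["p5", nan, nan, nan, nan, 1.267, -1.552721],
        ["p9", 1.138, nan, nan, -1.179, nan, 1.227306],
        ["x1", 0.5, 0.7, nan, nan, nan, 1.0],
        ["x2", nan, nan, 0.3, -0.2, nan, 2.0],
    ],
    columns=["NUM", "A", "B", "C", "D", "E", "F"],
)


def filter_by_combinations(frame, combos):
    """Keep rows where at least one combination has all its columns non-NaN."""
    # One Boolean Series per combination.
    masks = [frame[cols].notnull().all(axis=1) for cols in combos]
    # Put the masks side by side and keep rows where any of them is True.
    keep = pd.concat(masks, axis=1).any(axis=1)
    return frame[keep]


# The three combinations named in the question.
combos = [["A", "B"], ["C", "D"], ["A", "B", "C"]]
subset = filter_by_combinations(df, combos)
```

Rows are dropped only when every mask is False, which is the "remove only if it matches none of the combinations" behaviour requested. Note that any row matching A & B & C also matches A & B, so the triplet adds nothing here unless the doublet is removed from the list.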

You can read more on this here.
