Filtering a Pandas DataFrame Without Removing Rows

Question

I'm trying to use where on my Pandas DataFrame to replace all cells that don't meet my criteria with NaN. However, I'd like to do it in a way that always preserves the shape of my original DataFrame and does not remove any rows from the result.

Given the following DataFrame:

      A    B    C    D
1/1   0    1    0    1
1/2   2    1    1    1
1/3   3    0    1    0 
1/4   1    0    1    2
1/5   1    0    1    1
1/6   2    0    2    1
1/7   3    5    2    3

I would like to search the DataFrame for all cells that meet a certain criterion, provided that column D also meets a particular criterion. In this case, my criterion is:

Find all cells that are greater than the previous value, when column D is also > 1

I accomplish this by using the following syntax:

matches = df[df > df.shift(1)]
matches = matches[df.D > 1]

I have to split this query into two statements because df.D is a Series and does not match the shape of the entire DataFrame. According to a question I asked previously, support for a broadcasting & operator will not be available until 0.14.

The problem is that after I run the second statement, the resulting DataFrame has a different shape: rows have been removed, while the number of columns stays the same. The first statement preserves the original number of rows.

Why would the second statement remove rows while the first does not? How can I achieve the same result while keeping the full number of rows intact?
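The two statements behave differently because `[]` dispatches on the type of its key: a boolean DataFrame key masks individual cells (it is equivalent to `df.where(key)`), whereas a boolean Series key selects rows, so every row whose flag is False is dropped. A minimal sketch reproducing both cases, assuming a recent pandas version; the frame is rebuilt from the table above with the `1/1`..`1/7` labels stored as strings:

```python
import pandas as pd

# The question's frame, with the 1/1..1/7 labels stored as strings.
df = pd.DataFrame(
    {
        "A": [0, 2, 3, 1, 1, 2, 3],
        "B": [1, 1, 0, 0, 0, 0, 5],
        "C": [0, 1, 1, 1, 1, 2, 2],
        "D": [1, 1, 0, 2, 1, 1, 3],
    },
    index=["1/1", "1/2", "1/3", "1/4", "1/5", "1/6", "1/7"],
)

# Boolean DataFrame: one flag per cell. The first row is compared against
# NaN (nothing was shifted in), so it is always False.
cell_mask = df > df.shift(1)

# Boolean Series: one flag per row.
row_mask = df["D"] > 1

masked = df[cell_mask]       # DataFrame key: cell masking, same as df.where(cell_mask)
selected = masked[row_mask]  # Series key: row selection, non-matching rows are dropped

print(masked.shape)    # (7, 4)
print(selected.shape)  # (2, 4)
print(masked.equals(df.where(cell_mask)))  # True
```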

Edit:

The pandas documentation states that, to guarantee the shape is preserved, I should use the where method rather than boolean indexing. However, where does not seem to accept the condition from my second statement. Running

matches.where(df.D > 1)

gives me the following error:

ValueError: Array conditional must be same shape as self
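The error message says the condition passed to `where` must cover every cell, while `df.D > 1` has only one value per row. One way around it is to repeat the row condition across all columns, so both conditions have the same shape, and then combine them with `&`. A minimal sketch in pure pandas, assuming a recent version; `broadcast_rows` is a hypothetical helper written for this example, not a pandas API:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "A": [0, 2, 3, 1, 1, 2, 3],
        "B": [1, 1, 0, 0, 0, 0, 5],
        "C": [0, 1, 1, 1, 1, 2, 2],
        "D": [1, 1, 0, 2, 1, 1, 3],
    },
    index=["1/1", "1/2", "1/3", "1/4", "1/5", "1/6", "1/7"],
)


def broadcast_rows(row_mask: pd.Series, like: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: copy a per-row boolean Series into every column of `like`."""
    return pd.DataFrame({col: row_mask for col in like.columns}, index=like.index)


# Per-cell condition: value is greater than the previous row's value.
cell_mask = df > df.shift(1)

# Per-row condition expanded to the full (7, 4) shape, then combined cell by cell.
full_mask = cell_mask & broadcast_rows(df["D"] > 1, df)

matches = df.where(full_mask)  # same shape as df; failing cells become NaN
print(matches)
```

Only rows 1/4 and 1/7 keep any values, since they are the only rows where D > 1, but the result still has all seven rows. Building the mask from labels keeps it aligned with df's index, whereas converting to NumPy arrays relies on both masks having the same row order.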

Solution

This is slightly more intuitive than @DSM's answer (though pandas currently lacks automatic broadcasting for boolean operations of this type):

In [58]: df.where((df>df.shift(1)).values & DataFrame(df.D==1).values)
Out[58]: 
      A   B   C   D
1/1 NaN NaN NaN NaN
1/2   2 NaN   1 NaN
1/3 NaN NaN NaN NaN
1/4 NaN NaN NaN NaN
1/5 NaN NaN NaN NaN
1/6   2 NaN   2 NaN
1/7 NaN NaN NaN NaN

See here for the issue to be addressed in 0.14.
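The answer's code tests `df.D==1`, which is why its output keeps values in rows 1/2 and 1/6; the question asks for `D > 1`. Below is a sketch of the same NumPy-broadcasting idea using the question's condition, assuming pandas 0.24 or later for `to_numpy()`. Reshaping the row condition into a single column of shape (7, 1) lets NumPy broadcast it across the four columns, which is what `DataFrame(df.D==1).values` achieves in the answer:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "A": [0, 2, 3, 1, 1, 2, 3],
        "B": [1, 1, 0, 0, 0, 0, 5],
        "C": [0, 1, 1, 1, 1, 2, 2],
        "D": [1, 1, 0, 2, 1, 1, 3],
    },
    index=["1/1", "1/2", "1/3", "1/4", "1/5", "1/6", "1/7"],
)

# (7, 4) array of per-cell flags: greater than the previous row's value.
cell_mask = (df > df.shift(1)).to_numpy()

# (7, 1) array of per-row flags; NumPy broadcasts it across all four columns.
row_mask = (df["D"] > 1).to_numpy()[:, np.newaxis]

matches = df.where(cell_mask & row_mask)
print(matches)
```

With this frame, only the D value in row 1/4 and the A, B and D values in row 1/7 survive; every other cell becomes NaN, and all seven rows remain.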
