Remove level and all of its rows from pandas dataframe if one row meets condition

Question

Below is a pandas dataframe that I would like to filter. I would like to remove the year and all of its rows when the temp for at least one row (i.e., visit) in that year is < 37. I am able to remove the specific row in 2014 where the temp is 36; however, I do not know how to make the entire year go away.

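import pandas as pd

# Two-level row index: year, then visit within the year
index = pd.MultiIndex.from_product([[2013, 2014], [1, 2]], names=['yr', 'visit'])
# from_product expects a list of iterables, hence the nested list
columns = pd.MultiIndex.from_product([['hr', 'temp']], names=['metric'])
data = pd.DataFrame([[96, 38], [98, 38], [85, 36], [84, 43]], index=index,
                    columns=columns)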
data

        metric  hr      temp    
yr      visit                       
2013    1       96      38  
        2       98      38  
2014    1       85      36  
        2       84      43  

Desired output:

        metric  hr      temp    
yr      visit                       
2013    1       96      38  
        2       98      38  

Recommended answer

You could use groupby/filter to remove groups based on a condition:

import numpy as np
import pandas as pd

index = pd.MultiIndex.from_product([[2013, 2014], [1, 2]], names=['yr', 'visit'])
columns = pd.MultiIndex.from_product([['hr', 'temp']], names=['metric'])
data = pd.DataFrame([[96, 38], [98, 38], [85, 36], [84, 43]], index=index, columns=columns)

print(data.groupby(level='yr').filter(lambda x: (x['temp']>=37).all()))

yields

metric      hr temp
yr   visit         
2013 1      96   38
     2      98   38

Since the rows you wish to remove are grouped by yr, and yr is a level of the index, use groupby(level='yr'). For each group, the lambda function is called with x set to that group's sub-DataFrame. The group is kept when (x['temp']>=37).all() is True, i.e. when every visit in that year has a temp of at least 37.
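To make the per-group behaviour described above concrete, here is a minimal sketch that evaluates the same predicate by hand, once per year. It assumes a plain, single-level column index (a simplification of the example data, so that group['temp'] is unambiguously a Series); the names keep, kept_years and result are illustrative only.

```python
import pandas as pd

# Same values as the example, but with plain column labels (an assumption
# made here so that group['temp'] is a Series rather than a DataFrame).
index = pd.MultiIndex.from_product([[2013, 2014], [1, 2]], names=['yr', 'visit'])
data = pd.DataFrame([[96, 38], [98, 38], [85, 36], [84, 43]],
                    index=index, columns=['hr', 'temp'])

# Evaluate the filter predicate manually for each year's sub-DataFrame.
keep = {}
for yr, group in data.groupby(level='yr'):
    keep[yr] = bool((group['temp'] >= 37).all())

print(keep)

# Keep only the rows whose year passed the predicate.
kept_years = [yr for yr, ok in keep.items() if ok]
result = data.loc[kept_years]
print(result)
```

groupby(...).filter(...) does this bookkeeping internally: one call to the function per group, then all rows of the failing groups are dropped.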

Note that the vectorized alternative timed below, which builds a mask with (data['temp']>=37).groupby(level='yr').transform('all') and passes it to data.loc, is faster, particularly for large DataFrames. data['temp']>=37 computes the criterion in a vectorized way for the entire column, whereas in my solution above (x['temp']>=37).all() computes it piecemeal, once for each sub-DataFrame. Generally, vectorized solutions are faster when applied once to large arrays or NDFrames than when applied in a loop to smaller pieces.
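The vectorized variant can be broken into three steps, sketched here on the example values. As an assumption for readability, the columns are plain labels rather than a one-level MultiIndex, and the intermediate names row_ok, year_ok and filtered are made up for this illustration.

```python
import pandas as pd

index = pd.MultiIndex.from_product([[2013, 2014], [1, 2]], names=['yr', 'visit'])
# Plain column labels are an assumption; the values match the question.
data = pd.DataFrame([[96, 38], [98, 38], [85, 36], [84, 43]],
                    index=index, columns=['hr', 'temp'])

# Step 1: a single vectorized comparison over the whole column.
row_ok = data['temp'] >= 37

# Step 2: reduce with all() within each year and broadcast the result
# back to every row of that year.
year_ok = row_ok.groupby(level='yr').transform('all')

# Step 3: boolean-index the original frame with the per-row mask.
filtered = data.loc[year_ok]
print(filtered)
```

The comparison runs once over the full column, and only the cheap all() reduction happens per group, which is where the speed advantage over calling a Python lambda per group comes from.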

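Here is an example showing the difference in speed for a 1000-row DataFrame (the two timed expressions use different thresholds, 5 and 37, so treat the comparison as indicative rather than exact):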

In [70]: df = pd.DataFrame(np.random.randint(100, size=(1000, 4)), columns=list('ABCD')).set_index(['A','B'])

In [71]: %timeit df.groupby(level='A').filter(lambda x: (x['C']>=5).all())
10 loops, best of 3: 46.3 ms per loop

In [72]: %timeit df.loc[(df['C']>=37).groupby(level='A').transform('all')]
100 loops, best of 3: 18.9 ms per loop
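Outside IPython, a comparable measurement can be sketched with the standard timeit module. The version below is an illustration, not the original benchmark: it uses a seeded random generator, the same threshold for both approaches, and hypothetical helper names (with_filter, with_transform, THRESHOLD). It also checks that both approaches select the same rows. Absolute timings will vary by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Seeded so the data is reproducible; the shape mirrors the example above.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.integers(100, size=(1000, 4)),
                  columns=list('ABCD')).set_index(['A', 'B'])

THRESHOLD = 5  # hypothetical shared cutoff for both approaches


def with_filter(frame):
    # Calls the lambda once per group of level 'A'.
    return frame.groupby(level='A').filter(lambda x: (x['C'] >= THRESHOLD).all())


def with_transform(frame):
    # One vectorized comparison, then a per-group all() broadcast to rows.
    mask = (frame['C'] >= THRESHOLD).groupby(level='A').transform('all')
    return frame.loc[mask]


# Both approaches should keep exactly the same rows.
same = with_filter(df).equals(with_transform(df))

t_filter = timeit.timeit(lambda: with_filter(df), number=5)
t_transform = timeit.timeit(lambda: with_transform(df), number=5)
print(f"same result: {same}")
print(f"filter:    {t_filter / 5 * 1000:.1f} ms per call")
print(f"transform: {t_transform / 5 * 1000:.1f} ms per call")
```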
