Pandas: Delete rows based on other rows


Question

I have a pandas DataFrame that looks like this:

qseqid  sseqid  qstart    qend
2         1     125       345
4         1     150       320
3         2     150       450
6         2     25        300
8         2     50        500

I would like to remove rows based on the values in other rows, using this criterion: a row (r1) must be removed if another row (r2) exists with the same sseqid such that r1[qstart] > r2[qstart] and r1[qend] < r2[qend].

Is this possible with pandas?

Recommended answer

import pandas as pd

df = pd.DataFrame({'qend': [345, 320, 450, 300, 500],
 'qseqid': [2, 4, 3, 6, 8],
 'qstart': [125, 150, 150, 25, 50],
 'sseqid': [1, 1, 2, 2, 2]})

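def remove_rows(df):
    # Pair every row with every row sharing its sseqid; reset_index()
    # keeps the original row label in a column named 'index'.
    merged = pd.merge(df.reset_index(), df, on='sseqid')
    # True where the left row (_x) lies strictly inside the right row (_y).
    mask = ((merged['qstart_x'] > merged['qstart_y'])
            & (merged['qend_x'] < merged['qend_y']))
    # Keep only the labels that were never flagged by the mask.
    df_mask = ~df.index.isin(merged.loc[mask, 'index'].values)
    result = df.loc[df_mask]
    return result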

result = remove_rows(df)
print(result)

yields

   qend  qseqid  qstart  sseqid
0   345       2     125       1
3   300       6      25       2
4   500       8      50       2
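As a sanity check, the criterion from the question can also be applied literally, one sseqid group at a time. The sketch below is only a cross-check, not part of the answer above: remove_rows_bruteforce is a hypothetical helper name, and it assumes the index labels are unique and sortable. It is much slower than the merge approach on large frames, but it should keep the same labels.

```python
import pandas as pd

df = pd.DataFrame({'qend': [345, 320, 450, 300, 500],
                   'qseqid': [2, 4, 3, 6, 8],
                   'qstart': [125, 150, 150, 25, 50],
                   'sseqid': [1, 1, 2, 2, 2]})

def remove_rows_bruteforce(df):
    # Hypothetical helper: applies the criterion row by row within each group.
    keep = []
    for _, group in df.groupby('sseqid'):
        for label, r1 in group.iterrows():
            # r1 is dropped if some r2 in the same group starts earlier
            # and ends later (strict inequalities, as in the question).
            contained = ((group['qstart'] < r1['qstart'])
                         & (group['qend'] > r1['qend'])).any()
            if not contained:
                keep.append(label)
    return df.loc[sorted(keep)]

expected = remove_rows_bruteforce(df)
print(expected.index.tolist())  # [0, 3, 4]
```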


The idea is to use pd.merge to form a DataFrame with every pairing of rows that have the same sseqid:

In [78]: pd.merge(df.reset_index(), df, on='sseqid')
Out[78]: 
    index  qend_x  qseqid_x  qstart_x  sseqid  qend_y  qseqid_y  qstart_y
0       0     345         2       125       1     345         2       125
1       0     345         2       125       1     320         4       150
2       1     320         4       150       1     345         2       125
3       1     320         4       150       1     320         4       150
4       2     450         3       150       2     450         3       150
5       2     450         3       150       2     300         6        25
6       2     450         3       150       2     500         8        50
7       3     300         6        25       2     450         3       150
8       3     300         6        25       2     300         6        25
9       3     300         6        25       2     500         8        50
10      4     500         8        50       2     450         3       150
11      4     500         8        50       2     300         6        25
12      4     500         8        50       2     500         8        50

Each row of merged contains data from two rows of df. You can then compare each pair of rows using

mask = ((merged['qstart_x'] > merged['qstart_y']) 
        & (merged['qend_x'] < merged['qend_y']))

and find the labels in df.index that do not match this condition:

df_mask = ~df.index.isin(merged.loc[mask, 'index'].values)

Then select those rows:

result = df.loc[df_mask]
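To see what the intermediate steps produce on the sample data, the flagged labels can be printed before they are negated. This is just an inspection sketch; the variable name flagged is introduced here for illustration and does not appear in the answer.

```python
import pandas as pd

df = pd.DataFrame({'qend': [345, 320, 450, 300, 500],
                   'qseqid': [2, 4, 3, 6, 8],
                   'qstart': [125, 150, 150, 25, 50],
                   'sseqid': [1, 1, 2, 2, 2]})

merged = pd.merge(df.reset_index(), df, on='sseqid')
mask = ((merged['qstart_x'] > merged['qstart_y'])
        & (merged['qend_x'] < merged['qend_y']))

# Labels of the rows in df that lie strictly inside another row.
flagged = merged.loc[mask, 'index'].unique().tolist()
print(flagged)  # [1, 2]

# Boolean array aligned with df.index: True for the rows to keep.
df_mask = ~df.index.isin(flagged)
print(df_mask.tolist())  # [True, False, False, True, True]
```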

Note that this assumes df has a unique index.
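If the index is not unique, isin would match every row that shares a flagged label, so too many rows could be dropped. One possible workaround, sketched here under the assumption that row order is all that needs to be preserved, is to run the same function on a positional index and map the result back by position. remove_rows_any_index is a hypothetical wrapper name, not something from the answer.

```python
import pandas as pd

def remove_rows(df):
    # Same function as in the answer.
    merged = pd.merge(df.reset_index(), df, on='sseqid')
    mask = ((merged['qstart_x'] > merged['qstart_y'])
            & (merged['qend_x'] < merged['qend_y']))
    df_mask = ~df.index.isin(merged.loc[mask, 'index'].values)
    return df.loc[df_mask]

def remove_rows_any_index(df):
    # Hypothetical wrapper: use positions 0..n-1 as temporary unique labels.
    positional = df.reset_index(drop=True)
    kept = remove_rows(positional)
    # kept.index now holds the positions of the surviving rows.
    return df.iloc[kept.index.to_numpy()]

# Same data as before, but with deliberately duplicated index labels.
dup = pd.DataFrame({'qend': [345, 320, 450, 300, 500],
                    'qseqid': [2, 4, 3, 6, 8],
                    'qstart': [125, 150, 150, 25, 50],
                    'sseqid': [1, 1, 2, 2, 2]},
                   index=['a', 'a', 'b', 'b', 'b'])

result = remove_rows_any_index(dup)
print(result['qseqid'].tolist())  # [2, 6, 8]
```

The original labels are kept on the surviving rows, so the caller still sees the index it passed in.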
