pandas :根据行值删除重复项 [英] Pandas: Drop duplicates based on row value

查看:94
本文介绍了 pandas :根据行值删除重复项的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个数据框,我想根据不同的条件删除重复项...。​​

I have a dataframe and I want to drop duplicates based on different conditions....

        A      B
  0     1     1.0
  1     1     1.0
  2     2     2.0
  3     2     2.0
  4     3     3.0
  5     4     4.0
  6     5     5.0
  7     -     5.1
  8     -     5.1
  9     -     5.3

我想全部删除A列中的重复项,但带有-的行除外。此后,我想基于A列的B列值,使用-作为值从A列中删除重复项。给定输入数据框,它应返回以下内容:-

I want to drop all the duplicates from column A except rows with "-". After this, I want to drop duplicates from column A with "-" as a value based on their column B value. Given the input dataframe, this should return the following:-

        A      B
  0     1     1.0
  2     2     2.0
  4     3     3.0
  5     4     4.0
  6     5     5.0
  7     -     5.1
  9     -     5.3

我有以下代码,但对于大量数据而言效率不高,我该如何改善它。...

I have the following code but it's not very efficient for very large amounts of data, how can I improve this....

 def generate(df):
     str_col = df[df["A"] == "-"]

     df.drop(df[df["A"] == "-"].index, inplace=True)

     df = df.drop_duplicates(subset="A")

     str_col = b.drop_duplicates(subset="B")

     bigdata = df.append(str_col, ignore_index=True)

     return bigdata.sort_values("B")


推荐答案

重复 eq

df[~df.duplicated('A')            # keep those not duplicates in A
   | (df['A'].eq('-')             # or those '-' in A
      & ~df['B'].duplicated())]   # which are not duplicates in B

输出:

   A    B
0  1  1.0
2  2  2.0
4  3  3.0
5  4  4.0
6  5  5.0
7  -  5.1
9  -  5.3

这篇关于 pandas :根据行值删除重复项的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆