Pandas: Drop duplicates based on row value

Views: 94  Published: 2020/10/16 23:31:43  Tags: python pandas dataframe

Problem description

I have a dataframe and I want to drop duplicates based on different conditions.

        A      B
  0     1     1.0
  1     1     1.0
  2     2     2.0
  3     2     2.0
  4     3     3.0
  5     4     4.0
  6     5     5.0
  7     -     5.1
  8     -     5.1
  9     -     5.3

I want to drop all duplicates in column A, except for rows where A is "-". Then, among the rows where A is "-", I want to drop duplicates based on their column B value. Given the input dataframe, this should return the following:

        A      B
  0     1     1.0
  2     2     2.0
  4     3     3.0
  5     4     4.0
  6     5     5.0
  7     -     5.1
  9     -     5.3

I have the following code, but it is not very efficient for very large amounts of data. How can I improve it?

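 import pandas as pd

 def generate(df):
     # Rows where A is "-" are deduplicated on B rather than on A
     str_col = df[df["A"] == "-"]

     # Remove the "-" rows without mutating the caller's dataframe
     df = df.drop(str_col.index)

     df = df.drop_duplicates(subset="A")

     str_col = str_col.drop_duplicates(subset="B")

     # DataFrame.append was removed in pandas 2.0; pd.concat replaces it
     bigdata = pd.concat([df, str_col], ignore_index=True)

     return bigdata.sort_values("B")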

Recommended answer

Use duplicated and eq:

df[~df.duplicated('A')            # keep those not duplicates in A
   | (df['A'].eq('-')             # or those '-' in A
      & ~df['B'].duplicated())]   # which are not duplicates in B

Output:

        A      B
  0     1     1.0
  2     2     2.0
  4     3     3.0
  5     4     4.0
  6     5     5.0
  7     -     5.1
  9     -     5.3
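
For reference, the boolean mask from the answer can be run end to end on the question's sample data. This is a minimal sketch, not part of the original answer: it assumes column A holds strings (so "-" can be compared directly), and the helper names drop_duplicates_keep_dash and drop_duplicates_keep_dash_strict are hypothetical. The second helper is a variant, swapped in for illustration, that only treats a "-" row as a duplicate when another "-" row has the same B value.

```python
import pandas as pd

# Sample data from the question. Storing A as strings is an assumption,
# made so that "-" and the numeric labels share one dtype.
df = pd.DataFrame({
    "A": ["1", "1", "2", "2", "3", "4", "5", "-", "-", "-"],
    "B": [1.0, 1.0, 2.0, 2.0, 3.0, 4.0, 5.0, 5.1, 5.1, 5.3],
})


def drop_duplicates_keep_dash(df):
    """Hypothetical helper wrapping the mask from the answer."""
    first_in_a = ~df.duplicated("A")    # first occurrence of each A value
    is_dash = df["A"].eq("-")           # rows where A is "-"
    first_in_b = ~df["B"].duplicated()  # first occurrence of each B value
    return df[first_in_a | (is_dash & first_in_b)]


def drop_duplicates_keep_dash_strict(df):
    """Variant (an assumption, not from the answer): compare B only
    against other rows that share the same A value."""
    first_in_a = ~df.duplicated("A")
    is_dash = df["A"].eq("-")
    first_pair = ~df.duplicated(["A", "B"])  # first occurrence of each (A, B) pair
    return df[first_in_a | (is_dash & first_pair)]


result = drop_duplicates_keep_dash(df)
strict_result = drop_duplicates_keep_dash_strict(df)
print(result)
```

Both helpers keep rows 0, 2, 4, 5, 6, 7 and 9 on this sample. They can differ on other data: ~df['B'].duplicated() looks at the whole B column, so a "-" row whose B value already appeared in an earlier row with a different A value would be dropped by the first helper but kept by the second.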
