Remove duplicates from rows and columns (cell) in a dataframe, python

Views: 51  Published: 2021/4/28 20:54:05  Tags: python, pandas, dataframe


Question

I have two columns in a dataframe, and each cell contains a lot of duplicated items. Something similar to this:

Index   x    y  
  1     1    ec, us, us, gbr, lst
  2     5    ec, us, us, us, us, ec, ec, ec, ec
  3     8    ec, us, us, gbr, lst, lst, lst, lst, gbr
  4     5    ec, ec, ec, us, us, ir, us, ec, ir, ec, ec
  5     7    chn, chn, chn, ec, ec, us, us, gbr, lst

I need to eliminate all the duplicate items and get a resulting dataframe like this:

Index   x    y  
  1     1    ec, us, gbr, lst
  2     5    ec, us
  3     8    ec, us, gbr, lst
  4     5    ec, us, ir
  5     7    chn, ec, us, gbr, lst

Thanks!

Recommended answer

Split each cell, then apply set and join, i.e.:

df['y'].str.split(', ').apply(set).str.join(', ')

0         us, ec, gbr, lst
1                   us, ec
2         us, ec, gbr, lst
3               us, ec, ir
4    us, lst, ec, gbr, chn
Name: y, dtype: object
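
Note that a set does not keep insertion order, which is why the output above lists the items in a different order than the question asked for. If the first-seen order matters, one possible variation is to deduplicate with `dict.fromkeys` instead of `set`. This is only a sketch: the helper name `dedupe_cell` is made up for illustration, and the dataframe is rebuilt here from the sample in the question.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "x": [1, 5, 8, 5, 7],
    "y": [
        "ec, us, us, gbr, lst",
        "ec, us, us, us, us, ec, ec, ec, ec",
        "ec, us, us, gbr, lst, lst, lst, lst, gbr",
        "ec, ec, ec, us, us, ir, us, ec, ir, ec, ec",
        "chn, chn, chn, ec, ec, us, us, gbr, lst",
    ],
})


def dedupe_cell(cell: str, sep: str = ", ") -> str:
    """Hypothetical helper: drop repeated items, keeping first-seen order."""
    # dict keys are unique and preserve insertion order (Python 3.7+)
    return sep.join(dict.fromkeys(cell.split(sep)))


df["y"] = df["y"].apply(dedupe_cell)
print(df)
```

This assumes every cell is a string that uses exactly `", "` as the separator; cells with missing values or inconsistent spacing would need cleaning first.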

Update based on the comments:

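df['y'].str.replace(r'nan|[{}\s]', '', regex=True).str.split(',').apply(set).str.join(',').str.strip(',').str.replace(r',{2,}', ',', regex=True)

# Remove all braces, whitespace and the text nan, then split, apply set and join;
# regex=True is passed explicitly because newer pandas versions treat the pattern as a literal string by default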
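
The data that prompted this update is not shown, so the following is only a sketch on made-up input. It assumes cells may contain braces, stray whitespace and the literal text `nan`. The series `messy` and the helper `clean_cell` are hypothetical names, and the helper uses `dict.fromkeys` rather than `set` so that the first-seen order is kept.

```python
import re

import pandas as pd

# Hypothetical messy input: braces, extra spaces, and the literal text "nan"
messy = pd.Series(["{ec, us, nan, us}", "{nan, ec, ec}", "us,  gbr, us"])


def clean_cell(cell: str) -> str:
    """Hypothetical helper: strip noise, then drop repeated items."""
    # Remove braces, whitespace, and the literal substring "nan"
    stripped = re.sub(r"nan|[{}\s]", "", cell)
    # Split on commas and discard the empty pieces left where "nan" was removed
    items = [item for item in stripped.split(",") if item]
    # dict.fromkeys removes repeats and keeps first-seen order
    return ",".join(dict.fromkeys(items))


cleaned = messy.apply(clean_cell)
print(cleaned.tolist())
```

Filtering out the empty pieces before joining makes the trailing `strip(',')` and the collapse of repeated commas unnecessary. As with the one-liner, the pattern removes `nan` wherever it appears, so an item that merely contains those letters would be altered too.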
