Pandas - identify unique triplets from a df

Problem description

I have a dataframe which represents unique items. Each item is uniquely identified by a set of varA, varB, and varC values (so each item has 0 to n values for each of varA, varB, and varC). My df has multiple rows per unique item, with various combinations of varA, varB, and varC.

The df looks like this (ID is unique within its column, but it does not identify the unique item):

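import numpy as np
import pandas as pd

# Missing values are written as np.nan (a bare NaN is not defined in Python)
df = pd.DataFrame({'ID': [1, 2, 3, 4, 5],
                   'varA': ['a', 'd', 'a', 'm', 'Z'],
                   'varB': ['b', 'e', 'k', 'e', np.nan],
                   'varC': ['c', 'f', 'l', np.nan, 't']})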

So in this df, you can see that:

  • 1 and 3 are the same item, with: {varA: [a], varB: [b, k], varC: [c, l]}.
  • 2 and 4 are also the same item: {varA: [d, m], varB: [e], varC: [f]}.

I would like to identify every unique item, give each one a unique id, and store its information.
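For illustration only (this is not the asker's code), one way to read this requirement is as a connected-components problem: two rows belong to the same item if they share a non-null value in the same column, directly or through a chain of other rows. The sketch below uses a small union-find structure under that assumption. The function name `assign_item_ids`, the `item_id` column, and the `items` dictionary are hypothetical names introduced here, and a value is assumed to link rows only when it appears in the same column.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'ID': [1, 2, 3, 4, 5],
                   'varA': ['a', 'd', 'a', 'm', 'Z'],
                   'varB': ['b', 'e', 'k', 'e', np.nan],
                   'varC': ['c', 'f', 'l', np.nan, 't']})

COLS = ['varA', 'varB', 'varC']


def assign_item_ids(frame, cols=COLS):
    """Label rows so that rows sharing a non-null value in the same
    column (directly or transitively) get the same integer label."""
    parent = list(range(len(frame)))  # union-find forest over row positions

    def find(i):
        # Follow parents to the root, shortening the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(i, j):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[max(ri, rj)] = min(ri, rj)

    for col in cols:
        first_seen = {}  # value -> first row position where it appeared
        for pos, value in enumerate(frame[col].tolist()):
            if pd.isna(value):
                continue  # assumption: missing values never link rows
            if value in first_seen:
                union(first_seen[value], pos)
            else:
                first_seen[value] = pos

    # Renumber the roots as 0, 1, 2, ... in order of first appearance.
    label_of_root = {}
    labels = [label_of_root.setdefault(find(i), len(label_of_root))
              for i in range(len(frame))]
    return pd.Series(labels, index=frame.index, name='item_id')


df['item_id'] = assign_item_ids(df)

# Collect the distinct non-null values of each column for every item.
items = {
    item_id: {col: sorted(group[col].dropna().unique()) for col in COLS}
    for item_id, group in df.groupby('item_id')
}
```

With the sample data, rows with ID 1 and 3 end up in one item because they share varA = a, rows 2 and 4 in another because they share varB = e, and row 5 stays on its own. Each column is scanned once and rows are merged as repeated values are found, so no pairwise comparison of rows is needed.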

The code I have written is terribly inefficient:
