Assign unique id to columns pandas data frame


Question

Hello, I have the following data frame:

df = 
A      B   
John   Tom
Homer  Bart
Tom    Maggie
Lisa   John 

I would like to assign a unique ID to each name and get back:

df = 
A      B         C    D

John   Tom       0    1
Homer  Bart      2    3
Tom    Maggie    1    4 
Lisa   John      5    0

What I am doing is the following:

import pandas as pd

# Stack both name columns and collect the unique names
LL1 = pd.concat([df.A, df.B], ignore_index=True)
nameun = pd.unique(LL1)
NN = list(nameun)
df['C'] = 0
df['D'] = 0
# Slow part: one linear list.index search per cell
for i in range(len(df)):
    df.loc[i, 'C'] = NN.index(df.A[i])
    df.loc[i, 'D'] = NN.index(df.B[i])

But since I have a very large dataset, this process is very slow.

Recommended answer

Here's one way. First get the array of unique names:

In [11]: df.values.ravel()
Out[11]: array(['John', 'Tom', 'Homer', 'Bart', 'Tom', 'Maggie', 'Lisa', 'John'], dtype=object)

In [12]: pd.unique(df.values.ravel())
Out[12]: array(['John', 'Tom', 'Homer', 'Bart', 'Maggie', 'Lisa'], dtype=object)

Then make this a Series that maps each name to its number:

In [13]: names = pd.unique(df.values.ravel())

In [14]: names = pd.Series(np.arange(len(names)), names)

In [15]: names
Out[15]:
John      0
Tom       1
Homer     2
Bart      3
Maggie    4
Lisa      5
dtype: int64

Now use applymap and names.get to look up these numbers:

In [16]: df.applymap(names.get)
Out[16]:
   A  B
0  0  1
1  2  3
2  1  4
3  5  0
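
One caveat when running this on a recent pandas: DataFrame.applymap was deprecated in pandas 2.1 in favor of DataFrame.map, so the call above may warn or fail depending on the installed version. The sketch below rebuilds the example frame and picks whichever method exists; the variable names unique_names and ids are illustrative, not from the answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "A": ["John", "Homer", "Tom", "Lisa"],
    "B": ["Tom", "Bart", "Maggie", "John"],
})

# Unique names in order of first appearance (row by row)
unique_names = pd.unique(df.values.ravel())

# Series mapping each name to its integer ID
names = pd.Series(np.arange(len(unique_names)), index=unique_names)

# Use DataFrame.map where available (pandas >= 2.1), else applymap
if hasattr(pd.DataFrame, "map"):
    ids = df.map(names.get)
else:
    ids = df.applymap(names.get)
```

Either branch does the same element-wise lookup, so the result matches the output shown above.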

and assign the result to the new columns:

In [17]: df[["C", "D"]] = df.applymap(names.get)

In [18]: df
Out[18]:
       A       B  C  D
0   John     Tom  0  1
1  Homer    Bart  2  3
2    Tom  Maggie  1  4
3   Lisa    John  5  0
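
As an alternative that is not part of the answer above, pd.factorize can produce the codes and the unique names in a single pass, which may be worth trying on a very large dataset. This is a minimal sketch assuming the frame contains only the two name columns; the variable names codes, uniques and codes_2d are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "A": ["John", "Homer", "Tom", "Lisa"],
    "B": ["Tom", "Bart", "Maggie", "John"],
})

# Flatten row by row, then encode each name by order of first appearance
flat = df[["A", "B"]].values.ravel()
codes, uniques = pd.factorize(flat)

# Reshape the flat codes back to one row per original row, one column per name column
codes_2d = codes.reshape(len(df), 2)
df["C"] = codes_2d[:, 0]
df["D"] = codes_2d[:, 1]
```

Because the values are flattened row by row, the IDs follow the same order of first appearance as the Series-based approach.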

Note: This assumes that all the values are names to begin with; you may want to restrict this to some columns only:

df[['A', 'B']].values.ravel()
...
df[['A', 'B']].applymap(names.get)
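
To make the restricted version concrete, here is a minimal sketch on a frame that also has a non-name column. The "Age" column and its values are hypothetical, added only for illustration, and Series.map is used per column in place of applymap so the code does not depend on the pandas version.

```python
import numpy as np
import pandas as pd

# "Age" is a hypothetical non-name column, not part of the original question
df = pd.DataFrame({
    "A": ["John", "Homer", "Tom", "Lisa"],
    "B": ["Tom", "Bart", "Maggie", "John"],
    "Age": [40, 39, 12, 8],
})
name_cols = ["A", "B"]

# Build the name -> ID lookup from the name columns only
unique_names = pd.unique(df[name_cols].values.ravel())
names = pd.Series(np.arange(len(unique_names)), index=unique_names)

# Map each name column to a new ID column; "Age" is left untouched
for src, dst in zip(name_cols, ["C", "D"]):
    df[dst] = df[src].map(names)
```

Restricting the lookup to name_cols keeps the numeric values in "Age" from being treated as names and given IDs of their own.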
