Assign Unique Numeric Group IDs to Groups in Pandas
Question
I've consistently run into the problem of having to assign a unique ID to each group in a data set. I've needed this when zero-padding for RNNs, generating graphs, and on many other occasions.
This can usually be done by concatenating the values in each pd.groupby column. However, it is often the case that the number of columns that define a group, their dtype, or the value sizes make concatenation an impractical solution that needlessly uses up memory.
I was wondering if there was an easy way to assign a unique numeric ID to groups in pandas.
Recommended answer
You just need ngroup (or pd.factorize), shown here on the data from seeiespi:
df.groupby('C').ngroup()
Out[322]:
0    0
1    0
2    2
3    1
4    1
5    1
6    1
7    2
8    2
dtype: int64
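For a self-contained version of the call above, here is a minimal sketch. The DataFrame is hypothetical: the original data is not reproduced on this page, so the values in column C are chosen only so that they yield the same group IDs as the output above.

```python
import pandas as pd

# Hypothetical data: the values in C are an assumption, picked so that
# the resulting group IDs match the output shown above.
df = pd.DataFrame({"C": ["a", "a", "c", "b", "b", "b", "b", "c", "c"]})

# By default groupby sorts the group keys, so ngroup() numbers the groups
# in sorted key order: a -> 0, b -> 1, c -> 2.
group_ids = df.groupby("C").ngroup()
print(group_ids.tolist())  # [0, 0, 2, 1, 1, 1, 1, 2, 2]
```

Because the numbering follows the sorted keys, the third row (key "c") gets ID 2 even though it is the second distinct key to appear.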
More options
pd.factorize(df.C)[0]
Out[323]: array([0, 0, 1, 2, 2, 2, 2, 1, 1], dtype=int64)
df.C.astype('category').cat.codes
Out[324]:
0    0
1    0
2    2
3    1
4    1
5    1
6    1
7    2
8    2
dtype: int8
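The question asks about groups defined by several columns, where concatenating values wastes memory. As a rough sketch of how ngroup covers that case, the example below groups on a list of columns. The column names A and B and their values are hypothetical, chosen only for illustration. It also shows sort=False, which numbers groups by first appearance (the same ordering pd.factorize uses) instead of by sorted key.

```python
import pandas as pd

# Hypothetical data: two columns that together define a group.
df = pd.DataFrame({
    "A": ["x", "x", "y", "y", "x"],
    "B": [1, 1, 2, 1, 2],
})

# Group on both columns at once; no string concatenation is needed.
# IDs follow the sorted keys: (x,1) -> 0, (x,2) -> 1, (y,1) -> 2, (y,2) -> 3.
df["group_id"] = df.groupby(["A", "B"]).ngroup()

# With sort=False the IDs follow the order in which each group first appears:
# (x,1) -> 0, (y,2) -> 1, (y,1) -> 2, (x,2) -> 3.
df["group_id_first_seen"] = df.groupby(["A", "B"], sort=False).ngroup()

print(df)
```

Either numbering gives one integer per distinct (A, B) pair; which one is preferable depends on whether the IDs should be stable under row reordering (sorted keys) or should reflect the order of the data (first appearance).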