Pandas remap to range in column
Question
I have a DataFrame with a column of user ids, which can contain duplicates:
>>> df['user_id'].head()
Out[3]:
0 2134
1 1234
2 4323
3 25434
4 1234
Name: user_id, dtype: int64
How can I remap this so that the user ids start at an arbitrary number and increase by one, following the sort order of the original values? In this example the result would be the following, starting from 2:
>>> df['user_id'].head()
Out[3]:
0 3
1 2
2 4
3 5
4 2
Name: user_id, dtype: int64
Recommended answer
IIUC, you want to sort the df by the values in that column first, and then use factorize:
In [29]:
df1 = df.reindex(df['user_id'].sort_values().index)
df1
Out[29]:
user_id
index
1 1234
4 1234
0 2134
2 4323
3 25434
In [30]:
df1['new_id'] = pd.factorize(df1['user_id'])[0] + 2
df1
Out[30]:
user_id new_id
index
1 1234 2
4 1234 2
0 2134 3
2 4323 4
3 25434 5
You can then restore the original row order using sort_index:
In [31]:
df1 = df1.sort_index()
df1
Out[31]:
user_id new_id
index
0 2134 3
1 1234 2
2 4323 4
3 25434 5
4 1234 2
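The steps above can be collected into one self-contained script. This is a minimal sketch: the sample DataFrame is rebuilt from the five values shown in the question, and the variable name `start` is an assumption introduced here for the arbitrary starting number.

```python
import pandas as pd

# Sample data rebuilt from the question's df['user_id'].head() output.
df = pd.DataFrame({"user_id": [2134, 1234, 4323, 25434, 1234]})

start = 2  # hypothetical name for the arbitrary first id

# 1. Reorder the rows so that user_id is ascending.
df1 = df.reindex(df["user_id"].sort_values().index)

# 2. factorize assigns codes 0, 1, 2, ... in order of first appearance,
#    which on sorted data is also ascending order of user_id.
df1["new_id"] = pd.factorize(df1["user_id"])[0] + start

# 3. Put the rows back in their original order.
df1 = df1.sort_index()

print(df1)
```

Duplicate ids receive the same code because factorize works on distinct values, so both rows with 1234 end up with the same new id.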
You can then either overwrite the original column or drop one of the two; the above only demonstrates how to get the values you want.
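If the intermediate sorted frame is not needed, the same values can probably be obtained in one step. The sketch below is not the answer's method: it swaps in the `sort=True` option of `pd.factorize`, which sorts the unique values before assigning codes, and it overwrites the column directly. The sample data and the name `start` are assumptions for illustration.

```python
import pandas as pd

# Sample data rebuilt from the question's df['user_id'].head() output.
df = pd.DataFrame({"user_id": [2134, 1234, 4323, 25434, 1234]})

start = 2  # hypothetical name for the arbitrary first id

# sort=True makes the codes follow the sorted unique values,
# so no explicit sort / sort_index round trip is required.
codes, uniques = pd.factorize(df["user_id"], sort=True)

# Overwrite the original column with the remapped ids.
df["user_id"] = codes + start

print(df)
```

Here `uniques` holds the original ids in sorted order, so `uniques[new_id - start]` recovers the original value for a given remapped id.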