Pandas remap to range in column

Published: 2020/5/18 22:59:35 · Tags: python, pandas, numpy


Question

I have a DataFrame with a column of user ids, which can contain duplicates:

>>> df['user_id'].head()
Out[3]: 
0     2134
1     1234
2     4323
3    25434
4     1234
Name: user_id, dtype: int64

How can I remap this column so that the user ids start at an arbitrary number and count upward in steps of one, following the order of the original values? Duplicate ids should map to the same new id. In this example, starting from 2, the result would be:

>>> df['user_id'].head()
Out[3]: 
0    3
1    2
2    4
3    5
4    2
Name: user_id, dtype: int64

Recommended answer

IIUC, you want to sort the df by the values in that column first, and then use factorize, which assigns integer codes (starting at 0) in order of first appearance:

In [29]:
df1 = df.reindex(df['user_id'].sort_values().index)
df1

Out[29]:
       user_id
index         
1         1234
4         1234
0         2134
2         4323
3        25434

In [30]:    
df1['new_id'] = pd.factorize(df1['user_id'])[0] + 2
df1

Out[30]:
       user_id  new_id
index                 
1         1234       2
4         1234       2
0         2134       3
2         4323       4
3        25434       5

You can then restore the original row order using sort_index:

In [31]:
df1 = df1.sort_index()
df1

Out[31]:
       user_id  new_id
index                 
0         2134       3
1         1234       2
2         4323       4
3        25434       5
4         1234       2
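
The steps above can be collected into one self-contained script. This is a minimal sketch that rebuilds the sample data from the question; the variable name `start` is introduced here for illustration and is not part of the original answer.

```python
import pandas as pd

# Sample data taken from the question.
df = pd.DataFrame({"user_id": [2134, 1234, 4323, 25434, 1234]})

# Assumption: the arbitrary starting number, 2 in the question's example.
start = 2

# Reorder the rows so that user_id is ascending.
df1 = df.reindex(df["user_id"].sort_values().index)

# factorize numbers values by first appearance; on sorted data that is
# ascending order. Shift the 0-based codes by the starting number.
df1["new_id"] = pd.factorize(df1["user_id"])[0] + start

# Put the rows back in their original order.
df1 = df1.sort_index()
print(df1)
```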

You can then either overwrite the original column or drop one of the two columns; the above only demonstrates how to get the values you want.
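
As a possible shortcut (not part of the original answer), `pd.factorize` accepts a `sort=True` argument that orders the unique values before assigning codes, so the explicit sort and `sort_index` round trip may not be needed. The sketch below overwrites `user_id` in place; `start` is again a name chosen here for illustration.

```python
import pandas as pd

# Sample data taken from the question.
df = pd.DataFrame({"user_id": [2134, 1234, 4323, 25434, 1234]})

# Assumption: the arbitrary starting number, 2 in the question's example.
start = 2

# sort=True makes the codes follow the sorted order of the unique values,
# while the codes array stays aligned with the original row order.
codes, uniques = pd.factorize(df["user_id"], sort=True)

# Overwrite the column with the remapped ids.
df["user_id"] = codes + start
print(df)
```

Keeping `uniques` around lets you map a new id back to its original value, for example `uniques[new_id - start]`.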
