How do you replace duplicate values with multiple unique strings in Pandas?
Problem description
import pandas as pd
import numpy as np
data = {'Name':['Tom', 'Tom', 'Jack', 'Terry'], 'Age':[20, 21, 19, 18]}
df = pd.DataFrame(data)
Let's say I have a DataFrame that looks like this. I'm trying to figure out how to check the Name column for the value 'Tom' so that the first time I find it I replace it with 'FirstTom', and the second time it appears I replace it with 'SecondTom'. How do you accomplish this? I've used the replace method before, but only to replace every 'Tom' with a single value. I don't want to append a 1 to the end of the value; I want to change the string to something else entirely.
If the df looked more like the one below, how would we check for 'Tom' in both the first and the second column, and then replace the first instance with 'FirstTom' and the second instance with 'SecondTom'?
data = {'Name': ['Tom', 'Jerry', 'Jack', 'Terry'], 'OtherName': ['Tom', 'John', 'Bob', 'Steve']}
Recommended answer
Just adding to the existing solutions: you can use the third-party inflect package to create the ordinal labels dynamically.
import inflect
p = inflect.engine()
df['Name'] += df.groupby('Name').cumcount().add(1).map(p.ordinal).radd('_')
print(df)
        Name  Age
0    Tom_1st   20
1    Tom_2nd   21
2   Jack_1st   19
3  Terry_1st   18
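Note that the inflect approach appends an ordinal suffix to every name, whereas the question asks for 'FirstTom' and 'SecondTom' and leaves the other names alone. A minimal sketch of that variant follows. The helper name label_occurrences and the ORDINAL_WORDS lookup are made up for illustration, and it assumes occurrences are counted row by row, left to right across the chosen columns, which the question does not specify.

```python
import pandas as pd

# Hypothetical lookup: 0-based occurrence number -> prefix word.
# Extend it if the target can appear more than three times.
ORDINAL_WORDS = {0: "First", 1: "Second", 2: "Third"}


def label_occurrences(frame, columns, target, words=ORDINAL_WORDS):
    """Return a copy of `frame` where the n-th occurrence of `target`
    in `columns` is replaced by '<word><target>' (e.g. 'FirstTom')."""
    out = frame.copy()
    # Work on an object array so strings of any length fit.
    block = out[columns].to_numpy(dtype=object).copy()
    # Boolean mask of cells equal to the target value.
    mask = block == target
    # Masked assignment fills matches in row-major order
    # (row by row, left to right), so labels follow that order.
    block[mask] = [words[i] + target for i in range(int(mask.sum()))]
    out[columns] = block
    return out


# Single-column case from the question.
df = pd.DataFrame({'Name': ['Tom', 'Tom', 'Jack', 'Terry'],
                   'Age': [20, 21, 19, 18]})
single = label_occurrences(df, ['Name'], 'Tom')

# Two-column case from the question.
df2 = pd.DataFrame({'Name': ['Tom', 'Jerry', 'Jack', 'Terry'],
                    'OtherName': ['Tom', 'John', 'Bob', 'Steve']})
double = label_occurrences(df2, ['Name', 'OtherName'], 'Tom')
```

With the lookup as written, a fourth 'Tom' would raise a KeyError. That is deliberate in this sketch, so a missing label is noticed rather than silently skipped.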