Python - Renaming duplicated values based on another variable

Question

Is there any way to rename values based on another variable? Here I have two columns: one is ID and the other is Fruits. I was wondering whether it is possible to make the duplicated fruit names unique within each ID.

ID  Fruits
1    Apple
1   Banana
1   Orange
1   Banana
2    Apple
2   Orange
2   Orange
3    Apple
3    Apple
3   Orange

I'd like to achieve something like this:

ID  Fruits
1    Apple
1   Banana
1   Orange
1  Banana1
2    Apple
2   Orange
2  Orange1
3    Apple
3   Apple1
3   Orange
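The rule behind the desired output can be sketched in plain Python, without pandas: within each ID, the first occurrence of a fruit keeps its name and the k-th repeat (counting from 0) gets the suffix k. The helper name number_duplicates is hypothetical, used only to illustrate the rule:

```python
from collections import defaultdict


def number_duplicates(ids, fruits):
    """Append a running count to repeated (id, fruit) pairs.

    The first occurrence of each pair keeps its original name.
    """
    seen = defaultdict(int)  # (id, fruit) -> occurrences seen so far
    renamed = []
    for i, fruit in zip(ids, fruits):
        k = seen[(i, fruit)]
        renamed.append(fruit if k == 0 else f"{fruit}{k}")
        seen[(i, fruit)] += 1
    return renamed


ids = [1, 1, 1, 1, 2, 2, 2, 3, 3, 3]
fruits = ['Apple', 'Banana', 'Orange', 'Banana', 'Apple',
          'Orange', 'Orange', 'Apple', 'Apple', 'Orange']
print(number_duplicates(ids, fruits))
```

The pandas answers below express the same per-group counter with groupby(...).cumcount().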

Recommended answer

Setup

import pandas as pd

df = pd.DataFrame({
    'id': [1,1,1,1,2,2,2,3,3,3],
    'fruit': ['Apple', 'Banana', 'Orange', 'Banana', 'Apple', 'Orange', 'Orange', 'Apple', 'Apple', 'Orange']
})
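Both options rely on groupby(['id', 'fruit']).cumcount(), which numbers each row within its (id, fruit) group starting at 0. A quick look at that intermediate result on the sample data:

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 1, 1, 1, 2, 2, 2, 3, 3, 3],
    'fruit': ['Apple', 'Banana', 'Orange', 'Banana', 'Apple',
              'Orange', 'Orange', 'Apple', 'Apple', 'Orange'],
})

# 0 for the first occurrence of each (id, fruit) pair, 1 for the next, ...
counts = df.groupby(['id', 'fruit']).cumcount()
print(counts.tolist())  # [0, 0, 0, 1, 0, 0, 1, 0, 1, 0]
```

Every row whose count is 0 should keep its name unchanged; every other row gets its count appended.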

Option 1
cumcount with replace and string concatenation. The regex pattern ^0$ matches only a value that is exactly "0", so first occurrences lose their suffix while counts such as 10 stay intact; this lets the answer handle more than 9 duplicates per group:

# Running count within each (id, fruit) group; a lone '0' becomes ''
df['fruit'] = df.fruit + df.groupby(
    ['id', 'fruit']).cumcount().astype(str).replace(
    r'^0$', '', regex=True
)
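To check the claim about groups with more than 9 duplicates, here is a minimal sketch of Option 1 on hypothetical data (a single id with twelve identical fruits). Because ^ and $ anchor the pattern, only the value "0" is blanked, and the suffix "10" survives:

```python
import pandas as pd

# Hypothetical data: one id with 12 copies of the same fruit
df = pd.DataFrame({'id': [1] * 12, 'fruit': ['Apple'] * 12})

suffix = (
    df.groupby(['id', 'fruit']).cumcount()
      .astype(str)
      .replace(r'^0$', '', regex=True)  # blank only an exact '0'
)
df['fruit'] = df['fruit'] + suffix
print(df['fruit'].tolist())
```

The printed list runs from 'Apple' through 'Apple9', then 'Apple10' and 'Apple11'.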

Option 2
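Store the cumcount result and use boolean indexing with fillna (I personally prefer this approach). Only counts greater than zero are appended; after index alignment the remaining rows are NaN and are filled back with the original name: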

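# Running count within each (id, fruit) group: 0 for the first occurrence
s = df.groupby(['id', 'fruit']).cumcount()
# Append counts > 0; first occurrences become NaN and fillna restores them
df['fruit'] = (df.fruit + s[s>0].astype(str)).fillna(df.fruit)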
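As a hedged alternative that is not part of the original answer, the same result can be written with Series.where, which keeps the original value where the condition is True and uses the second argument elsewhere. This skips the intermediate NaN values. A minimal sketch on the question's sample data:

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 1, 1, 1, 2, 2, 2, 3, 3, 3],
    'fruit': ['Apple', 'Banana', 'Orange', 'Banana', 'Apple',
              'Orange', 'Orange', 'Apple', 'Apple', 'Orange'],
})

s = df.groupby(['id', 'fruit']).cumcount()
# Keep the name where the count is 0; otherwise append the count
df['fruit'] = df['fruit'].where(s == 0, df['fruit'] + s.astype(str))
print(df['fruit'].tolist())
```

Whether this reads better than the fillna version is a matter of taste; both produce the same column.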

Both produce:

   id    fruit
0   1    Apple
1   1   Banana
2   1   Orange
3   1  Banana1
4   2    Apple
5   2   Orange
6   2  Orange1
7   3    Apple
8   3   Apple1
9   3   Orange
