pandas: change a specific column value of duplicate rows
Problem description
Using the example from "Drop all duplicate rows in Python Pandas":
Let's say I don't want to drop the duplicates, but instead change the value in one of the subset columns.
So, as per the example, if we use subset=['A','C'] to identify duplicates, then I want to change row 1, column 'A', from foo to foo1.
I have a complicated way of doing this, but there must be a simpler way that takes advantage of vectorization or built-in features.
Original df:
     A  B  C
0  foo  0  A
1  foo  1  A
2  foo  1  B
3  bar  1  A
Desired df:
      A  B  C
0   foo  0  A
1  foo1  1  A
2   foo  1  B
3   bar  1  A
You could use cumcount and do something like:
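>>> # Number each row within its (A, C) group, starting at 0
>>> c = df.groupby(["A", "C"]).cumcount()
>>> # Blank out the zeros so that first occurrences get no suffix
>>> c = c.replace(0, '').astype(str)
>>> df["A"] += c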
>>> df
      A  B  C
0   foo  0  A
1  foo1  1  A
2   foo  1  B
3   bar  1  A
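For anyone who wants to run this outside the REPL, here is a minimal self-contained sketch of the same idea. The DataFrame is rebuilt from the table in the question. As a small variation on the snippet above, it converts the counts to strings before blanking the zeros, which avoids having a Series that briefly mixes integers and strings. The name `suffix` is just illustrative.

```python
import pandas as pd

# Rebuild the example frame from the question.
df = pd.DataFrame({
    "A": ["foo", "foo", "foo", "bar"],
    "B": [0, 1, 1, 1],
    "C": ["A", "A", "B", "A"],
})

# Position of each row within its (A, C) group: 0 for the first occurrence.
c = df.groupby(["A", "C"]).cumcount()

# Convert to strings, then replace the exact value "0" with an empty string
# so that first occurrences keep their original value.
suffix = c.astype(str).replace("0", "")

df["A"] = df["A"] + suffix
print(df)
```

Only row 1 changes, because it is the only row whose (A, C) pair has already appeared earlier in the frame.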
This works because cumcount numbers the rows within each group, starting from 0, which gives us:
>>> df.groupby(["A","C"]).cumcount()
0    0
1    1
2    0
3    0
dtype: int64
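Since the counter keeps increasing within a group, the same approach should extend to keys that repeat more than twice. The sketch below wraps it in a helper; `suffix_duplicates` and the sample frame `df3` are hypothetical names made up for illustration, not part of pandas or of the original question.

```python
import pandas as pd


def suffix_duplicates(df, subset, column):
    """Return a copy of df where repeated `subset` keys get a numeric suffix on `column`."""
    # 0 for the first row of each group, then 1, 2, ... for later repeats.
    counts = df.groupby(subset).cumcount()
    # Keep the number only where it is positive; first occurrences get "".
    suffix = counts.astype(str).where(counts > 0, "")
    out = df.copy()
    out[column] = out[column].astype(str) + suffix
    return out


# Hypothetical data: the (foo, A) pair appears three times.
df3 = pd.DataFrame({
    "A": ["foo", "foo", "bar", "foo"],
    "C": ["A", "A", "A", "A"],
})
result = suffix_duplicates(df3, ["A", "C"], "A")
print(result["A"].tolist())  # ['foo', 'foo1', 'bar', 'foo2']
```

One thing to watch for: a generated value such as foo1 could collide with a value that already exists in the column, so it may be worth checking for that if the renamed values need to be unique.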