Replace duplicate values across columns in Pandas
Problem description
I have a simple dataframe:
import pandas as pd

rows = [
    {'col1': 'A', 'col2': 'B', 'col3': 'C', 'col4': '0'},
    {'col1': 'M', 'col2': '0', 'col3': 'M', 'col4': '0'},
    {'col1': 'B', 'col2': 'B', 'col3': '0', 'col4': 'B'},
    {'col1': 'X', 'col2': '0', 'col3': 'Y', 'col4': '0'},
]
df = pd.DataFrame(rows)
df = df[['col1', 'col2', 'col3', 'col4']]  # fix the column order
df
It looks like this:
| col1 | col2 | col3 | col4 |
|------|------|------|------|
| A | B | C | 0 |
| M | 0 | M | 0 |
| B | B | 0 | B |
| X | 0 | Y | 0 |
Within each row, I want to replace repeated values with the character '0', keeping only the first occurrence of each value, like this:
| col1 | col2 | col3 | col4 |
|------|------|------|------|
| A | B | C | 0 |
| M | 0 | 0 | 0 |
| B | 0 | 0 | 0 |
| X | 0 | Y | 0 |
This seems so simple, but I'm stuck. Any nudges in the right direction would be really appreciated.
Recommended answer
You can use the duplicated method, which returns a boolean Series marking every element that repeats an earlier one (the first occurrence is not marked):
In [214]: pd.Series(['M', '0', 'M', '0']).duplicated()
Out[214]:
0 False
1 False
2 True
3 True
dtype: bool
You can then build a mask by applying this method to each row of your dataframe, and use where to perform the substitution:
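# True where a value already appeared earlier in the same row
is_duplicate = df.apply(pd.Series.duplicated, axis=1)
# keep first occurrences; use the string '0' to match the existing placeholders
df.where(~is_duplicate, '0')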
col1 col2 col3 col4
0 A B C 0
1 M 0 0 0
2 B 0 0 0
3 X 0 Y 0
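For reference, the steps above can be put together into one self-contained script. This is only a sketch of the approach in the answer: the name `result` and the `.astype(bool)` cast are additions for illustration (the cast is a defensive assumption so that `~` always negates a boolean mask), and the replacement is the string "0" rather than the integer 0 so that it matches the "0" placeholders already in the data.

```python
import pandas as pd

# Sample data from the question; "0" is a string placeholder.
df = pd.DataFrame(
    [
        {"col1": "A", "col2": "B", "col3": "C", "col4": "0"},
        {"col1": "M", "col2": "0", "col3": "M", "col4": "0"},
        {"col1": "B", "col2": "B", "col3": "0", "col4": "B"},
        {"col1": "X", "col2": "0", "col3": "Y", "col4": "0"},
    ],
    columns=["col1", "col2", "col3", "col4"],
)

# For each row (axis=1), mark values that already appeared earlier
# in that row. The cast to bool is a defensive addition.
is_duplicate = df.apply(pd.Series.duplicated, axis=1).astype(bool)

# Keep values where the mask is False; replace the rest with "0".
result = df.where(~is_duplicate, "0")
print(result)
```

Note that `where` returns a new dataframe and leaves `df` unchanged, so assign the result back to `df` if you want to overwrite the original.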