Pandas weird behavior using .replace() to swap values
Question
I stumbled upon a weird and inconsistent behavior of the pandas replace function when using it to swap two values of a column. When swapping integers in a column, we have:
import pandas as pd
df = pd.DataFrame({'A': [0, 1]})
df.A.replace({0: 1, 1: 0})
This produces:
0    1
1    0
Name: A, dtype: int64
However, when using the same command on string values:
df = pd.DataFrame({'B': ['a', 'b']})
df.B.replace({'a': 'b', 'b': 'a'})
we get:
0    a
1    a
Name: B, dtype: object
Can anyone explain this difference in behavior to me, or point me to a page in the docs that deals with inconsistencies between integers and strings in pandas?
Answer
Yup, this is definitely a bug, so I've opened a new issue - GH20656.
It looks like pandas applies the replacements successively. It makes the first replacement, turning "a" into "b", and then the second, turning both "b"s into "a".
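To make the difference concrete, here is a minimal pure-Python sketch (no pandas involved) of the two strategies. The helper names replace_successively and replace_simultaneously are hypothetical; they only illustrate the idea and are not how pandas is implemented internally.

```python
# Hypothetical helpers contrasting two ways of applying a mapping
# such as {'a': 'b', 'b': 'a'} to a list of values.

def replace_successively(values, mapping):
    """Apply each old -> new pair in turn over the whole list.

    Later pairs see the output of earlier ones, which matches the
    behaviour described above."""
    result = list(values)
    for old, new in mapping.items():
        result = [new if v == old else v for v in result]
    return result


def replace_simultaneously(values, mapping):
    """Look up every original value exactly once, so a swap works."""
    return [mapping.get(v, v) for v in values]


m = {'a': 'b', 'b': 'a'}
print(replace_successively(['a', 'b'], m))    # ['a', 'a']
print(replace_simultaneously(['a', 'b'], m))  # ['b', 'a']
```

The successive version reproduces the ['a', 'a'] result from the question, while the simultaneous version gives the expected swap. The integer output in the question (1, 0) matches the simultaneous behavior.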
In summary, what you see is equivalent to:
df.B.replace('a', 'b').replace('b', 'a')
0 a
1 a
Name: B, dtype: object
This is definitely not what should happen.
There is a workaround: use str.replace with a lambda callback.
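import re
m = {'a': 'b', 'b': 'a'}
# Escape the keys so they match literally; a callable replacement needs regex=True
df.B.str.replace('|'.join(map(re.escape, m)), lambda x: m[x.group()], regex=True)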
0 b
1 a
Name: B, dtype: object
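Another option, not part of the original answer, avoids regular expressions altogether: Series.map with a callable looks up each original value exactly once, so one replacement cannot feed into the next. This is a sketch assuming that values absent from the mapping should be left unchanged; the extra 'c' row is added only to show that case.

```python
import pandas as pd

df = pd.DataFrame({'B': ['a', 'b', 'c']})
m = {'a': 'b', 'b': 'a'}

# Each original value is looked up once; values not in m are kept as-is
swapped = df.B.map(lambda v: m.get(v, v))
print(swapped.tolist())  # ['b', 'a', 'c']
```

Unlike the str.replace workaround, this compares whole cell values rather than substrings, so 'ab' would not be rewritten.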