Python Pandas Replace Special Character
Question
For some reason, I cannot get this simple statement to work on the ñ. It seems to work on anything else, but it doesn't like that character. Any ideas?
DF['NAME']=DF['NAME'].str.replace("ñ","n")
Thanks
Recommended answer
I'm assuming you're using Python 2.x here, in which case this is most likely a Unicode problem. Don't worry, you're not alone: Unicode is tough in general and especially in Python 2, which is why Unicode strings were made the default in Python 3.
If all you're concerned about is the ñ, you should decode the string as UTF-8 and then replace that one character.
That would look something like the following:
DF['NAME'] = DF['NAME'].str.decode('utf-8').str.replace(u'\xf1', 'n')
For example, with a plain Python 2 byte string:
>>> "sureño".decode("utf-8").replace(u"\xf1", "n")
u'sureno'
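As a rough illustration of the same decode-then-replace idea on a pandas column, here is a minimal Python 3 sketch. It assumes the column holds UTF-8 encoded bytes (the closest Python 3 analogue of a Python 2 `str`); the sample values and the names `raw`, `cleaned`, and `result` are made up for the example.

```python
import pandas as pd

# Hypothetical column of UTF-8 encoded bytes (assumption: the data was never decoded)
raw = pd.Series(["sure\u00f1o".encode("utf-8"), b"plain"])

# Decode every element to text first, then do a literal substring replacement
cleaned = raw.str.decode("utf-8").str.replace("\u00f1", "n", regex=False)

result = cleaned.tolist()
print(result)
```

Note that the second step uses `.str.replace`, which replaces substrings inside each value; a bare `Series.replace` would only match whole cell values.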
If your string is already Unicode, then you can (and actually have to) skip the decode step:
>>> u"sureño".replace(u"\xf1", "n")
u'sureno'
Note here that u'\xf1' uses the hex escape for the character in question.
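The two cases above (bytes that need decoding versus text that is already Unicode) can be sketched in Python 3 without pandas. This is only an illustration: the variable names are hypothetical, and the ñ is written with an escape so the example does not depend on the file's encoding.

```python
# Python 3: bytes play the role of a Python 2 str, str plays the role of unicode
raw_bytes = "sure\u00f1o".encode("utf-8")   # b'sure\xc3\xb1o'

# Bytes must be decoded to text before a text replacement can match
from_bytes = raw_bytes.decode("utf-8").replace("\xf1", "n")

# Text that is already Unicode needs no decode step (str has no .decode in Python 3)
from_text = "sure\u00f1o".replace("\u00f1", "n")

print(from_bytes, from_text)
```

Here "\xf1" and "\u00f1" are two spellings of the same code point, U+00F1.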
I was informed in the comments that <>.str.replace is a pandas Series method, which I hadn't realized. The answer might then be something like the following:
DF['NAME'] = DF['NAME'].map(lambda x: x.decode('utf-8').replace(u'\xf1', 'n'))
or something along those lines, applying the function to each element of the Series.
It actually just occurred to me that your issue may be as simple as the following:
DF['NAME']=DF['NAME'].str.replace(u"ñ","n")
Note how I've added the u in front of the string to make it Unicode.
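For comparison, here is a minimal sketch of the same replacement under Python 3, where every string literal is already Unicode and no u prefix is needed. The sample data is invented for illustration, and `regex=False` is passed explicitly because the default for that parameter has differed between pandas versions.

```python
import pandas as pd

# Hypothetical sample data; the column name NAME follows the question
DF = pd.DataFrame({"NAME": ["sure\u00f1o", "pi\u00f1a", "plain"]})

# Literal (non-regex) substring replacement on every value in the column
DF["NAME"] = DF["NAME"].str.replace("\u00f1", "n", regex=False)

names = DF["NAME"].tolist()
print(names)
```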