pandas: DataFrame.replace() with regex
Problem description
I have a table that looks like this:
df_raw = pd.DataFrame(dict(A = pd.Series(['1.00','-1']), B = pd.Series(['1.0','-45.00','-'])))
A B
0 1.00 1.0
1 -1 -45.00
2 NaN -
I would like to replace '-' with '0.00' using DataFrame.replace(), but the replacement also hits the minus sign in the negative values '-1' and '-45.00'.
How can I leave the negative values alone and replace only a standalone '-' with '0.00'?
My code:
df_raw = df_raw.replace(['-','\*'], ['0.00','0.00'], regex=True).astype(np.float64)
Error:
ValueError: invalid literal for float(): 0.0045.00
Recommended answer
Your regex matches every - character, including the minus sign of the negative numbers:
In [48]:
df_raw.replace(['-','\*'], ['0.00','0.00'], regex=True)
Out[48]:
A B
0 1.00 1.0
1 0.001 0.0045.00
2 NaN 0.00
If you anchor the pattern so that it matches only when that single character is the entire string, it works as expected:
In [47]:
df_raw.replace(['^-$'], ['0.00'], regex=True)
Out[47]:
A B
0 1.00 1.0
1 -1 -45.00
2 NaN 0.00
Here ^ means start of string and $ means end of string, so the pattern matches only a cell whose whole value is that single character.
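The effect of the anchors can be checked outside pandas with Python's standard re module. This is only a minimal sketch: the values list is made up for illustration, and it assumes the pattern behaves the same way when pandas applies it cell by cell.

```python
import re

# Illustrative cell values, similar to those in the question.
values = ['-', '-1', '-45.00', '1.0']

# Unanchored: every '-' is substituted, even inside negative numbers.
unanchored = [re.sub('-', '0.00', v) for v in values]

# Anchored: only a string that consists of exactly '-' is substituted.
anchored = [re.sub(r'^-$', '0.00', v) for v in values]

print(unanchored)  # ['0.00', '0.001', '0.0045.00', '1.0']
print(anchored)    # ['0.00', '-1', '-45.00', '1.0']
```

The unanchored result reproduces the '0.0045.00' string from the ValueError above.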
Or you can just use replace without regex=True, which matches only when the whole cell value is exactly equal to the search value:
In [29]:
df_raw.replace('-',0)
Out[29]:
A B
0 1.00 1.0
1 -1 -45.00
2 NaN 0
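Either approach can then be followed by the float conversion the question was after. The sketch below is one possible way to put it together, not the only one; the names df_exact and df_regex are introduced here for illustration.

```python
import numpy as np
import pandas as pd

df_raw = pd.DataFrame(dict(A=pd.Series(['1.00', '-1']),
                           B=pd.Series(['1.0', '-45.00', '-'])))

# Exact-match replace: only cells equal to '-' are changed.
df_exact = df_raw.replace('-', '0.00').astype(np.float64)

# Anchored regex replace: same result for this data.
df_regex = df_raw.replace(r'^-$', '0.00', regex=True).astype(np.float64)

print(df_exact)
```

The missing value in column A stays NaN after the conversion, and the negative numbers keep their sign.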