pandas: DataFrame.replace() with regex

Question

I have a table that looks like this:

df_raw = pd.DataFrame(dict(A = pd.Series(['1.00','-1']), B = pd.Series(['1.0','-45.00','-'])))

    A       B
0   1.00    1.0
1   -1      -45.00
2   NaN     -

I would like to replace '-' with '0.00' using DataFrame.replace(), but it fails because of the negative values '-1' and '-45.00'.

How can I ignore the negative values and replace only a lone '-' with '0.00'?

My code:

df_raw = df_raw.replace(['-','\*'], ['0.00','0.00'], regex=True).astype(np.float64)

Error:

ValueError: invalid literal for float(): 0.0045.00

Recommended answer

Your regex matches every - character, including the minus sign in the negative numbers:

In [48]:
df_raw.replace(['-','\*'], ['0.00','0.00'], regex=True)

Out[48]:
       A          B
0   1.00        1.0
1  0.001  0.0045.00
2    NaN       0.00
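To see why the conversion to float then fails, the same substitution can be reproduced on plain strings. This is a minimal sketch using the standard `re` module rather than pandas; the list `values` is just the contents of column B from the question.

```python
import re

# Contents of column B from the question.
values = ['1.0', '-45.00', '-']

# An unanchored '-' pattern replaces every hyphen it finds,
# including the minus sign in front of a negative number.
unanchored = [re.sub('-', '0.00', v) for v in values]
print(unanchored)

# The mangled string is no longer a valid float literal.
try:
    float(unanchored[1])
    conversion_failed = False
except ValueError:
    conversion_failed = True
print(conversion_failed)
```

The second element becomes '0.0045.00', which is the value named in the ValueError above.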

If you anchor the pattern so that it matches only when the whole string is that single character, it works as expected:

In [47]:
df_raw.replace(['^-$'], ['0.00'], regex=True)

Out[47]:
      A       B
0  1.00     1.0
1    -1  -45.00
2   NaN    0.00

Here ^ means start of string and $ means end of string, so the pattern matches only cells that consist of that single character.
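Putting the anchored pattern together with the float conversion from the question, a complete version might look like the sketch below. The variable name `cleaned` is introduced here for illustration, and the pattern is written as a raw string, which is a stylistic choice rather than something the answer requires.

```python
import numpy as np
import pandas as pd

# Same frame as in the question; column A is padded with NaN.
df_raw = pd.DataFrame(dict(A=pd.Series(['1.00', '-1']),
                           B=pd.Series(['1.0', '-45.00', '-'])))

# '^-$' matches only cells that are exactly '-', so the
# negative numbers keep their minus sign.
cleaned = df_raw.replace(r'^-$', '0.00', regex=True).astype(np.float64)
print(cleaned)
```

With the lone '-' replaced, every remaining string is a valid number, so astype(np.float64) no longer raises.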

Or you can call replace without regex=True, in which case it replaces only cells whose entire value equals the given value:

In [29]:
df_raw.replace('-', 0)

Out[29]:
      A       B
0  1.00     1.0
1    -1  -45.00
2   NaN       0
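The exact-match form can also feed the float conversion from the question. In this sketch the replacement is the string '0.00' instead of the integer 0 shown above, so that the column holds only strings until the cast. That substitution is an assumption made for illustration, and `cleaned_exact` is a name introduced here.

```python
import numpy as np
import pandas as pd

df_raw = pd.DataFrame(dict(A=pd.Series(['1.00', '-1']),
                           B=pd.Series(['1.0', '-45.00', '-'])))

# Without regex=True, replace compares whole cell values,
# so '-1' and '-45.00' are not touched.
cleaned_exact = df_raw.replace('-', '0.00').astype(np.float64)
print(cleaned_exact)
```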
