Pandas inconsistency with regex "." dot metacharacter?

Views: 71 | Published: 2020/5/24 2:49:24 | Tags: python, regex, pandas


Question

Consider:

df

              Cost
Store 1       22.5
Store 1  .........
Store 2        ...

To convert these dots to NaN, I can use:

df.replace(r'^\.+$', np.nan, regex=True)

         Cost
Store 1  22.5
Store 1   NaN
Store 2   NaN

What I don't understand is why the following pattern also works:

df.replace('^.+$', np.nan, regex=True)

         Cost
Store 1  22.5
Store 1   NaN
Store 2   NaN

Note that in this case I haven't escaped the dot (.), so it should be treated as the match-any character, and every single row should be converted to NaN. But that is not what happens: only the rows made up of dots are matched, even though I used the match-any character.

Contrast this with:

import re
re.sub('^.+$', '', '22.5') 
''

Which returns an empty string.

So what is going on?

Recommended answer

Halfway through writing this question, I realised what the problem was:

df.Cost.dtype
dtype('O')

df.Cost.values
array([22.5, '.........', '...'], dtype=object)

So the 22.5 is actually a numeric value stored in an object column, and the regex replacement simply skips over non-string values. Doing an astype conversion makes this obvious:

df.astype(str).replace('.+', np.nan, regex=True)

         Cost
Store 1   NaN
Store 1   NaN
Store 2   NaN
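
The behaviour described above can be reproduced end to end. The following is a minimal sketch, assuming the frame is rebuilt by hand from the values shown in the question; the variable names (types, mixed, as_text) are illustrative and not from the original post.

```python
import numpy as np
import pandas as pd

# Rebuild the frame from the question: one float and two strings,
# which gives the Cost column an object dtype.
df = pd.DataFrame(
    {"Cost": [22.5, ".........", "..."]},
    index=["Store 1", "Store 1", "Store 2"],
)

# Per-element Python types: the first cell is a float, not a str.
types = df["Cost"].map(lambda v: type(v).__name__).tolist()

# Unescaped dot on the mixed column: only the string cells are replaced,
# because the regex is never applied to the float.
mixed = df.replace(r"^.+$", np.nan, regex=True)

# After casting every cell to str, the same pattern matches all rows.
as_text = df.astype(str).replace(r"^.+$", np.nan, regex=True)

print(types)
print(mixed)
print(as_text)
```

Checking the per-element types, rather than only the column dtype, is what exposes the issue here: an object column can hold a mix of strings and numbers.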

Problem solved. Leaving this up in case anyone else is confused by this.
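
If the goal is simply a numeric Cost column with NaN where the dots were, one alternative that avoids regex is pd.to_numeric with errors="coerce". This is not the approach used in the answer, only a sketch under the assumption that every non-numeric cell should become NaN; the frame is again rebuilt by hand from the question.

```python
import pandas as pd

df = pd.DataFrame(
    {"Cost": [22.5, ".........", "..."]},
    index=["Store 1", "Store 1", "Store 2"],
)

# Cells that cannot be parsed as numbers become NaN; real numbers pass through.
cost = pd.to_numeric(df["Cost"], errors="coerce")

print(cost)
```

Note that this coerces any unparseable string, not just runs of dots, so it is only appropriate when that broader behaviour is acceptable.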
