Python re.sub: non-greedy mode (.*?) with end of string ($) becomes greedy!

Views: 119 Published: 2021/7/6 19:44:51 Tags: python regex regex-greedy


Question

Code:

import re
s = '<br><br />A<br />B'  # renamed from str to avoid shadowing the built-in
print(re.sub(r'<br.*?>\w$', '', s))

I expected this to return <br><br />A, but it returns an empty string ''!

Any suggestions?

Recommended answer

Laziness works from left to right, but not the other way around: a lazy quantifier controls where a match ends, not where it starts. It basically means "don't consume another character unless the rest of the pattern failed to match without it". Here's what's going on:

The regex engine matches <br at the start of the string.
.*? is skipped for now, because it is lazy.
The engine tries to match >, and succeeds.
It tries to match \w and fails, because the next character is <. Now it gets interesting: the engine backtracks and sees the .*? rule. In this case, . can match the first >, so there is still hope for a match starting here.
This keeps happening until .*? has consumed everything up to and including the slash of the second tag. Then >\w matches >A, but $ fails. Again, the engine goes back to the lazy .*? rule and keeps extending it, until the pattern matches the whole string <br><br />A<br />B
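The walkthrough above can be checked with a minimal sketch that uses re.search on the string from the question and inspects where the match starts and ends. The variable names (s, m, no_anchor) are only illustrative.

```python
import re

s = '<br><br />A<br />B'

# The lazy pattern from the question.
m = re.search(r'<br.*?>\w$', s)

# The match starts at the first '<br' (index 0). Laziness cannot move
# that starting point, so .*? has to stretch to the end to satisfy '$'.
print(m.span())        # (0, 18)
print(m.group() == s)  # True

# Even without '$', the match still starts at index 0; it only ends earlier.
no_anchor = re.search(r'<br.*?>\w', s)
print(no_anchor.group())  # <br><br />A
```

Because the whole string is one match, re.sub replaces all of it, which is why the result is an empty string.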

Luckily, there's an easy solution: by using the pattern <br[^>]*>\w$ instead, you don't allow the match to extend outside of the tag, so it should replace only the last occurrence.

Strictly speaking, this doesn't work well for HTML in general, because tag attributes can contain > characters, but I assume it's just an example.
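As a quick check of the suggested pattern, here is a short sketch run against the string from the question. The names s, fixed, and last are illustrative only.

```python
import re

s = '<br><br />A<br />B'

# [^>]* cannot cross a '>', so the match is confined to a single tag.
fixed = re.sub(r'<br[^>]*>\w$', '', s)
print(fixed)  # <br><br />A

# The match now covers only the last tag and the character after it.
last = re.search(r'<br[^>]*>\w$', s)
print(last.span(), last.group())  # (11, 18) <br />B
```

With the negated character class, the attempts starting at the first and second tags fail outright instead of expanding, so the engine moves on to the last <br />.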
