Why does Python's re.search method hang?

Views: 48  Published: 2021/7/6 20:29:43  python regex


Problem description


I'm using Python's regex library to parse some strings, and I've found that either my regex is too complicated or the string I'm searching is too long.



Here's an example of the hang:

>>> import re
>>> reg = r"(\w+'?\s*)+[-|~]\s*((\d+\.?\d+\$?)|(\$?\d+\.?\d+))"
>>> re.search(reg, "**LOOKING FOR PAYPAL OFFERS ON THESE PAINTED UNCOMMONS**")  # hangs here...

I'm not sure what's going on. Any help is appreciated!

EDIT: Here's a link with examples of what I'm trying to match: Regxr
Solution
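The code hangs because of catastrophic backtracking. The quantified group (\w+'?\s*)+ contains one obligatory pattern (\w+) and two optional patterns ('? and \s*, both of which can match an empty string). When neither optional pattern consumes anything, a run of word characters can be divided among the iterations of the group in a great many ways, and whenever the rest of the pattern fails the regex engine tries every one of them. There are so many such paths that the search takes too long to complete.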
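The growth can be observed safely on short inputs. The following is a minimal sketch, not part of the original answer: the helper name time_failed_search and the test strings (a run of "A" characters followed by "*") are assumptions chosen so that the match always fails, and the lengths are deliberately kept small so the script finishes quickly. Timings depend on the machine, so no expected figures are given.

```python
import re
import time

# The pattern from the question, written as a raw string.
SLOW = re.compile(r"(\w+'?\s*)+[-|~]\s*((\d+\.?\d+\$?)|(\$?\d+\.?\d+))")


def time_failed_search(n):
    """Search n word characters followed by '*', which the pattern cannot match.

    Returns the search result (expected to be None) and the elapsed seconds.
    """
    text = "A" * n + "*"  # hypothetical input: no '-', '|' or '~' anywhere
    start = time.perf_counter()
    result = SLOW.search(text)
    return result, time.perf_counter() - start


# Keep n small: the number of paths the engine tries roughly doubles
# with every extra word character.
for n in (10, 12, 14, 16):
    result, elapsed = time_failed_search(n)
    print(n, result, f"{elapsed:.4f}s")
```

If the elapsed time climbs steeply as n increases, that is the backtracking described above; the string in the question is far longer, which is why the search appears to hang.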

I suggest unrolling the problematic group so that ' or \s becomes obligatory between words, and wrapping that separator together with the following word in an optional, repeatable group:
(\w+(?:['\s]+\w+)*)\s*[-~]\s*(\$?\d+(?:\.\d+)?\$?)
^^^^^^^^^^^^^^^^^^^***
See the regex demo

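Here, (\w+(?:['\s]+\w+)*) matches 1+ word chars, and then 0+ sequences of 1+ ' or whitespace chars followed by 1+ word chars. Because every repetition must now start with at least one ' or whitespace char, a given stretch of text can be split between repetitions in only one way. The pattern becomes linear, and the regex engine fails the match much faster when the string does not match.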
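As a quick check, the suggested pattern can be run against the string from the question. This is a minimal sketch: the name FIXED and the second, longer test string are assumptions added for illustration.

```python
import re

# The pattern suggested in the answer, written as a raw string.
FIXED = re.compile(r"(\w+(?:['\s]+\w+)*)\s*[-~]\s*(\$?\d+(?:\.\d+)?\$?)")

# The string from the question that made the original pattern hang.
question_text = "**LOOKING FOR PAYPAL OFFERS ON THESE PAINTED UNCOMMONS**"

# Hypothetical longer input, also without '-' or '~', so it cannot match.
longer_text = "LOOKING FOR OFFERS " * 20 + "**"

print(FIXED.search(question_text))  # None
print(FIXED.search(longer_text))    # None
```

Both searches return None because neither string contains - or ~, and they return promptly instead of hanging.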

The rest of the pattern:

\s*[-~]\s* - either - or ~ wrapped with 0+ whitespaces
(\$?\d+(?:\.\d+)?\$?) - Group 2, capturing:
    \$? - 1 or 0 $ symbols
    \d+ - 1+ digits
    (?:\.\d+)? - 1 or 0 sequences of:
        \. - a dot
        \d+ - 1+ digits
    \$? - 1 or 0 $ symbols
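The breakdown above can be kept next to the pattern itself by compiling it with re.VERBOSE, which ignores whitespace outside character classes and allows # comments. This is a sketch of one way to write it; the name PRICE and the two sample listings are hypothetical and do not come from the question.

```python
import re

# Same pattern as in the answer, spread over several lines with comments.
PRICE = re.compile(
    r"""
    (\w+(?:['\s]+\w+)*)   # Group 1: words separated by 1+ ' or whitespace chars
    \s*[-~]\s*            # a - or ~ wrapped with 0+ whitespaces
    (                     # Group 2:
        \$?               #   1 or 0 $ symbols
        \d+               #   1+ digits
        (?:\.\d+)?        #   1 or 0 sequences of a dot and 1+ digits
        \$?               #   1 or 0 $ symbols
    )
    """,
    re.VERBOSE,
)

# Hypothetical listings used only to illustrate the two capture groups.
for text in ("Dragon's Claw - $12.50", "Painted Helm ~ 3$"):
    m = PRICE.search(text)
    print(m.group(1), "|", m.group(2))
```

For the first listing, group 1 is "Dragon's Claw" and group 2 is "$12.50"; for the second, they are "Painted Helm" and "3$".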


