正则表达式的计算复杂度 [英] Computational complexity for regular expressions
问题描述
正则表达式很快变得太复杂(对我而言)以至于无法理解.甚至像 [ab] [cd]
这样简单的东西,也有几个逻辑分支.我的目标是改善代码库的可维护性,因此对这些问题的答案可以帮助我们检测和修复复杂的代码:
Regular expressions rapidly become too complex (for me) to understand. Even something so simple as [ab][cd]
, has several logical branches. My goal is to improve the maintainability of our code base, so answers to these questions could help us detect and fix complex code:
- 是否存在计算复杂性指标(类似于圈复杂度),其中包括正则表达式固有的复杂性?
- 有没有工具产生正则表达式的复杂度数?
- 是否有工具可以建议对正则表达式进行简化?
推荐答案
您可以尝试使用正则表达式的编译形式,并尝试将一些代码复杂性指标映射到该代码,例如代码行或循环复杂性.要了解我的意思,请查看以下stackoverflow答案: https://stackoverflow.com/a/2348725/5747415 ,它显示了如何使用perl可以访问正则表达式的编译形式.此处显示了另一个示例: http://perldoc.perl.org/perldebguts.html#Debugging-Regular-Expressions ,引用该页面上的工具输出:
You might try using the compiled form of the regexp and try mapping some code complexity metrics to that, like, lines of code, or cyclomatic complexity. To see what I mean, look at the following stackoverflow answer: https://stackoverflow.com/a/2348725/5747415, which shows how with perl you can access the compiled form of a regexp. Another example is shown here: http://perldoc.perl.org/perldebguts.html#Debugging-Regular-Expressions, quoting the tool output from that page:
Compiling REx '[bc]d(ef*g)+h[ij]k$'
1: ANYOF[bc](12)
12: EXACT <d>(14)
14: CURLYX[0] {1,32767}(28)
16: OPEN1(18)
18: EXACT <e>(20)
20: STAR(23)
21: EXACT <f>(0)
23: EXACT <g>(25)
25: CLOSE1(27)
27: WHILEM[1/1](0)
28: NOTHING(29)
29: EXACT <h>(31)
31: ANYOF[ij](42)
42: EXACT <k>(44)
44: EOL(45)
45: END(0)
顺便说一句,祝贺您提高代码可维护性的决定.就是说,我只需要表达怀疑,即任何正式的指标都提供了比(甚至可以接近)经验丰富的开发人员的判断更好的指导.
Btw., I congratulate you to the decision to improve the code maintainability. That said, I just have to express my doubts that any formal metric provides a better guidance than (or can even get close to) the judgement of experienced developers...
这篇关于正则表达式的计算复杂度的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!