最小化正则表达式 [英] Minimization of the regex
问题描述
我对编程世界还很陌生.我正在尝试创建一个通用的正则表达式,它只匹配给定的字符串列表,仅此而已.
I am fairly new to Programming world. I am trying to create a common regex that would match only list of strings given, nothing more than that.
例如,给定以下列表
List = ['starguide,'snoreguide','snoraguide','smarguides']
它应该创建一个这样的正则表达式 - s(((tar|nor(e|a))(guide))|marguides)
It should create a regex like this - s(((tar|nor(e|a))(guide))|marguides)
我实现了一个尝试.只能设法获得 s(marguides|nor(aguide|eguide)|targuide)
I implemented a trie. Could only manage to get s(marguides|nor(aguide|eguide)|targuide)
我希望我的正则表达式被缩短(常见的后缀绑定在一起).有没有更好的方法来缩短我从特里得到的正则表达式?
I want my regex to be shortened (common suffixes tied together). Is there any better way to shorten the regex I am getting from the trie?
推荐答案
要获得所需的结果,请尝试使用自动机最小化.
To get the desired result try use automata minimization.
对于您的简单示例,确定性自动机就足够了.
For your simple example, deterministic automaton suffices.
使用 github.com/siddharthasahu/automata-from-regex 从平凡的正则表达式(单词的枚举)构建最小确定性状态机/自动机,然后将自动机转换为正则表达式(无环自动机很容易,http://www-igm.univ-mlv.fr/~dr/thdr/www.dcc.fc.up.pt/~nam/publica/extAbsCIAA05.pdf) 另见 https://cs.stackexchange.com/questions/2016/how-to-convert-finite-automata-to-regular-expressions
Use github.com/siddharthasahu/automata-from-regex to build min deterministic state machine/automaton from trivial regex (enumeration of words), then transform automaton into regex (it is easy for acyclic automata, http://www-igm.univ-mlv.fr/~dr/thdr/ www.dcc.fc.up.pt/~nam/publica/extAbsCIAA05.pdf) see also https://cs.stackexchange.com/questions/2016/how-to-convert-finite-automata-to-regular-expressions
在一般情况下,非确定性自动机可以产生更短的正则表达式,但这是一个难题https://cstheory.stackexchange.com/questions/31630/how-can-one-actually-minimize-a-regular-expression
In general case, non-determinist automata could yield shorter regex, yet it is a hard problem https://cstheory.stackexchange.com/questions/31630/how-can-one-actually-minimize-a-regular-expression
这篇关于最小化正则表达式的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!