Is there any regex optimizer written in Java?

Question

I wrote a Java program that generates a sequence of symbols, such as "abcdbcdefbcdbcdefg". What I need is a regex optimizer that can turn such a sequence into "a((bcd){2}ef){2}g".

Because the input may contain Unicode escapes, such as "a\u0063\u0063\bbd", I would prefer a Java implementation.

The reason I want a "shorter" expression is to save space/memory. The sequence of symbols here could be very long.

In general, finding the "shortest" optimized regex is hard. So I don't need a solution that guarantees the shortest possible result.

Recommended answer

I've got a nasty feeling that the problem of creating the shortest regex that matches a given input string or set of strings is going to be computationally "difficult". (There are parallels with the problem of computing Kolmogorov complexity ...)
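Since the asker does not need the shortest result, a greedy heuristic may be good enough. The following is a minimal sketch, not an existing library: the class name `RepeatFolder` and its methods are hypothetical. At each position it picks the consecutive repetition that covers the most characters, folds it into `(unit){k}`, and recurses into the unit. It assumes the input contains no surrogate pairs, and it makes no claim of minimality.

```java
class RepeatFolder {
    /** Greedily folds consecutive repeats of a substring into (unit){k}. */
    static String fold(String s) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < s.length()) {
            int bestLen = 0;
            int bestCount = 0;
            // Try every unit length that could repeat at least twice from position i.
            for (int len = 1; len <= (s.length() - i) / 2; len++) {
                int count = 1;
                // regionMatches returns false once the compared region runs past the end.
                while (s.regionMatches(i, s, i + count * len, len)) {
                    count++;
                }
                // Prefer the repetition covering the most characters;
                // on a tie the shorter unit (found first) is kept.
                if (count >= 2 && len * count > bestLen * bestCount) {
                    bestLen = len;
                    bestCount = count;
                }
            }
            if (bestCount == 0) {
                out.append(literal(s.charAt(i)));
                i++;
            } else {
                String unit = s.substring(i, i + bestLen);
                if (bestLen == 1) {
                    out.append(literal(unit.charAt(0)));
                } else {
                    out.append('(').append(fold(unit)).append(')');
                }
                out.append('{').append(bestCount).append('}');
                i += bestLen * bestCount;
            }
        }
        return out.toString();
    }

    /** Escapes anything that is not a letter or digit so it is matched literally. */
    static String literal(char c) {
        return Character.isLetterOrDigit(c) ? String.valueOf(c) : "\\" + c;
    }

    public static void main(String[] args) {
        System.out.println(fold("abcdbcdefbcdbcdefg")); // prints a((bcd){2}ef){2}g
    }
}
```

The search is roughly cubic in the input length in the worst case, so for very long sequences a suffix-based or grammar-based approach would likely be needed instead.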

It is also worth noting that the optimal regex for abcdbcdefbcdbcdefg in terms of matching speed is likely to be abcdbcdefbcdbcdefg itself. Adding repeating groups may make the regex string shorter, but it won't make matching faster. In fact, matching is likely to be slower unless the regex engine unrolls the repeating groups.
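One way to check this on a particular JVM is to compile both forms and time them. The sketch below uses a hypothetical class `RegexFormsDemo`; the loop is a crude measurement, not a proper benchmark (no warm-up control, no statistics), so treat any numbers it prints as indicative at best.

```java
import java.util.regex.Pattern;

class RegexFormsDemo {
    static final String INPUT = "abcdbcdefbcdbcdefg";
    // The literal form and the folded form accept exactly the same string.
    static final Pattern LITERAL = Pattern.compile("abcdbcdefbcdbcdefg");
    static final Pattern FOLDED = Pattern.compile("a((bcd){2}ef){2}g");

    /** Returns the elapsed nanoseconds for matching the input repeatedly. */
    static long timeMatches(Pattern pattern, String input, int iterations) {
        int hits = 0;
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            if (pattern.matcher(input).matches()) {
                hits++;
            }
        }
        long elapsed = System.nanoTime() - start;
        if (hits != iterations) {
            throw new IllegalStateException("pattern did not match the input");
        }
        return elapsed;
    }

    public static void main(String[] args) {
        int iterations = 1_000_000;
        System.out.println("literal: " + timeMatches(LITERAL, INPUT, iterations) + " ns");
        System.out.println("folded:  " + timeMatches(FOLDED, INPUT, iterations) + " ns");
    }
}
```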

"The reason that I need this is due to the space/memory limits."

Do you have clear evidence that you need to do this?

I suspect that you won't save a worthwhile amount of space by doing this ... unless the input strings are really long. (And if they are long, then you'll get better results by compressing the strings with a regular text compression algorithm.)
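As an illustration of the compression route, here is a minimal sketch using the JDK's `java.util.zip.Deflater` and `Inflater`. The class name `SequenceCompressor` is hypothetical, and UTF-8 is an assumed encoding; how much space is actually saved depends on the data, so it is worth measuring on real sequences.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

class SequenceCompressor {
    /** Compresses the UTF-8 bytes of the text with DEFLATE. */
    static byte[] compress(String text) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        deflater.setInput(text.getBytes(StandardCharsets.UTF_8));
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[1024];
        while (!deflater.finished()) {
            int n = deflater.deflate(buffer);
            out.write(buffer, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    /** Restores the original text from bytes produced by compress(). */
    static String decompress(byte[] data) {
        Inflater inflater = new Inflater();
        inflater.setInput(data);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[1024];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buffer);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalStateException("truncated or unsupported input");
                }
                out.write(buffer, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalStateException("input is not valid DEFLATE data", e);
        } finally {
            inflater.end();
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 1000; i++) {
            sb.append("abcdbcdefbcdbcdefg");
        }
        String sequence = sb.toString();
        byte[] packed = compress(sequence);
        System.out.println(sequence.length() + " chars -> " + packed.length + " bytes");
        System.out.println("round trip ok: " + decompress(packed).equals(sequence));
    }
}
```

Unlike a folded regex, the compressed bytes cannot be matched against directly; the string has to be decompressed before use.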
