Regular expression generator/reducer?

Question

A colleague posed an interesting question about an operational pain point we currently have, and I'm curious whether there is anything out there (utility, library, or algorithm) that might help automate it.

Say you have a list of literal values (in our case, they are URLs). Based on this list, we want to come up with a single regex that matches all of those literal items.

So, if my list is:

http://www.abc.com
http://www.abc.com/subdir
http://foo.abc.com

The simplest answer is

^(http://www.abc.com|http://www.abc.com/subdir|http://foo.abc.com)$

but this gets large for lots of data, and we have a length limit we're trying to stay under.

Currently we write the regexes by hand, but this doesn't scale well, nor is it a great use of anyone's time. Is there a more automated way of decomposing the source data to come up with a length-optimal regex that matches all of the source values?

Recommended answer

The Aho-Corasick matching algorithm constructs a finite automaton that matches multiple strings. You could convert the automaton to its equivalent regex, but it is simpler to use the automaton directly (which is what the algorithm does).
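
If a regex is still required, one way to approximate the conversion is to build a prefix tree (trie) from the literals and emit a pattern that factors out shared prefixes. The sketch below is only an illustration under a few assumptions: it uses a plain trie (the goto structure of Aho-Corasick, without failure links), it only shares prefixes and not suffixes, so the result is shorter than the naive alternation but not guaranteed to be length-optimal, and the function names `build_trie`, `trie_to_regex`, and `literals_to_regex` are made up for this example.

```python
import re


def build_trie(words):
    """Build a nested-dict trie; the key '' marks the end of a word."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[""] = {}  # end-of-word marker
    return root


def trie_to_regex(node):
    """Recursively turn a trie node into a regex fragment."""
    is_end = "" in node  # a literal ends here, so what follows is optional
    branches = [
        re.escape(ch) + trie_to_regex(child)
        for ch, child in sorted(node.items())
        if ch != ""
    ]
    if not branches:
        return ""
    if len(branches) == 1 and not is_end:
        return branches[0]  # single path: no grouping needed
    group = "(?:" + "|".join(branches) + ")"
    return group + "?" if is_end else group


def literals_to_regex(words):
    """Return an anchored regex matching exactly the given literals."""
    if not words:
        raise ValueError("need at least one literal")
    return "^" + trie_to_regex(build_trie(words)) + "$"


urls = [
    "http://www.abc.com",
    "http://www.abc.com/subdir",
    "http://foo.abc.com",
]
pattern = literals_to_regex(urls)
print(pattern)
print(all(re.match(pattern, u) for u in urls))
```

Note that `re.escape` is applied to every character, so the dots in the hostnames match only a literal dot, which the hand-written alternation in the question does not guarantee. If the only goal is exact membership testing and the regex is not a hard requirement, a plain `set` of the literals is simpler still.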
