匹配正则表达式的随机字符串 [英] Random string that matches a regexp

查看:124
本文介绍了匹配正则表达式的随机字符串的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

您将如何创建与某个正则表达式匹配的随机字母数字字符串?

How would you go about creating a random alpha-numeric string that matches a certain regular expression?

这是专门用于创建满足常规密码要求的初始密码.

This is specifically for creating initial passwords that fulfill regular password requirements.

推荐答案

Welp,只是沉思而已,但是生成与regex匹配的随机输入的一般问题对于我来说对于足够随机的定义和足够严格的定义听起来是可行的正则表达式我在考虑经典的形式定义,该定义只允许()| *和字母字符.

Welp, just musing, but the general question of generating random inputs that match a regex sounds doable to me for a sufficiently relaxed definition of random and a sufficiently tight definition of regex. I'm thinking of the classical formal definition, which allows only ()|* and alphabet characters.

正则表达式可以映射到称为有限自动机的形式机.这样的机器是有向图,具有一个称为最终状态的特定节点,一个称为初始状态的节点,以及每个边缘上的字母中的一个字母.如果可以从初始状态开始并遍历图形中用每个字符标记的一条边并在最终状态结束,则正则表达式会接受一个单词.

Regular expressions can be mapped to formal machines called finite automata. Such a machine is a directed graph with a particular node called the final state, a node called the initial state, and a letter from the alphabet on each edge. A word is accepted by the regex if it's possible to start at the initial state and traverse one edge labeled with each character through the graph and end at the final state.

可以构建图形,然后从最终状态开始并向后遍历随机边,从而跟踪路径.在标准构造中,图中的每个节点都可以从初始状态到达,因此您不必担心会犯不可恢复的错误并需要回溯.如果达到初始状态,请停止并读取前进的路径.这就是您要使用的正则表达式.

One could build the graph, then start at the final state and traverse random edges backwards, keeping track of the path. In a standard construction, every node in the graph is reachable from the initial state, so you do not need to worry about making irrecoverable mistakes and needing to backtrack. If you reach the initial state, stop, and read off the path going forward. That's your match for the regex.

但是,对于何时或是否达到初始状态,没有特别的保证.必须首先弄清所生成的字符串是随机"的含义,以及您首先希望从该语言中获取随机元素的含义.

There's no particular guarantee about when or if you'll reach the initial state, though. One would have to figure out in what sense the generated strings are 'random', and in what sense you are hoping for a random element from the language in the first place.

不过,也许这是思考问题的起点!

Maybe that's a starting point for thinking about the problem, though!

现在我已经写了出来,在我看来,反复解决各种选择以简化正则表达式模式可能更简单,直到剩下一个简单的字符串为止.查找模式中的第一个非字母字符.如果是*,请复制前几项,然后删除*.如果是|,则选择保留或删除其余项目.对于左括号,请执行相同的操作,但要查看匹配的右括号后面的字符.如果先将正则表达式解析为树表示形式,这样会使paren分组结构更易于使用,这可能会更容易.

Now that I've written that out, it seems to me that it might be simpler to repeatedly resolve choices to simplify the regex pattern until you're left with a simple string. Find the first non-alphabet character in the pattern. If it's a *, replicate the preceding item some number of times and remove the *. If it's a |, choose which of the OR'd items to preserve and remove the rest. For a left paren, do the same, but looking at the character following the matching right paren. This is probably easier if you parse the regex into a tree representation first that makes the paren grouping structure easier to work with.

对于那些担心确定正则表达式是否真正匹配任何东西的人来说,这等同于暂停问题:不,常规语言的行为举止相当好.您可以判断是否有两个正则表达式描述了同一组接受的字符串.您基本上是在上面制造机器,然后遵循一种算法来产生一个规范的最小等效机器.对两个正则表达式执行此操作,然后检查生成的最小机器数量是否相等,这很简单.

To the person who worried that deciding if a regex actually matches anything is equivalent to the halting problem: Nope, regular languages are quite well behaved. You can tell if any two regexes describe the same set of accepted strings. You basically make the machine above, then follow an algorithm to produce a canonical minimal equivalent machine. Do that for two regexes, then check if the resulting minimal machines are equivalent, which is straightforward.

这篇关于匹配正则表达式的随机字符串的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆