Regular expression matching any subset of a given set?
Question
Is it possible to write a regular expression which will match any subset of a given set of characters a1 ... an?
I.e. it should match any string in which each of these characters appears at most once, no other characters appear, and the relative order of the characters doesn't matter.
Some approaches that come to mind at once:
1. [a1,...,an]* or (a1|a2|...|an)* - this allows a character to occur more than once.
2. (a1?a2?...an?) - no repeated characters, but the relative order matters: this matches any subsequence, not any subset.
3. ($|a1|...|an|a1a2|a2a1|...|a1...an|...|an...a1), i.e. write out every ordering of every subset (just hardcode all matching strings :)), which of course is not acceptable.
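To make the three approaches concrete, here is a small Python sketch (an illustration only) that assumes the example set {a, b, c} and uses the standard `re` module; `ALPHABET` and `all_orderings` are hypothetical names introduced here, not part of the question.

```python
import re
from itertools import permutations

ALPHABET = "abc"  # hypothetical example set {a, b, c}

# Approach 1: a starred character class; nothing stops a character repeating.
approach1 = re.compile(r"[abc]*")

# Approach 2: each character optional, but only in the fixed order a, b, c.
approach2 = re.compile(r"a?b?c?")


def all_orderings(alphabet):
    """Every ordering of every subset of `alphabet`, including the empty string."""
    words = [""]
    for k in range(1, len(alphabet) + 1):
        # permutations(alphabet, k) yields each k-element subset in every order
        words.extend("".join(p) for p in permutations(alphabet, k))
    return words


# Approach 3: hardcode all of them as one big alternation.
approach3 = re.compile("|".join(re.escape(w) for w in all_orderings(ALPHABET)))

for s in ["", "cab", "aa", "ba"]:
    # fullmatch() anchors the pattern to the whole string
    print(repr(s),
          bool(approach1.fullmatch(s)),
          bool(approach2.fullmatch(s)),
          bool(approach3.fullmatch(s)))
```

Running it shows approach 1 accepting "aa" and approach 2 rejecting "ba", while approach 3 gets both right at the cost of a pattern that already has 16 alternatives for n = 3 and grows factorially with n.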
I also suspect it may be theoretically impossible, because while parsing the string we would need to remember which characters we have already seen, and as far as I know regular expressions can only recognize right-linear (i.e. regular) languages.
Any help will be appreciated. Thanks in advance.
Answer
Can't think how to do it with a single regex, but here is one way to do it with n regexes. (I'll use 1, 2, ..., m, n etc. for your a's; a class such as [23..n] means every element except 1.)
^[23..n]*1?[23..n]*$
^[13..n]*2?[13..n]*$
...
^[12..m]*n?[12..m]*$
If all of the above match, your string is a subset of 12..mn (in any order).
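A minimal Python sketch of the n-regex check, assuming the set elements are distinct single characters; `subset_regexes` and `matches_all` are hypothetical helper names, and `fullmatch()` stands in for the `^...$` anchors.

```python
import re


def subset_regexes(alphabet):
    """Build one pattern per element x: [others]* x? [others]*, as in the answer.

    Assumes the characters of `alphabet` are distinct.
    """
    patterns = []
    for x in alphabet:
        # re.escape also escapes ']', '^', '-' and '\\', so it is safe inside [...]
        others = "".join(re.escape(c) for c in alphabet if c != x)
        # an empty class "[]" is not valid, so for a one-element set use no class
        rest = f"[{others}]*" if others else ""
        patterns.append(re.compile(f"{rest}{re.escape(x)}?{rest}"))
    return patterns


def matches_all(s, patterns):
    """True if s satisfies every per-element pattern.

    fullmatch() plays the role of ^...$ here; Python's $ would also accept
    a single trailing newline.
    """
    return all(p.fullmatch(s) for p in patterns)


pats = subset_regexes("abc")
for s in ["", "cab", "aa", "abd"]:
    print(repr(s), matches_all(s, pats))
```

Here "cab" passes all three patterns, "aa" fails the pattern for a, and "abd" fails every pattern because d belongs to no class.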
How this works: each line requires the string to consist exactly of:
- any number of characters drawn from the set, except a particular one
- optionally, a single occurrence of that particular one
- any number of characters drawn from the set, except that particular one
If this passes when every element in turn is considered as the particular one, we know:
- there is nothing in the string except allowed elements
- there is at most one of each allowed element
as required.
For completeness, I should say that I would only do this if I were under orders to "use regex"; otherwise, I'd track which allowed elements have been seen and iterate over the characters of the string, doing the obvious thing.
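The non-regex alternative might look like this in Python, a minimal sketch assuming the allowed elements are single characters; `is_subset_of` is a hypothetical helper name. It makes one pass over the string, keeping a set of the allowed elements seen so far.

```python
def is_subset_of(s, allowed):
    """True if every character of s is in `allowed` and none appears twice.

    `is_subset_of` is a hypothetical helper name, not from the answer.
    """
    allowed = set(allowed)
    seen = set()  # allowed elements met so far
    for ch in s:
        if ch not in allowed or ch in seen:
            return False  # a foreign character, or a repeat
        seen.add(ch)
    return True


for s in ["", "cab", "aba", "abd"]:
    print(repr(s), is_subset_of(s, "abc"))
```

The same test can be written as `set(s) <= set(allowed) and len(set(s)) == len(s)`; the loop version simply stops at the first offending character.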