Matching two or more characters that are not the same

Views: 41 | Posted: 2021/7/7 18:34:52 | Tags: regex, regex-negation

Question

Is it possible to write a regex pattern to match abc, where each letter is not literal but stands for a distinct character, so that text like xyz (but not xxy) would be matched? I am able to get as far as (.)(?!\1) to match a in ab, but then I am stumped.

After getting the answer below, I was able to write a routine to generate this pattern. Using raw re patterns is much faster than converting both the pattern and the text to a canonical form and then comparing them.
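For comparison, the canonical-form approach mentioned above might look like the sketch below. The question does not show that code, so this is an assumption about what it does; the helper names `canonical` and `search_canonical` are hypothetical. Each string is reduced to the first-appearance index of each of its characters, so two strings have the same "shape" exactly when their canonical forms are equal.

```python
def canonical(s):
    """Map each character to the index of its first appearance in s.

    Strings with the same "shape" get the same tuple, e.g. both
    'abba' and 'acca' become (0, 1, 1, 0).
    """
    first_seen = {}
    # len(first_seen) is evaluated before setdefault inserts c,
    # so a new character gets the next unused index
    return tuple(first_seen.setdefault(c, len(first_seen)) for c in s)


def search_canonical(p, text):
    """Return the first window of text with the same shape as p, or None."""
    target = canonical(p)
    n = len(p)
    # slide a window of len(p) characters across the text
    for start in range(len(text) - n + 1):
        window = text[start:start + n]
        if canonical(window) == target:
            return window
    return None


print(search_canonical('abba', 'maccaw'))  # acca
print(search_canonical('abba', 'busses'))  # None
print(search_canonical('abc', 'xyz'))      # xyz
```

Here every window is re-canonicalized in Python code, which plausibly explains why a single compiled regex run by the `re` engine comes out faster, as the question reports. This sketch also has no counterpart to the `know` and `wild` options.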

def pat2re(p, know=None, wild=None):
    r"""Return a compiled re pattern that finds pattern `p` in a
    string, where each distinct character of `p` must match a
    distinct character of the text. Characters to be taken
    literally, or that can stand for any character, should be
    given as `know` and `wild`, respectively.

    EXAMPLES
    ========

    Different characters in the pattern denote different
    characters to be matched; characters that are the same in
    the pattern must be the same in the text:

    >>> pat = pat2re('abba')
    >>> assert pat.search('maccaw')
    >>> assert not pat.search('busses')

    The underlying pattern of the re object can be seen
    with the pattern property:

    >>> pat.pattern
    '(.)(?!\\1)(.)\\2\\1'

    If some characters are to be taken literally, list them
    in `know`; list characters that can stand for any
    character (i.e. wildcards) in `wild`:

    >>> a_ = pat2re('ab', know='a')
    >>> assert a_.search('ad') and not a_.search('bc')

    >>> ab_ = pat2re('ab*', know='ab', wild='*')
    >>> assert ab_.search('abc') and ab_.search('abd')
    >>> assert not ab_.search('bad')

    """
    import re
    # make a canonical "hash" of the pattern
    # with ints representing pattern elements that
    # must be unique and strings for wild or known
    # values
    m = {}
    j = 1
    know = know or ''
    wild = wild or ''
    for c in p:
        if c in know:
            m[c] = re.escape(c)  # literal; escapes '.', '*', '(' etc.
        elif c in wild:
            m[c] = '.'
        elif c not in m:
            m[c] = j
            j += 1
            assert j < 100
    h = tuple(m[i] for i in p)
    # build pattern
    out = []
    last = 0
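    for i in h:
        if isinstance(i, int):
            if i <= last:
                # seen before: must repeat that earlier capture
                out.append(r'\%s' % i)
            else:
                if last:
                    # new element: must differ from every earlier capture
                    ors = '|'.join(r'\%s' % k for k in range(1, last + 1))
                    out.append('(?!%s)(.)' % ors)
                else:
                    out.append('(.)')
                last = i
        else:
            # known literal or wildcard: emit as-is
            out.append(i)
    return re.compile(''.join(out))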

Recommended answer

^(.)(?!\1)(.)(?!\1|\2).$

Here is an explanation of the regex pattern:

^          from the start of the string
(.)        match and capture any first character (no restrictions so far)
(?!\1)     then assert that the second character is different from the first
(.)        match and capture any (legitimate) second character
(?!\1|\2)  then assert that the third character does not match first or second
.          match any valid third character
$          end of string
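One way to check the pattern explained above is to run it on a few samples and then on every three-letter string over a small alphabet, comparing the result with a plain `len(set(s)) == 3` test. This is a minimal sketch using Python's `re` module; the sample strings and the `xyz` alphabet are arbitrary choices for illustration.

```python
import itertools
import re

# The answer's pattern: three characters, each different from the ones before it
pat = re.compile(r'^(.)(?!\1)(.)(?!\1|\2).$')

for s in ['xyz', 'abc', 'xxy', 'xyx', 'xyy']:
    print(s, bool(pat.match(s)))

# Exhaustive cross-check against a set-based test over a 3-letter alphabet
for chars in itertools.product('xyz', repeat=3):
    s = ''.join(chars)
    assert bool(pat.match(s)) == (len(set(s)) == 3)
```

One caveat: in Python, `$` also matches just before a trailing newline, so `'xyz\n'` matches as well; calling `fullmatch` on the pattern without the anchors is the stricter option.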
