Regex - check if input still has chances to become matching


Question

We have the following regexp:

var regexp = /^one (two)+ three/;

So only strings such as "one two three", "one two three four", or "one twotwo three" will match it.

However, a string like "one " is still promising: with more input it may eventually match.

But the string "one three" will never match, no matter what is appended to it.

Is there a way to check whether a given string still has a chance of matching?

I need this to offer suggestions while the user is typing: I want to recommend all options that begin with the given input. The regexps I'm using are pretty long, and I don't really want to mess with them.

In other words, I want to check whether the string ended during matching without anything non-matching being encountered.

Put yet another way, the answer lies in the reason for the failed match. If the reason is that the string ended, the input is promising. However, I don't know of any way to check why a string didn't match.

Recommended answer

This is a regex feature known as partial matching. It's available in several regex engines such as PCRE, Boost, and Java, but not in JavaScript.

Andacious's answer shows a very nice way to overcome this limitation; we just need to automate it.

Well... challenge accepted :)

Fortunately, JavaScript has a very limited regex feature set with a simple syntax, so I wrote a simple parser and on-the-fly transformation for this task, based on the features listed on MDN.
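
To make the idea concrete, here is a minimal sketch of what such a transformation yields for the regexp from the question, expanded by hand: each literal character X becomes (?:X|$), and each group gets a trailing |$ alternative. The helper name classify is hypothetical and not part of the answer; the rule it applies (a null result or an empty first element means failure) is the one the answer uses.

```javascript
// Original pattern from the question.
var full = /^one (two)+ three/;

// Hand-expanded partial-match form: every literal X is rewritten as (?:X|$),
// and the capturing group gets an extra |$ alternative.
var partial = /^(?:o|$)(?:n|$)(?:e|$)(?: |$)((?:t|$)(?:w|$)(?:o|$)|$)+(?: |$)(?:t|$)(?:h|$)(?:r|$)(?:e|$)(?:e|$)/;

// classify is a hypothetical helper name used only for this illustration.
function classify(input) {
    if (full.test(input)) {
        return "full";
    }
    var m = partial.exec(input);
    // null, or an empty first element, means the input can no longer match.
    return m && m[0] ? "partial" : "none";
}

console.log(classify("one "));          // partial
console.log(classify("one twotw"));     // partial
console.log(classify("one three"));     // none
console.log(classify("one two three")); // full
```

The original regexp is checked first because the partial form also matches complete inputs.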

A few points of interest:

  • This produces a regex that will almost always match the empty string. Therefore, a failed partial match occurs when the result of exec is null or an array whose first element is the empty string.
  • Negative lookaheads are kept as-is. I believe that's the right thing to do. The only ways to fail a match are through them (i.e. put a (?!) in the regex) and through anchors (^ and $).
  • The parser assumes a valid input pattern: you can't create a RegExp object from an invalid pattern in the first place.
  • This code won't handle backreferences properly: ^(\w+)\s+\1$ won't yield a partial match against hello hel, for instance.
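
The backreference limitation in the last point can be reproduced with a small sketch. The partial-match regex below is expanded by hand following the same per-token rule, so treat its exact form as an assumption; the variable names are illustrative only.

```javascript
// Pattern from the backreference note: a word, whitespace, then the same word again.
var backrefFull = /^(\w+)\s+\1$/;

// Hand-expanded partial form (assumed output of the per-token rewrite):
// \w, \s and \1 are each wrapped as (?:token|$).
var backrefPartial = /^((?:\w|$)+|$)(?:\s|$)+(?:\1|$)$/;

// A complete input still matches the original pattern.
console.log(backrefFull.test("hello hello"));   // true

// "hello " is accepted as a prefix: the end of input stands in for \1.
console.log(backrefPartial.exec("hello ")[0]);  // "hello "

// "hello hel" is a real prefix of "hello hello", but \1 must match the whole
// captured word "hello" or nothing at all, so the partial regex rejects it.
console.log(backrefPartial.exec("hello hel"));  // null
```

The rewrite treats \1 as a single token, so it cannot accept only part of the captured text.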

RegExp.prototype.toPartialMatchRegex = function() {
    "use strict";
    
    var re = this,
        source = this.source,
        i = 0;
    
    function process () {
        var result = "",
            tmp;

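        // Copy nbChars characters of the source through unchanged.
        function appendRaw(nbChars) {
            result += source.substr(i, nbChars);
            i += nbChars;
        }
        
        // Wrap nbChars characters as (?:token|$) so the token may instead be
        // satisfied by the end of the input.
        function appendOptional(nbChars) {
            result += "(?:" + source.substr(i, nbChars) + "|$)";
            i += nbChars;
        }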

        while (i < source.length) {
            switch (source[i])
            {
                case "\\":
                    switch (source[i + 1])
                    {
                        case "c":
                            appendOptional(3);
                            break;
                            
                        case "x":
                            appendOptional(4);
                            break;
                            
                        case "u":
                            if (re.unicode) {
                                if (source[i + 2] === "{") {
                                    appendOptional(source.indexOf("}", i) - i + 1);
                                } else {
                                    appendOptional(6);
                                }
                            } else {
                                appendOptional(2);
                            }
                            break;
                            
                        default:
                            appendOptional(2);
                            break;
                    }
                    break;
                    
                case "[":
                    tmp = /\[(?:\\.|.)*?\]/g;
                    tmp.lastIndex = i;
                    tmp = tmp.exec(source);
                    appendOptional(tmp[0].length);
                    break;
                    
                case "|":
                case "^":
                case "$":
                case "*":
                case "+":
                case "?":
                    appendRaw(1);
                    break;
                    
                case "{":
                    tmp = /\{\d+,?\d*\}/g;
                    tmp.lastIndex = i;
                    tmp = tmp.exec(source);
                    if (tmp) {
                        appendRaw(tmp[0].length);
                    } else {
                        appendOptional(1);
                    }
                    break;
                    
                case "(":
                    if (source[i + 1] == "?") {
                        switch (source[i + 2])
                        {
                            case ":":
                                result += "(?:";
                                i += 3;
                                result += process() + "|$)";
                                break;
                                
                            case "=":
                                result += "(?=";
                                i += 3;
                                result += process() + ")";
                                break;
                                
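                            case "!":
                                // Negative lookahead: skip over it and copy it unchanged.
                                tmp = i;
                                i += 3;
                                process();
                                result += source.substr(tmp, i - tmp);
                                break;

                            default:
                                // Lookbehinds and named groups are not handled here;
                                // fail loudly instead of looping forever.
                                throw new Error("Unsupported group syntax at index " + i);
                        }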
                    } else {
                        appendRaw(1);
                        result += process() + "|$)";
                    }
                    break;
                    
                case ")":
                    ++i;
                    return result;
                    
                default:
                    appendOptional(1);
                    break;
            }
        }
        
        return result;
    }
    
    return new RegExp(process(), this.flags);
};






// Test code
(function() {
    document.write('<span style="display: inline-block; width: 60px;">Regex: </span><input id="re" value="^one (two)+ three"/><br><span style="display: inline-block; width: 60px;">Input: </span><input id="txt" value="one twotw"/><br><pre id="result"></pre>');
    document.close();

    var run = function() {
        var output = document.getElementById("result");
        try
        {
            var regex = new RegExp(document.getElementById("re").value);
            var input = document.getElementById("txt").value;
            var partialMatchRegex = regex.toPartialMatchRegex();
            var result = partialMatchRegex.exec(input);
            var matchType = regex.exec(input) ? "Full match" : result && result[0] ? "Partial match" : "No match";
            output.innerText = partialMatchRegex + "\n\n" + matchType + "\n" + JSON.stringify(result);
        }
        catch (e)
        {
            output.innerText = e;
        }
    };

    document.getElementById("re").addEventListener("input", run);
    document.getElementById("txt").addEventListener("input", run);
    run();
}());

I tested it a little and it seems to work fine. Let me know if you find any bugs.
