使用通配符检查文件名搜索模式中的冲突 [英] Checking collision in filename search patterns with wildcards

查看:91
本文介绍了使用通配符检查文件名搜索模式中的冲突的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我需要通过仅检查/比较表达式来比较文件系统通配符表达式,以查看其结果是否重叠。

I need to compare file system wildcard expressions to see whether their results would overlap, by only examining/comparing the expressions.

为了举例说明,我们正在构建一种实用程序,可以根据文件系统通配符表达式将文件从一个(或多个位置)排序到一个单独的文件夹中。例如:* .txt进入文件夹a,*。doc进入文件夹b,依此类推。我们希望支持的通配符为*和?

For sake of example, we are building a utility that would sort files from one (or more locations) into a separate folders based on file system wildcard expressions. For example: *.txt goes into folder a, *.doc goes into folder b, and so on. The wildcard characters we would support would be * and ?

我希望能够通过分析通配符表达式来确定它们是否会冲突/重叠。

I want to be able to determine from just analyzing the wildcard expressions, whether they would conflict/overlap.

例如,如果我有以下表达式:

For example, if I have the following expressions:


*.x.y
*.y

它们会发生冲突(重叠),因为第二个表达式* .y将包含* .xy结果。 (例如,Axy会匹配这两个表达式)

They would conflict (overlap) because the second expression *.y would include *.x.y results. (ex. A.x.y would match both expressions)

我正在通过使用所有表达式来构建树结构来解决这个问题,并认为构建树的实际行为如果表达式冲突,将失败。

I am approaching this by building a tree structure using the using all of the expressions, figuring that the very act of building a tree will fail if the expressions conflict.


For example:
*.x
a.b
a.c
b.d

might create a tree like 

         +-*-.-x
         |
start +--+
         |     +-b
         |     |
         +-a-.-+-c
         |
         |
         +-b-.-d

如果我尝试添加模式bx,则树将是成功地遵循* .x路径,因此说该模式已经存在。

If I try to add the pattern b.x, the tree would be successful following the *.x path, and thereby say that the pattern already exists.

我要朝正确的方向前进吗?还是有一种已知的算法可以对此进行攻击?

Am I heading in the correct direction? Or is there an known algorithm for attacking this?

推荐答案

要检查两个通配符模式是否可以匹配相同的文件名,您可以查看这个问题的解决方法是在字符对之间创建比较网格,然后检查是否存在对角线路径。下图显示了如何检查通配符模式 ab?.c ?? a * bc。* 冲突:

To check whether two wildcard patterns could match the same filename, you can look at this problem as creating a grid of comparisons between pairs of characters, and then checking whether there exists a diagonal path. The illustration below shows how wildcard patterns ab?.c?? and a*bc.* can be checked for possible conflict:

两个相同文字字符之间的匹配找到后,您对角移动到下一个字符进行检查。 (用绿色箭头指示)

When a match between two identical literal characters is found, you move diagonally to the next characters to check. (indicated with green arrow)

当文字字符和单字符通配符时?,有两种可能性:通配符匹配字符(对角移动),或者通配符匹配空白,然后跳过它。 (用紫色箭头指示)

When a literal character and a single-character wild card ? are encountered, there are two possibilities: either the wild card matches the character (move diagonally), or the wildcard matches empty space, and you skip over it. (indicated with purple arrows)

当多字符通配符 * 遇到这种情况,需要考虑三种可能性:通配符在另一个字符之前匹配一个空格,通配符匹配另一个字符,或者通配符匹配多个字符。 (用蓝色箭头指示)

When a multi-character wild card * is encountered, three possibilities need to be taken into account: the wild card matches an empty space before the other character, the wild card matches the other character, or the wild card matches multiple characters. (indicated with blue arrows)

代码示例1(迭代)

下面是一个简单的javascript实现,它对角地遍历网格,标记可以访问的单元格从当前单元格中,然后检查右下角的单元格是否可达。运行代码片段以查看一些示例。 (更新:从上到下从左到右将很好,而不是对角线)

Below is a simple javascript implementation which iterates over the grid diagonally, marks cells which can be reached from the current cell, and then checks whether the bottom right cell is reachable. Run the code snippet to see a few examples. (update: top-to-bottom left-to-right will do fine instead of diagonally)

function wildConflict(wild1, wild2) {
    var grid = [[true]], width = wild1.length, height = wild2.length;
    for (var x = 1; x <= width; x++) grid[x] = [];
    for (var y = 0; y < height; y++) {
        for (var x = 0; x < width; x++) {
            if (grid[x][y]) {
                var a = wild1.charAt(x), b = wild2.charAt(y);
                if (a == '*' || b == '*' || a == '?' || b == '?' || a == b) grid[x + 1][y + 1] = true;
                if (a == '*' || b == '*' || a == '?') grid[x + 1][y] = true;
                if (a == '*' || b == '*' || b == '?') grid[x][y + 1] = true;
            }
        }
    }
    return grid[width][height] == true;
}

var a = ["a", "a", "a*", "abc", "a*", "*.x.y", "*.x.y", "a*b*", "a*bc.*", "a?c.de"];
var b = ["a", "b", "abc", "a?", "*b", "*.y", "*.x", "a*c*", "ab?.c??", "ac.d??"];
for (var i in a) document.write("&quot;" + a[i] + "&quot; &harr; &quot;" + b[i] + "&quot; &rarr; " + wildConflict(a[i], b[i]) + "<BR>");

代码示例2(递归)

简单的递归实现的缺点是可能多次检查某些字符对。它不需要2D数组,但是递归显然也要使用内存。

A simple recursive implementation has the drawback of potentially checking some character pairs more than once. It doesn't need the 2D-array, but the recursions obviously use memory too.

请注意,当多字符通配符 * <遇到/ code>时,该算法仅以两种可能性递归:跳过一个字符,或跳过另一个字符;当将通配符与下一个字符进行比较时,下一步将处理跳过两个字符(即通配符正好匹配一个字符)的问题。

Note that when a multi-character wild card *is encountered, the algorithm recurses with only two possibilities: jump over the one character, or jump over the other character; jumping over both characters (i.e. the wild card matches exactly one character) is taken care of in the next step, when the wild card is compared to the next character.

function wildConflict(wild1, wild2) {
    var w1 = wild1.split(''), w2 = wild2.split('');
    return conflict(0, 0);

    function conflict(p1, p2) {
        if (p1 == w1.length || p2 == w2.length) {
            if ((p1 == w1.length && p2 == w2.length)
            || (p1 == w1.length - 1 && (w1[p1] == '*' || w1[p1] == '?'))
            || (p2 == w2.length - 1 && (w2[p2] == '*' || w2[p2] == '?'))) {
                return true;
            }
            else return false;                            // premature end
        }
        else if (w1[p1] == '*' || w2[p2] == '*' || (w1[p1] == '?' && w2[p2] == '?')) {
            return conflict(p1 + 1, p2) || conflict(p1, p2 + 1);
        }
        else if (w1[p1] == '?') {
            return conflict(p1 + 1, p2) || conflict(p1 + 1, p2 + 1);
        }
        else if (w2[p2] == '?') {
            return conflict(p1, p2 + 1) || conflict(p1 + 1, p2 + 1);
        }
        else if (w1[p1] == w2[p2]) {
            return conflict(p1 + 1, p2 + 1);
        }
        else return false;                               // unequal literals
    }
}

var x = ["a", "a", "a*", "abc", "a*", "*.x.y", "*.x.y", "a*b*", "a*bc.*", "a?c.de"];
var y = ["a", "b", "abc", "a?", "*b", "*.y", "*.x", "a*c*", "ab?.c??", "ac.d??"];
for (var i in x) document.write("&quot;" + x[i] + "&quot; &harr; &quot;" + y[i] + "&quot; &rarr; " + wildConflict(x[i], y[i]) + "<BR>");

这篇关于使用通配符检查文件名搜索模式中的冲突的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆