Inverse glob - reverse engineer a wildcard string from file names


Question

I want to generate a wildcard string from a pair of file names. Kind of an inverse-glob. Example:

file1 = 'some foo file.txt'
file2 = 'some bar file.txt'
assert 'some * file.txt' == inverse_glob(file1, file2)

Use difflib perhaps? Has this been solved already?
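
The difflib idea can be sketched roughly as follows. This is not the accepted answer's method and not a ready-made library feature. It is a minimal sketch, assuming that "shared text stays literal, every differing gap becomes one *" is an acceptable rule. It uses the real standard-library classes difflib.SequenceMatcher (with get_opcodes()) and glob.escape. The function name inverse_glob_difflib is hypothetical.

```python
import difflib
import glob


def inverse_glob_difflib(f1, f2):
    """Hypothetical difflib-based inverse glob: keep the blocks that f1 and f2
    share and replace each differing gap between them with a single '*'."""
    # autojunk=False disables difflib's "popular character" heuristic
    # (it only kicks in for sequences of 200+ items anyway)
    matcher = difflib.SequenceMatcher(None, f1, f2, autojunk=False)
    parts = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == 'equal':
            # Shared text is kept literally; glob.escape guards against
            # '*', '?' or '[' appearing in the file names themselves
            parts.append(glob.escape(f1[i1:i2]))
        else:
            # A 'replace', 'insert' or 'delete' gap becomes one wildcard
            parts.append('*')
    return ''.join(parts)


print(inverse_glob_difflib('some foo file.txt', 'some bar file.txt'))        # some * file.txt
print(inverse_glob_difflib('filename.txt', 'filename2.txt'))                # filename*.txt
print(inverse_glob_difflib('the 24MHz run new.sr', 'the 16MHz run old.sr'))  # the *MHz run *.sr
```

Because SequenceMatcher aligns on common blocks rather than on character positions, extra characters can sit anywhere in the name, not only at the end. The trade-off is that names that differ in several places get several wildcards. For example, '1filename.txt' and 'filename2.txt' give '*filename*.txt', which a "at most one *" rule would reject.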

The application is a large set of data files with similar names. I want to compare every pair of file names and then present a comparison of the pairs of files whose names are "similar". I figure that if I can do an inverse glob on each pair, the pairs that yield "good" wildcards (e.g. neither lots*of*stars*.txt nor *) are good candidates for comparison. So I might take the output of this putative inverse_glob() and reject wildcards that contain more than one * or for which glob() doesn't return exactly two files.
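
The rejection rule just described could look roughly like this. It is a sketch under a few assumptions. The name is_good_wildcard is hypothetical, and the limit of one * comes from the question. Instead of calling glob() on a directory, it runs fnmatch.fnmatchcase over an in-memory list of names, which applies the same wildcard syntax (*, ?, [...]) without touching the filesystem. Note that fnmatchcase is always case-sensitive, whereas glob() may not be on a case-insensitive filesystem.

```python
import fnmatch


def is_good_wildcard(pattern, all_names, max_stars=1):
    """Hypothetical filter: accept a pattern only if it has at most
    `max_stars` '*' wildcards and matches exactly two of `all_names`."""
    if pattern.count('*') > max_stars:
        return False
    # fnmatchcase applies glob-style matching to plain strings
    matches = [name for name in all_names if fnmatch.fnmatchcase(name, pattern)]
    return len(matches) == 2


names = ['some foo file.txt', 'some bar file.txt',
         'filename.txt', 'filename2.txt', 'notes.txt']

print(is_good_wildcard('some * file.txt', names))    # True: exactly two matches
print(is_good_wildcard('*.txt', names))              # False: matches all five
print(is_good_wildcard('the *MHz run *.sr', names))  # False: two '*'
```

To count real files instead, the match count could come from len(glob.glob(os.path.join(directory, pattern))), where directory is the folder holding the data files (passed through glob.escape if it may itself contain wildcard characters).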

Answer

For example, with these file names (the third element of each tuple is the expected pattern):

names = [('some foo file.txt','some bar file.txt', 'some * file.txt'),
         ("filename.txt", "filename2.txt", "filenam*.txt"),
         ("1filename.txt", "filename2.txt", "*.txt"),
         ("inverse_glob", "inverse_glob2", "inverse_glo*"),
         ("the 24MHz run new.sr", "the 16MHz run old.sr", "the *MHz run *.sr")]

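The function:

import re

def inverse_glob(f1, f2, force_single_asterisk=False):
    def adjust_name(pp, diff):
        # pp is [stem, extension] or [name]. Replace the stem's last character
        # with '?' placeholders so the padded name gains exactly `diff` characters
        # (if the stem is empty, just add `diff` placeholders).
        stem = pp[0]
        padded = stem[:-1] + '?' * (diff + min(len(stem), 1))
        if len(pp) == 2:
            return padded + '.' + pp[1]
        else:
            return padded

    # Pad the shorter name so both names have the same length.
    # rsplit('.', 1) treats only the last dot as the extension separator.
    l1 = len(f1); l2 = len(f2)
    if l1 > l2:
        f2 = adjust_name(f2.rsplit('.', 1), l1 - l2)
    elif l2 > l1:
        f1 = adjust_name(f1.rsplit('.', 1), l2 - l1)

    # Keep characters that agree position by position; mark the rest with '?'.
    result = ['?'] * len(f1)
    for i, c in enumerate(f1):
        if c == f2[i]:
            result[i] = c

    # Collapse runs of two or more '?' into a single '*'.
    result = ''.join(result)
    result = re.sub(r'\?{2,}', '*', result)
    if force_single_asterisk:
        # Replace everything from the first '*' to the last '*' with one '*'.
        result = re.sub(r'\*.+\*', '*', result)
    return result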

Usage:

for name in names:
    result = inverse_glob(name[0], name[1])
    print('{:20} <=> {:20} = {}'.format(name[0], name[1], result))
    assert name[2] == result

Output:

some foo file.txt    <=> some bar file.txt    = some * file.txt  
filename.txt         <=> filename2.txt        = filenam*.txt  
1filename.txt        <=> filename2.txt        = *.txt  
inverse_glob         <=> inverse_glob2        = inverse_glo*
the 24MHz run new.sr <=> the 16MHz run old.sr = the *MHz run *.sr
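
As a quick sanity check on the output above, one can confirm that every generated pattern really matches both of its file names. This sketch uses the standard-library fnmatch.fnmatchcase as an in-memory stand-in for glob(), so no files need to exist; matches_both is a hypothetical helper name.

```python
from fnmatch import fnmatchcase

# (name1, name2, pattern) triples copied from the output above
results = [
    ('some foo file.txt', 'some bar file.txt', 'some * file.txt'),
    ('filename.txt', 'filename2.txt', 'filenam*.txt'),
    ('1filename.txt', 'filename2.txt', '*.txt'),
    ('inverse_glob', 'inverse_glob2', 'inverse_glo*'),
    ('the 24MHz run new.sr', 'the 16MHz run old.sr', 'the *MHz run *.sr'),
]


def matches_both(name1, name2, pattern):
    """True if the glob-style pattern matches both file names."""
    return fnmatchcase(name1, pattern) and fnmatchcase(name2, pattern)


for name1, name2, pattern in results:
    assert matches_both(name1, name2, pattern), (name1, name2, pattern)
print('all patterns match both names')
```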

Tested with Python 3.4.2.
