Python: finding the common part of the strings in a list and removing it from every item


Question

I have a list of file paths that looks similar to this:

path/new/stuff/files/morefiles/A/file2.txt
path/new/stuff/files/morefiles/B/file7.txt
path/new/stuff/files/morefiles/A/file1.txt
path/new/stuff/files/morefiles/C/file5.txt

I am trying to find the leading part of the path that is the same for every entry in the list, and then remove it from each entry.

The list can be any length. For the example above, I want to change the list into:

A/file2.txt
B/file7.txt
A/file1.txt
C/file5.txt

Methods like re.sub(r'.*I', 'I', filepath) and filepath.split('_', 1)[-1] can be used for the replacement, but I'm not sure how to find the common part across the list of file paths.

Note:

I am using Windows and Python 3.

Recommended answer

The first part of the answer is here: Python: Determine prefix from a set of (similar) strings

Use os.path.commonprefix() to find the longest common leading part of the strings. Note that it compares character by character rather than by path component.
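As a minimal sketch of that approach, the snippet below applies os.path.commonprefix() to the example list from the question and slices the prefix off each entry. The variable names (paths, prefix, stripped) are illustrative and do not come from the original answer.

```python
import os.path

# Example list from the question
paths = [
    "path/new/stuff/files/morefiles/A/file2.txt",
    "path/new/stuff/files/morefiles/B/file7.txt",
    "path/new/stuff/files/morefiles/A/file1.txt",
    "path/new/stuff/files/morefiles/C/file5.txt",
]

# Longest leading string shared by every entry (compared character by character)
prefix = os.path.commonprefix(paths)

# Slice the prefix off the front of each entry
stripped = [p[len(prefix):] for p in paths]

print(prefix)
print(stripped)
```

Slicing by len(prefix) only ever removes text from the start of each entry, which is why it is used here instead of a search-and-replace.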

The code from that answer for finding the part shared by all list items is:

# Return the longest prefix of all list elements.
def commonprefix(m):
    "Given a list of pathnames, returns the longest common leading component"
    if not m: return ''
    s1 = min(m)
    s2 = max(m)
    for i, c in enumerate(s1):
        if c != s2[i]:
            return s1[:i]
    return s1

Now all you have to do is use slicing to remove the resulting string from each item in the list.

The result is:

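# Remove the longest common prefix from every list element (in place).
def commonprefix(m):
    "Given a list of pathnames, strips the longest common leading part from each"
    if not m: return m
    s1 = min(m)
    s2 = max(m)
    # Default: s1 is itself a prefix of every item (no mismatch is found below)
    ans = s1
    for i, c in enumerate(s1):
        if c != s2[i]:
            ans = s1[:i]
            break
    # Slice the prefix off the front; this is also safe when ans is empty
    for each in range(len(m)):
        m[each] = m[each][len(ans):]
    return m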
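One caveat: a character-wise prefix can end in the middle of a directory name. For instance, entries under morefiles/A1/ and morefiles/A2/ share a prefix ending in "A". A possible workaround, sketched here under the assumption that the paths use forward slashes as in the question, is to trim the prefix back to its last separator. The helper name strip_common_dir and its sep parameter are hypothetical and not part of the original answer.

```python
import os.path


def strip_common_dir(paths, sep="/"):
    """Return a new list with the shared leading directories removed."""
    if not paths:
        return []
    prefix = os.path.commonprefix(paths)
    # Cut the prefix back to its last separator so a name is never split.
    # rfind returns -1 when there is no separator, which gives cut == 0.
    cut = prefix.rfind(sep) + 1
    return [p[cut:] for p in paths]


example = [
    "path/new/morefiles/A1/file2.txt",
    "path/new/morefiles/A2/file7.txt",
]
print(strip_common_dir(example))
```

On Windows, paths written with backslashes would need sep="\\" instead; this sketch does not try to detect the separator automatically.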
