Python: Parsing and grouping filenames in directory

Problem description

I'm pretty new to Python, but I have lots of experience with MATLAB & C.

What I need to do is parse the filenames of files in a particular directory, separate them into groups according to the fields within the filenames, and perform operations within these groups.

Specifically, the filenames are:

PROJECT-x-SUBJECT-x-SESSION-x-TYPE.extension

where that '-x-' has been purposely inserted as the field divider. I need to do operations on every group of files that shares the same PROJECT-x-SUBJECT-x-SESSION component.

My best attempt follows:

I can parse each of the files one at a time by:

import os

PROJECT_list, SUBJECT_list = [], []
dirList = os.listdir(directory)
for fname in dirList:
    # kill extension
    ext = os.path.splitext(fname)
    # get the 4 fields
    labels = ext[0].split('-x-')
    PROJECT_list.append(labels[0])
    SUBJECT_list.append(labels[1])
    ...

... which reflects the only idea I have had so far on how to organize this stuff: creating 4 lists and appending to them for each filename.

With my 4 (ordered?) lists, I could then call something like:

from collections import Counter
c = Counter(SESSION_list)
list(c)

Then at least I have a list of the unique SESSION names.

Suggestions? I could go on, but since I really just need a starting point, I think that this is sufficient.

Thanks, guys.

Solution

You can use defaultdict to make a dictionary that contains lists:

import os
from collections import defaultdict

groups = defaultdict(list)

for filename in os.listdir(directory):
    basename, extension = os.path.splitext(filename)
    project, subject, session, ftype = basename.split('-x-')

    groups[session].append(filename)

Now groups maps each session name to the list of filenames that belong to that session.
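
The question asks for groups that share the whole PROJECT-x-SUBJECT-x-SESSION component, while the snippet above keys on the session alone. A minimal sketch of one way to extend it is to use a tuple of the first three fields as the dictionary key. The function name `group_filenames`, the example filenames, and the choice to skip names that do not split into exactly four fields are all assumptions made for illustration, not part of the original answer.

```python
import os
from collections import defaultdict

SEPARATOR = "-x-"  # field divider described in the question


def group_filenames(filenames):
    """Group filenames by their (project, subject, session) fields."""
    groups = defaultdict(list)
    for filename in filenames:
        basename, _extension = os.path.splitext(filename)
        fields = basename.split(SEPARATOR)
        if len(fields) != 4:
            # Assumption: names without exactly 4 fields are skipped.
            continue
        project, subject, session, _ftype = fields
        groups[(project, subject, session)].append(filename)
    return groups


# Hypothetical filenames, for illustration only.
example = [
    "projA-x-subj01-x-sess1-x-anat.nii",
    "projA-x-subj01-x-sess1-x-func.nii",
    "projA-x-subj02-x-sess1-x-anat.nii",
    "notes.txt",
]

groups = group_filenames(example)
for key, files in groups.items():
    # Per-group operations would go here; this just prints each group.
    print(key, files)
```

To run this against a real directory, pass `os.listdir(directory)` instead of the example list. A tuple key keeps the three fields available separately inside the loop, which may be more convenient than re-joining them into a single string.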
