Python: Parsing and grouping filenames in directory
I'm pretty new to Python, but I have lots of experience with MATLAB & C.
What I need to do is parse the filenames of files in a particular directory, separate them into groups according to the fields within the file names, and perform operations within these groups.
Specifically, the filenames are:
PROJECT-x-SUBJECT-x-SESSION-x-TYPE.extension
where that '-x-' has been purposely inserted as the field divider. I need to do operations on every group of files that shares the same PROJECT-x-SUBJECT-x-SESSION component.
My best attempt follows:
I can parse each of the files one at a time by:
import os

PROJECT_list, SUBJECT_list = [], []
dirList = os.listdir(directory)
for fname in dirList:
    # kill extension
    ext = os.path.splitext(fname)
    # get the 4 fields
    labels = ext[0].split('-x-')
    PROJECT_list.append(labels[0])
    SUBJECT_list.append(labels[1])
    ...
... which reflects the only idea I have had on how to organize this stuff: creating 4 lists and appending to them for each filename.
Then, with my 4 (ordered?) lists, I could call something like:
from collections import Counter
c=Counter(SESSION_list)
list(c)
Then at least I have a unique list of SESSION names.
Suggestions? I could go on, but since I really just need a starting point, I think that this is sufficient.
Thanks, guys.
You can use defaultdict to make a dictionary that contains lists:
import os
from collections import defaultdict

groups = defaultdict(list)
for filename in os.listdir(directory):
    # strip the extension, then split the base name into its four fields
    basename, extension = os.path.splitext(filename)
    project, subject, session, ftype = basename.split('-x-')
    groups[session].append(filename)
Now, groups contains a mapping from session names to lists of filenames.
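Note that the question asks for groups sharing the whole PROJECT-x-SUBJECT-x-SESSION component, whereas keying on the session alone would merge files from different subjects that happen to use the same session name. One possible way around that is to key the dictionary on a tuple of the first three fields. The following is only a sketch under a few assumptions: the function name group_by_session_key and the sample filenames are made up for illustration, and names that do not split into exactly four fields are simply skipped, which may or may not be what you want.

```python
import os
from collections import defaultdict


def group_by_session_key(filenames, sep="-x-"):
    """Group filenames by their (project, subject, session) fields.

    Hypothetical helper: the name and the skip-on-mismatch policy are
    assumptions made for this sketch, not part of the original answer.
    """
    groups = defaultdict(list)
    for filename in filenames:
        basename, _extension = os.path.splitext(filename)
        fields = basename.split(sep)
        if len(fields) != 4:
            # Skip names that do not follow the four-field pattern.
            continue
        project, subject, session, _ftype = fields
        groups[(project, subject, session)].append(filename)
    return groups


# Made-up filenames, used only to illustrate the grouping.
names = [
    "P1-x-S01-x-A-x-rest.dat",
    "P1-x-S01-x-A-x-task.dat",
    "P1-x-S02-x-A-x-rest.dat",
    "notes.txt",
]
groups = group_by_session_key(names)

# Operate on each group in turn; here we just print it.
for key, files in sorted(groups.items()):
    print(key, files)
```

For a real directory you would pass os.listdir(directory) instead of the sample list, and replace the print call with whatever per-group operation you need.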