How to use glob to read a limited set of files with numeric names?
Problem description
How can I use glob to read only a limited set of files?
I have JSON files named with numbers from 50 to 20000 (e.g. 50.json, 51.json, 52.json ... 19999.json, 20000.json) in the same directory. I want to read only the files numbered from 15000 to 18000.
To do so I'm using a glob, as shown below, but it returns an empty list every time I try to filter on the numbers. I've tried my best to follow this link (https://docs.python.org/2/library/glob.html), but I'm not sure what I'm doing wrong.
>>> import glob
>>> directory = "/Users/Chris/Dropbox"
>>> read_files = glob.glob(directory+"/[15000-18000].*")
>>> print(read_files)
[]
Also, what if I wanted files with any number greater than 18000?
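You are using the glob syntax incorrectly; a [..] sequence matches a single character, not a number. The following glob matches the files numbered 15000 through 18999 (a superset of the range you want, since a single glob pattern cannot stop exactly at 18000):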
'1[5-8][0-9][0-9][0-9].*'
Under the covers, glob uses fnmatch, which translates the pattern to a regular expression. Your pattern translates to:
>>> import fnmatch
>>> fnmatch.translate('[15000-18000].*')
'[15000-18000]\..*\Z(?ms)'
which matches exactly 1 character before the dot, and only if that character is a 0, 1, 5 or 8. Nothing else.
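To make the per-character behaviour concrete, here is a small sketch that runs both patterns against a list of names with fnmatch.filter instead of touching the file system. The file names are made up for illustration only, and the code is written for Python 3.

```python
import fnmatch

# Hypothetical file names, chosen only to illustrate the matching behaviour.
names = ["0.json", "1.json", "5.json", "7.json", "8.json",
         "50.json", "15000.json", "18000.json"]

# The bracket expression is a set of single characters:
# 1, 5, 0, 0, the range 0-1, 8, 0, 0, 0, which collapses to {0, 1, 5, 8}.
broken = fnmatch.filter(names, "[15000-18000].*")

# One bracket expression per digit position matches five-digit names.
fixed = fnmatch.filter(names, "1[5-8][0-9][0-9][0-9].*")

print(broken)  # only the one-character names 0, 1, 5 and 8
print(fixed)   # only the five-digit names
```

With file names starting at 50.json, no name consists of a single character before the dot, which is why the original glob returned an empty list.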
glob patterns are quite limited; matching numeric ranges is not easy with them. You'd have to create separate globs for each sub-range, for example glob('1[8-9][0-9][0-9][0-9].*') + glob('2[0-9][0-9][0-9][0-9].*'), etc.
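If you do want to stay with glob, the sub-range approach could look roughly like the sketch below for the exact range 15000 to 18000. The helper name glob_many and the demo files are assumptions made for this example; they are not part of the glob module. The code is written for Python 3.

```python
import glob
import os
import tempfile


def glob_many(directory, patterns):
    """Return the sorted union of the matches of several glob patterns.

    Hypothetical helper for this example, not a standard library function.
    """
    matches = set()
    for pattern in patterns:
        # glob.escape protects any special characters in the directory name.
        matches.update(glob.glob(os.path.join(glob.escape(directory), pattern)))
    return sorted(matches)


# 15000-17999 fits one pattern; the single file 18000 needs its own.
RANGE_15000_TO_18000 = ["1[5-7][0-9][0-9][0-9].json", "18000.json"]

# Demo on a throwaway directory holding a few empty files.
with tempfile.TemporaryDirectory() as demo_dir:
    for n in (50, 14999, 15000, 17999, 18000, 18001, 20000):
        open(os.path.join(demo_dir, "%d.json" % n), "w").close()
    found = [os.path.basename(p)
             for p in glob_many(demo_dir, RANGE_15000_TO_18000)]

print(found)
```

Every additional boundary costs another pattern, so this gets unwieldy quickly for ranges that do not line up with digit positions.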
Do your own filtering instead:
import os

directory = "/Users/Chris/Dropbox"

for filename in os.listdir(directory):
    basename, ext = os.path.splitext(filename)
    if ext != '.json':
        continue  # skip anything that is not a .json file
    try:
        number = int(basename)
    except ValueError:
        continue  # not numeric
    # adjust the bounds as needed, e.g. 15000 <= number <= 18000, or number > 18000
    if 18000 <= number <= 19000:
        # process file
        filename = os.path.join(directory, filename)
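The same filtering can be wrapped in a function so that both parts of the question (a closed range, and "anything greater than 18000") use one code path. This is a sketch only: the function name numbered_files and its parameters are invented for this example, and the demo runs against a temporary directory rather than the directory from the question. It is written for Python 3.

```python
import os
import tempfile


def numbered_files(directory, low, high=None, ext=".json"):
    """Return paths of files named <number><ext> with low <= number <= high.

    high=None means no upper bound. Results are in numeric order.
    Hypothetical helper for this example.
    """
    selected = []
    for filename in os.listdir(directory):
        basename, file_ext = os.path.splitext(filename)
        if file_ext != ext:
            continue
        try:
            number = int(basename)
        except ValueError:
            continue  # not numeric
        if number < low or (high is not None and number > high):
            continue
        selected.append((number, os.path.join(directory, filename)))
    # Sorting the (number, path) pairs orders the files numerically.
    return [path for _, path in sorted(selected)]


# Demo on a throwaway directory holding a few empty files.
with tempfile.TemporaryDirectory() as demo_dir:
    for name in ("50.json", "15000.json", "16000.json", "18000.json",
                 "18001.json", "20000.json", "notes.json", "16000.txt"):
        open(os.path.join(demo_dir, name), "w").close()

    in_range = [os.path.basename(p)
                for p in numbered_files(demo_dir, 15000, 18000)]
    above = [os.path.basename(p)
             for p in numbered_files(demo_dir, 18001)]

print(in_range)
print(above)
```

Comparing integers rather than characters means the bounds can be any numbers, and sorting on the parsed number avoids the lexicographic ordering (for example 100.json before 50.json) that sorting plain file names would give.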