Limitation to Python's glob?

Question

I'm using glob to feed file names to a loop like so:

import glob

inputcsvfiles = glob.iglob('NCCCSM*.csv')

for x in inputcsvfiles:
    csvfilename = x
    # do stuff here with csvfilename

The toy example that I used to prototype this script works fine with 2, 10, or even 100 input CSV files, but I actually need it to loop through 10,959 files. When using that many files, the script stops working after the first iteration and fails to find the second input file.

Given that the script works absolutely fine with a "reasonable" number of entries (2-100) but not with what I need (10,959), is there a better way to handle this situation, or some sort of parameter I can set to allow for a high number of iterations?

PS: initially I was using glob.glob, but glob.iglob fares no better.
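
For what it's worth, the only difference between the two calls is how the matches are returned; neither one limits how many files can be matched. A quick sketch of the distinction, reusing the pattern from above:

    import glob

    # glob.glob builds the entire list of matching names up front...
    all_files = glob.glob('NCCCSM*.csv')        # a list, e.g. 10,959 entries

    # ...while glob.iglob returns an iterator that yields them one at a time.
    lazy_files = glob.iglob('NCCCSM*.csv')
    first_match = next(lazy_files, None)        # None if nothing matched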

An expansion of the above for more context...

    import glob

    import arcpy

    # typical input file looks like this: "NCCCSM20110101.csv", "NCCCSM20110102.csv", etc.
    inputcsvfiles = glob.iglob('NCCCSM*.csv')

    # loop over individual input files
    for x in inputcsvfiles:
        csvfile = x
        modelname = x[0:5]

        # ArcPy: join the CSV to the feature layer ('inputshape', defined elsewhere) on CLIMATEID
        arcpy.AddJoin_management(inputshape, "CLIMATEID", csvfile, "CLIMATEID", "KEEP_COMMON")

        # do more stuff after the join

The script fails at the ArcPy line, where the csvfile variable gets passed into the command. The error reported is that it can't find a specified CSV file (e.g., "NCCSM20110101.csv"), when in fact the CSV is definitely in the directory. Could it be that you can't reuse a declared variable (x) multiple times as I have above? Again, this works fine if the directory being globbed only has 100 or so files, but if there are a whole lot (e.g., 10,959), it fails seemingly arbitrarily somewhere down the list.

Answer

One issue that arose was not with Python per se, but rather with ArcPy and/or MS handling of CSV files (more the latter, I think). As the loop iterates, a schema.ini file gets created, and information on each CSV file processed in the loop gets added to it and stored. Over time, the schema.ini grows rather large, and I believe that's when the performance issues arise.

My solution, although perhaps inelegant, was to delete the schema.ini file during each loop iteration to avoid the issue. Doing so allowed me to process the 10k+ CSV files, although rather slowly. Truth be told, we wound up using GRASS and Bash scripting in the end.
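
A minimal sketch of that workaround, assuming the schema.ini is written into the same directory as the CSV files (here the current working directory) and using a hypothetical inputshape layer name in place of whatever the real script joins against:

    import glob
    import os

    import arcpy

    inputshape = "climate_zones"  # hypothetical layer name, for illustration only

    for csvfile in glob.iglob('NCCCSM*.csv'):
        modelname = csvfile[0:5]

        # Join the current CSV onto the feature layer.
        arcpy.AddJoin_management(inputshape, "CLIMATEID", csvfile, "CLIMATEID", "KEEP_COMMON")

        # ... process the joined data here ...

        # Delete the schema.ini that the CSV driver writes alongside the CSVs,
        # so it cannot keep growing with every file processed.
        schema_path = os.path.join(os.getcwd(), "schema.ini")
        if os.path.exists(schema_path):
            os.remove(schema_path)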
