Windows 上的 Python 2.7——打开的文件太多 [英] Python 2.7 on Windows -- Too Many Open Files
问题描述
我正在运行一个脚本来替换文件名中的德语变音符号.有超过 1700 个文件需要我执行此操作,但是我收到一个错误消息,表明脚本运行一段时间后打开的文件太多.任何人有任何想法如何解决这个问题?非常感谢您的反馈!
I'm running a script that is replacing german umlauts in file names. There are over 1700 files that I need to do this for, but I'm getting an error indicating that there are too many open files after the script runs for a while. Anyone have any ideas how to fix this? Feedback is greatly appreciated!
代码:
# -*- coding: utf-8 -*-
''' Script replaces all umlauts in filenames within a root directory and its subdirectories with the English
equivalent (ie. ä replaced with ae, Ä replaced with Ae).'''
import os
import itertools
import logging
from itertools import groupby
##workspace = u'G:\\Dvkoord\\GIS\\TEMP\\Tle\\Scripts\\Umlaut'
workspace = u'G:\\Gis\\DATEN'
log = 'Umlauts.log'
logPath = r"G:\Dvkoord\GIS\TEMP\Tle\Scripts\Umlaut\Umlauts.log"
logMessageFormat = '%(asctime)s - %(levelname)s - %(message)s'
def GetFilepaths(directory):
"""Function returns a list of file paths in a directory tree using os.walk. Parameter: directory
"""
file_paths = []
for root, directories, files in os.walk(directory):
for filename in files:
filepath = os.path.join(root, filename)
file_paths.append(filepath)
## file_paths = list(set(file_paths))
return file_paths
def uniq(input):
output = []
for x in input:
if x not in output:
output.append(x)
return output
def Logging(logFile, logLevel, destination, textFormat, comment):
"""Function writes a log file. Parameters: logFile (name the log file w/extension),
logLevel (DEBUG, INFO, etc.), destination (path under which the log file will be
saved including name and extension), textFormat (how the log text will be formatted)
and comment.
"""
# logging
logger = logging.getLogger(__name__)
# set log level
logger.setLevel(logLevel)
# create a file handler for the log -- unless a separate path is specified, it will output to the directory where this script is stored
logging.FileHandler(logFile)
handler = logging.FileHandler(destination)
handler.setLevel(logLevel)
# create a logging format
formatter = logging.Formatter(textFormat)
handler.setFormatter(formatter)
# add the handlers to the logger
logger.addHandler(handler)
logger.info(comment)
def main():
# dictionary of umlaut unicode representations (keys) and their replacements (values)
umlautDictionary = {
u'Ä': 'Ae',
u'Ö': 'Oe',
u'Ü': 'Ue',
u'ä': 'ae',
u'ö': 'oe',
u'ü': 'ue',
u'ß': 'ss'
}
dataTypes = [".CPG",
".dbf",
".prj",
".sbn",
".sbx",
".shp",
".shx",
".shp.xml",
".lyr"]
# get file paths in root directory and subfolders
filePathsList = GetFilepaths(workspace)
# put all filepaths with an umlaut in filePathsUmlaut list
filePathsUmlaut = []
for fileName in filePathsList:
## print fileName
for umlaut in umlautDictionary:
if umlaut in os.path.basename(fileName):
for dataType in dataTypes:
if dataType in fileName:
## print fileName
filePathsUmlaut.append(fileName)
# remove duplicate paths from filePathsUmlaut
uniquesUmlauts = uniq(filePathsUmlaut)
# create a dictionary for umlaut translation
umap = {
ord(key):unicode(val)
for key, val in umlautDictionary.items()
}
# use translate and umap dictionary to replace umlauts in file name and put them in the newFilePaths list
# without changing any of the umlauts in folder names or upper directories
newFilePaths = []
for fileName in uniquesUmlauts:
pardir = os.path.dirname(fileName)
baseName = os.path.basename(fileName)
newBaseFileName = baseName.translate(umap)
newPath = os.path.join(pardir, newBaseFileName)
newFilePaths.append(newPath)
newFilePaths = uniq(newFilePaths)
# create a dictionary with the old umlaut path as key and new non-umlaut path as value
dictionaryOldNew = dict(itertools.izip(uniquesUmlauts, newFilePaths))
# rename old file (key) as new file (value)
for files in uniquesUmlauts:
for key, value in dictionaryOldNew.iteritems():
if key == files:
comment = '%s'%files + ' wurde als ' '%s'%value + ' umbenannt.'
print comment
if os.path.exists(value):
os.remove(value)
os.rename(files, value)
Logging(log, logging.INFO, logPath, logMessageFormat, comment)
if __name__ == '__main__':
main()
推荐答案
我认为问题在于您的 Logging
功能.每次登录时,您都会创建一个新的 FileHandler
并将其添加到处理程序集中,并且您对每个重命名的文件都执行此操作,因此您很快就会达到打开文件描述符的限制.配置你的记录器一次,然后多次使用,不要每次使用都配置.
I think the problem is your Logging
function. Every time you log, you're creating a new FileHandler
and adding it to the set of handlers, and you do this for every file renamed, so you rapidly hit the limit on open file descriptors. Configure your logger once, then use it many times, don't configure it every time you use it.
请注意,Logging
中可能不会引发异常;在 Windows 上删除文件涉及打开它进行删除,因此您可以使用记录器将打开的文件最大化,然后在尝试删除文件时失败.
Note that the exception might not be raised in Logging
; deleting a file on Windows involves opening it for delete, so you could max out open files with loggers, then fail when you try to delete a file.
这篇关于Windows 上的 Python 2.7——打开的文件太多的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!