Python threading.Timer: setting a time limit for when a program runs out of time
Question
I have some questions about setting the maximum running time of a function in Python. Specifically, I would like to use pdfminer to convert .pdf files to .txt.
The problem is that, quite often, some files cannot be decoded and take an extremely long time. So I want to use threading.Timer() to limit the conversion time for each file to 5 seconds. In addition, I am running under Windows, so I cannot use the signal module for this.
I succeeded in running the conversion code with pdfminer.convert_pdf_to_txt() (in my code it is "c"), but I am not sure that threading.Timer() works in the following code. (I don't think it properly constrains the time for each conversion.)
In summary, I want to:
- Convert the pdf to txt
- Limit each conversion to 5 seconds; if it runs out of time, throw an exception and save an empty file
- Save all the txt files under the same folder
- If there are any exceptions/errors, still save the file but with empty content
Here is the current code:
import converter as c
import os
import timeit
import time
import threading
import thread

yourpath = 'D:/hh/'

def iftimesout():
    print("no")
    with open("D:/f/"+g+"&"+t+"&"+name+".txt", mode="w") as newfile:
        newfile.write("")

for root, dirs, files in os.walk(yourpath, topdown=False):
    for name in files:
        try:
            timer = threading.Timer(5.0,iftimesout)
            timer.start()
            t=os.path.split(os.path.dirname(os.path.join(root, name)))[1]
            a=str(os.path.split(os.path.dirname(os.path.join(root, name)))[0])
            g=str(a.split("\\")[1])
            with open("D:/f/"+g+"&"+t+"&"+name+".txt", mode="w") as newfile:
                newfile.write(c.convert_pdf_to_txt(os.path.join(root, name)))
                print("yes")
            timer.cancel()
        except KeyboardInterrupt:
            raise
        except:
            for name in files:
                t=os.path.split(os.path.dirname(os.path.join(root, name)))[1]
                a=str(os.path.split(os.path.dirname(os.path.join(root, name)))[0])
                g=str(a.split("\\")[1])
                with open("D:/f/"+g+"&"+t+"&"+name+".txt", mode="w") as newfile:
                    newfile.write("")
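For reference, here is a minimal sketch of how threading.Timer behaves, which may explain why the code above does not constrain the conversion time. The callback runs in the timer's own thread, so it cannot interrupt a blocking call in the main thread. The function name demo_timer_does_not_interrupt and the time.sleep() stand-in for the slow conversion are assumptions made for illustration; they are not part of the original code.

```python
import threading
import time


def demo_timer_does_not_interrupt(work_seconds=0.5, limit_seconds=0.1):
    """Run a blocking call while a shorter threading.Timer is armed.

    Returns the ordered list of events and the time the blocking call took.
    """
    events = []

    def on_timeout():
        # Runs in the Timer's own thread; it has no way to stop the main thread.
        events.append("timer fired")

    timer = threading.Timer(limit_seconds, on_timeout)
    timer.start()

    start = time.monotonic()
    time.sleep(work_seconds)  # hypothetical stand-in for a slow conversion call
    elapsed = time.monotonic() - start
    events.append("work finished")

    timer.cancel()  # has no effect here: the timer fired long ago
    timer.join()
    return events, elapsed


if __name__ == "__main__":
    events, elapsed = demo_timer_does_not_interrupt()
    print(events, round(elapsed, 2))
```

The timer fires after its delay, but the blocking call still runs to completion. Applied to the loop above, this suggests iftimesout() would write its empty file while the conversion keeps running and may later overwrite it.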
Recommended answer
Check the following code and let me know if there are any issues. Also let me know whether you still want to use the force termination feature (KeyboardInterrupt).
import os
import time
from multiprocessing import Process

# Assumes the converter module from the question (imported there as "c")
from converter import convert_pdf_to_txt

path_to_pdf = "C:\\Path\\To\\Main\\PDFs" # No "\\" at the end of path!
path_to_text = "C:\\Path\\To\\Save\\Text\\" # There is "\\" at the end of path!
TIMEOUT = 5 # seconds
TIME_TO_CHECK = 1 # seconds

# Save PDF content into text file or save empty file in case of conversion timeout
def convert(path_to, my_pdf):
    my_txt = text_file_name(my_pdf)
    with open(my_txt, "w") as my_text_file:
        try:
            my_text_file.write(convert_pdf_to_txt(os.path.join(path_to, my_pdf)))
        except Exception:
            print("Error. %s file wasn't converted" % my_pdf)

# Convert file_name.pdf from PDF folder to file_name.txt in Text folder
def text_file_name(pdf_file):
    return path_to_text + os.path.splitext(pdf_file)[0] + ".txt"

if __name__ == "__main__":
    # for each pdf file in PDF folder
    for root, dirs, files in os.walk(path_to_pdf, topdown=False):
        for my_file in files:
            count = 0
            p = Process(target=convert, args=(root, my_file,))
            p.start()
            # some delay to be sure that text file created
            while not os.path.isfile(text_file_name(my_file)):
                time.sleep(0.001)
            while True:
                # if not run out of $TIMEOUT and file still empty: wait for $TIME_TO_CHECK,
                # else: close file and start new iteration
                if count < TIMEOUT and os.stat(text_file_name(my_file)).st_size == 0:
                    count += TIME_TO_CHECK
                    time.sleep(TIME_TO_CHECK)
                else:
                    p.terminate()
                    break
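As a possible simplification of the polling loop above, the same limit can be sketched with Process.join(timeout) followed by is_alive() and terminate(). This is only a sketch under assumptions: the helper name run_with_timeout is hypothetical, and time.sleep stands in for the convert function so the example runs without any PDF files.

```python
import multiprocessing
import time


def run_with_timeout(target, args=(), timeout=5.0):
    """Run target(*args) in a child process.

    Returns True if the child exited cleanly within `timeout` seconds.
    Returns False if it had to be terminated or exited with an error.
    """
    p = multiprocessing.Process(target=target, args=args)
    p.start()
    p.join(timeout)  # wait at most `timeout` seconds for the child
    if p.is_alive():
        p.terminate()  # hard-stop the stuck child
        p.join()       # reap it before moving on
        return False
    return p.exitcode == 0


if __name__ == "__main__":
    # time.sleep is a stand-in for a conversion function such as convert()
    print(run_with_timeout(time.sleep, (0.1,), timeout=5.0))
    print(run_with_timeout(time.sleep, (30,), timeout=0.5))
```

With the code above, the call would look like run_with_timeout(convert, (root, my_file), timeout=TIMEOUT); when it returns False, the parent process can write the empty .txt file itself. On Windows the target function must be defined at module top level, and the code that starts processes must sit under the if __name__ == "__main__": guard.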