Python win32com.client.Dispatch looping through Word documents and export to PDF; fails when next loop occurs

Problem description

Based on the script from ".doc to pdf using python", I've got a semi-working script that exports .docx files from C:\Export_to_pdf to PDF in a new folder.

The problem is that it gets through the first couple of documents and then fails with:

(-2147352567, 'Exception occurred.', (0, u'Microsoft Word', u'Command failed', u'wdmain11.chm', 36966, -2146824090), None)

This is apparently an unhelpful, generic error message. If I step through the script slowly using pdb, I can loop through all the files and export them successfully. If I also keep an eye on the processes in Windows Task Manager, I can see that WINWORD starts and then ends when it is supposed to, but on the larger files it takes longer for the memory usage to stabilise. This makes me think that the script trips up when WINWORD doesn't have time to initialize or quit before the next method is called on the client.Dispatch object.
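The two negative numbers in the error tuple are signed 32-bit COM error codes, which are easier to look up in hexadecimal. The following is a small sketch of that conversion; the helper name `to_hresult` is made up for illustration. The first code comes out as 0x80020009, the generic DISP_E_EXCEPTION that is reported whenever the automation server raises an exception, and the second is the code supplied by Word itself.

```python
def to_hresult(code):
    """Return the unsigned 32-bit hex form of a signed COM error code."""
    # Masking with 0xFFFFFFFF reinterprets the negative value as unsigned 32-bit.
    return "0x{:08X}".format(code & 0xFFFFFFFF)


print(to_hresult(-2147352567))  # 0x80020009 (DISP_E_EXCEPTION)
print(to_hresult(-2146824090))  # 0x800A1066
```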

Is there a way with win32com or comtypes to identify and wait for a process to start or finish?

My script:

import os
from win32com import client

folder = "C:\\Export_to_pdf"
file_type = 'docx'
out_folder = folder + "\\PDF"

os.chdir(folder)

if not os.path.exists(out_folder):
    print('Creating output folder...')
    os.makedirs(out_folder)
    print(out_folder + ' created.')
else:
    print(out_folder + ' already exists.\n')

# List the documents that will be converted
for files in os.listdir("."):
    if files.endswith(".docx"):
        print(files)

print('\n\n')

try:
    for files in os.listdir("."):
        if files.endswith(".docx"):
            out_name = files.replace(file_type, r"pdf")
            in_file = os.path.abspath(folder + "\\" + files)
            out_file = os.path.abspath(out_folder + "\\" + out_name)
            # A Word instance is dispatched and quit again on every iteration
            word = client.Dispatch("Word.Application")
            doc = word.Documents.Open(in_file)
            print('Exporting ' + out_file)
            doc.SaveAs(out_file, FileFormat=17)  # 17 = wdFormatPDF
            doc.Close()
            word.Quit()
except Exception as e:
    print(e)

The working code: I just replaced the try block with this. Note that I moved the DispatchEx statement outside the for loop and moved word.Quit() into a finally clause to ensure Word closes.

# Dispatch once, before the try, so word is always defined when finally runs
word = client.DispatchEx("Word.Application")
try:
    for files in os.listdir("."):
        if files.endswith((".docx", ".doc")):
            # splitext handles both extensions; replace(file_type, ...) left .doc names unchanged
            out_name = os.path.splitext(files)[0] + ".pdf"
            in_file = os.path.abspath(folder + "\\" + files)
            out_file = os.path.abspath(out_folder + "\\" + out_name)
            doc = word.Documents.Open(in_file)
            print('Exporting ' + out_file)
            doc.SaveAs(out_file, FileFormat=17)  # 17 = wdFormatPDF
            doc.Close()
except Exception as e:
    print(e)
finally:
    word.Quit()

Solution

This might not be the problem, but dispatching a separate Word instance and then closing it within each iteration is not necessary, and it may be the cause of the strange memory behaviour you are seeing. You only need to open the instance once, and within that instance you can open and close all the documents you need, like the following:

try:
    word = client.DispatchEx("Word.Application")  # DispatchEx starts an entirely new Word instance
    word.Visible = True  # Show Word so you can watch the effect of moving the dispatch and Quit lines
    for files in os.listdir("."):
        if files.endswith(".docx"):
            out_name = files.replace(file_type, r"pdf")
            in_file = os.path.abspath(folder + "\\" + files)
            out_file = os.path.abspath(out_folder + "\\" + out_name)
            doc = word.Documents.Open(in_file)
            print('Exporting ' + out_file)
            doc.SaveAs(out_file, FileFormat=17)  # 17 = wdFormatPDF
            doc.Close()

    word.Quit()

except Exception as e:
    print(e)  # Word is left running if an error occurs before word.Quit() is reached

Note: Be careful using try/except when opening win32com instances and files: if an error occurs after you open them but before you close them, they won't be closed (because the script never reaches that command).
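One way to guard against that is to wrap the Word instance in a context manager so that Quit() runs no matter how the block exits. This is only a minimal sketch and not part of the original answer: `word_instance` and its `factory` parameter are hypothetical names, and the default factory assumes pywin32 and Word are installed.

```python
from contextlib import contextmanager


def _default_factory():
    # Assumption: pywin32 is installed and Word is available on this machine.
    from win32com import client
    return client.DispatchEx("Word.Application")


@contextmanager
def word_instance(factory=_default_factory):
    """Yield a Word application object and always call Quit() on exit."""
    # Created before the try, so Quit() is never attempted on an unbound name.
    word = factory()
    try:
        yield word
    finally:
        # Runs on normal exit and when an exception escapes the with block.
        word.Quit()
```

The conversion loop would then sit inside `with word_instance() as word:`, and an error in Documents.Open or SaveAs would still close Word before the exception propagates.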

Also, you may want to consider using DispatchEx instead of plain Dispatch. DispatchEx opens a new instance (an entirely new .exe), whereas I believe Dispatch will try to find an already open instance to latch onto, though the documentation on this is foggy. Use DispatchEx if you actually want more than one instance (i.e. open one file in one and another file in another).

As for waiting, the program should simply block on that line when more time is needed, but I'm not sure.
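If Word does turn out to need a moment before it accepts the next command, one possible workaround is to retry the failing call after a short pause. This is only a sketch under that assumption; `call_with_retry` and its parameters are hypothetical and not part of win32com.

```python
import time


def call_with_retry(func, attempts=3, delay=1.0, exceptions=(Exception,)):
    """Call func(); on failure wait `delay` seconds and retry, up to `attempts` calls."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except exceptions:
            if attempt == attempts:
                raise  # out of attempts: let the last error propagate
            time.sleep(delay)
```

It could be used as `doc = call_with_retry(lambda: word.Documents.Open(in_file))`. In real use the `exceptions` argument would probably be narrowed to `pywintypes.com_error` so that unrelated errors are not retried.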

Oh! You can also set word.Visible = True if you want to see the instance and the files actually open. That might help you spot the problem visually, but turn it off once things are fixed because it will definitely slow things down ;-).
