How to use python-docx to replace text in a Word document and save

Question

The oodocx module mentioned on the same page refers the user to an /examples folder that does not seem to exist.
I have read the documentation for python-docx 0.7.2, plus everything I could find on Stack Overflow on the subject, so please believe that I have done my "homework".

Python is the only language I know (beginner+, maybe intermediate), so please do not assume any knowledge of C, Unix, XML, etc.

Task: Open an MS Word 2007+ document with a single line of text in it (to keep things simple) and replace any "key" word from Dictionary that occurs in that line of text with its dictionary value. Then close the document, keeping everything else the same.

Line of text (for example): "We shall linger in the chambers of the sea."

from docx import Document

document = Document('/Users/umityalcin/Desktop/Test.docx')

Dictionary = {'sea': 'ocean'}

sections = document.sections
for section in sections:
    print(section.start_type)

#Now, I would like to navigate, focus on, get to, whatever to the section that has my
#single line of text and execute a find/replace using the dictionary above.
#then save the document in the usual way.

document.save('/Users/umityalcin/Desktop/Test.docx')

I don't see anything in the documentation that allows me to do this. Maybe it is there, but I don't get it because not everything is spelled out at my level.

I have followed other suggestions on this site and tried an earlier version of the module (https://github.com/mikemaccana/python-docx), which is supposed to have "methods like replace, advReplace". I open its source code in the Python interpreter and add the following at the end (this is to avoid clashes with the already installed version 0.7.2):

document = opendocx('/Users/umityalcin/Desktop/Test.docx')
words = document.xpath('//w:r', namespaces=document.nsmap)
for word in words:
    if word in Dictionary.keys():
        print "found it", Dictionary[word]
        document = replace(document, word, Dictionary[word])
savedocx(document, coreprops, appprops, contenttypes, websettings,
    wordrelationships, output, imagefiledict=None) 

Running this produces the following error message:

NameError: name 'coreprops' is not defined

Maybe I am trying to do something that cannot be done, but I would appreciate your help if I am missing something simple.

If it matters, I am using the 64-bit version of Enthought's Canopy on OS X 10.9.3.

Recommended answer

The current version of python-docx does not have a search() function or a replace() function. These are requested fairly frequently, but an implementation for the general case is quite tricky and it hasn't risen to the top of the backlog yet.

Several folks have had success though, getting done what they need, using the facilities already present. Here's an example. It has nothing to do with sections by the way :)

for paragraph in document.paragraphs:
    if 'sea' in paragraph.text:
        print(paragraph.text)
        paragraph.text = 'new text containing ocean'
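To drive the replacement from the asker's dictionary rather than a hard-coded string, something along these lines may work. It is a minimal sketch, not part of python-docx: `replace_words` and `replace_in_paragraphs` are hypothetical helper names, and it assumes the dictionary keys are plain words and that the installed python-docx version allows assigning to `paragraph.text`.

```python
import re


def replace_words(text, mapping):
    """Replace whole-word occurrences of each key in mapping with its value."""
    if not mapping:
        return text
    # Build one pattern such as r'\b(sea|land)\b' from the (escaped) keys.
    pattern = re.compile(r'\b(' + '|'.join(re.escape(k) for k in mapping) + r')\b')
    return pattern.sub(lambda match: mapping[match.group(1)], text)


def replace_in_paragraphs(paragraphs, mapping):
    """Rewrite every paragraph whose text changes; return how many changed."""
    changed = 0
    for paragraph in paragraphs:
        new_text = replace_words(paragraph.text, mapping)
        if new_text != paragraph.text:
            # Assigning paragraph.text drops character-level formatting.
            paragraph.text = new_text
            changed += 1
    return changed
```

With the document opened as in the question, the call would be `replace_in_paragraphs(document.paragraphs, Dictionary)`, followed by `document.save(...)` as usual.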

To search in Tables as well, you would need to use something like:

for table in document.tables:
    # Cells are reached through the table's rows.
    for row in table.rows:
        for cell in row.cells:
            for paragraph in cell.paragraphs:
                if 'sea' in paragraph.text:
                    ...
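If the body text and the tables should be handled by one loop, a small generator can walk both. This is only a sketch: `iter_paragraphs` is a hypothetical helper, and it assumes the object passed in exposes `paragraphs` and `tables`, that tables expose `rows`, and that rows expose `cells`.

```python
def iter_paragraphs(container):
    """Yield the paragraphs of a document or cell, descending into tables."""
    # Paragraphs directly inside this container come first.
    for paragraph in container.paragraphs:
        yield paragraph
    # Then every cell of every table, including tables nested inside cells.
    for table in container.tables:
        for row in table.rows:
            for cell in row.cells:
                yield from iter_paragraphs(cell)
```

Usage would look like `for paragraph in iter_paragraphs(document): ...`. Paragraphs are not yielded in document order (body paragraphs first, then table content), and a merged cell may be visited more than once, so whatever is done per paragraph should be safe to repeat.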

If you pursue this path, you'll probably discover pretty quickly what the complexities are. If you replace the entire text of a paragraph, that will remove any character-level formatting, like a word or phrase in bold or italic.
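One way to keep most of the formatting is to edit the paragraph's runs instead of its whole text. The following is a rough sketch under stated assumptions: `replace_in_runs` is a hypothetical helper, and it assumes a python-docx version in which `run.text` can be assigned. It only finds matches that sit entirely inside one run; Word often splits a word across several runs, and those occurrences are missed.

```python
def replace_in_runs(paragraph, old, new):
    """Replace old with new inside each run, leaving run formatting alone.

    Returns the number of occurrences replaced. Matches that span two or
    more runs are not detected.
    """
    replaced = 0
    for run in paragraph.runs:
        if old in run.text:
            replaced += run.text.count(old)
            run.text = run.text.replace(old, new)
    return replaced
```

A return value of 0 for a paragraph whose `paragraph.text` does contain the word is a hint that the word is split across runs, in which case replacing the whole paragraph text (and losing the formatting) may be the simpler fallback.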

By the way, the code from @wnnmaw's answer is for the legacy version of python-docx and won't work at all with versions after 0.3.0.
