docx 中的文本替换并使用 python-docx 保存更改的文件 [英] Text-Replace in docx and save the changed file with python-docx
问题描述
我正在尝试使用 python-docx 模块 来替换文件中的单词并保存新文件,但要注意新文件的格式必须与旧文件完全相同,但替换了单词.我该怎么做?
I'm trying to use the python-docx module to replace a word in a file and save the new file with the caveat that the new file must have exactly the same formatting as the old file, but with the word replaced. How am I supposed to do this?
docx 模块有一个 savedocx,它接受 7 个输入:
The docx module has a savedocx that takes 7 inputs:
- 文档
- 核心道具
- 应用程序
- 内容类型
- 网络设置
- 文字关系
- 输出
如何使原始文件中的所有内容都保持不变,除了被替换的单词?
How do I keep everything in my original file the same except for the replaced word?
推荐答案
看起来,Python 的 Docx 并不打算存储包含图像、标题等的完整 Docx,而只包含内部内容文件.所以没有简单的方法可以做到这一点.
As it seems to be, Docx for Python is not meant to store a full Docx with images, headers, ... , but only contains the inner content of the document. So there's no simple way to do this.
但是,您可以这样做:
首先,查看 docx 标签维基:
它解释了如何解压 docx 文件:典型文件如下所示:
It explains how the docx file can be unzipped: Here's how a typical file looks like:
+--docProps
| + app.xml
| core.xml
+ res.log
+--word //this folder contains most of the files that control the content of the document
| + document.xml //Is the actual content of the document
| + endnotes.xml
| + fontTable.xml
| + footer1.xml //Containst the elements in the footer of the document
| + footnotes.xml
| +--media //This folder contains all images embedded in the word
| | image1.jpeg
| + settings.xml
| + styles.xml
| + stylesWithEffects.xml
| +--theme
| | theme1.xml
| + webSettings.xml
| --_rels
| document.xml.rels //this document tells word where the images are situated
+ [Content_Types].xml
--_rels
.rels
Docx 只获取文档的一部分,在方法opendocx
Docx only gets one part of the document, in the method opendocx
def opendocx(file):
'''Open a docx file, return a document XML tree'''
mydoc = zipfile.ZipFile(file)
xmlcontent = mydoc.read('word/document.xml')
document = etree.fromstring(xmlcontent)
return document
它只获取 document.xml 文件.
It only gets the document.xml file.
我建议你做的是:
- 使用 **opendocx* 获取文档内容
- 用advReplace方法替换document.xml
- 以 zip 格式打开 docx,并将 document.xml 内容替换为新的 xml 内容.
- 关闭并输出压缩文件(将其重命名为 output.docx)
- get the content of the document with **opendocx*
- Replace the document.xml with the advReplace method
- Open the docx as a zip, and replace the document.xml content's by the new xml content.
- Close and output the zipped file (renaming it to output.docx)
如果您安装了 node.js,请注意我已经在 DocxGenJS 上工作,这是模板docx 文档引擎,该库正在积极开发中,将很快作为节点模块发布.
If you have node.js installed, be informed that I have worked on DocxGenJS which is templating engine for docx documents, the library is in active development and will be released soon as a node module.
这篇关于docx 中的文本替换并使用 python-docx 保存更改的文件的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!