在docx中进行文本替换,并使用python-docx保存更改后的文件 [英] Text-Replace in docx and save the changed file with python-docx
问题描述
我正在尝试使用 python-docx模块来替换文件中的单词并保存新文件,并告诫新文件的格式必须与旧文件的格式完全相同,但要替换单词.我应该怎么做?
I'm trying to use the python-docx module to replace a word in a file and save the new file with the caveat that the new file must have exactly the same formatting as the old file, but with the word replaced. How am I supposed to do this?
docx模块具有一个saveocx,它需要7个输入:
The docx module has a savedocx that takes 7 inputs:
- 文档
- coreprops
- appprops
- contenttypes
- 网站设置
- 单词关系
- 输出
如何将原始文件中的所有内容保持不变,除了被替换的单词?
How do I keep everything in my original file the same except for the replaced word?
推荐答案
貌似,Docx for Python并不意味着存储带有图像,标题,...的完整Docx,而仅包含的内部内容该文件.因此,没有简单的方法可以做到这一点.
As it seems to be, Docx for Python is not meant to store a full Docx with images, headers, ... , but only contains the inner content of the document. So there's no simple way to do this.
Howewer,这是您的操作方法:
Howewer, here is how you could do it:
首先,看看 docx标签Wiki :
它说明了如何解压缩docx文件:这是典型文件的外观:
It explains how the docx file can be unzipped: Here's how a typical file looks like:
+--docProps
| + app.xml
| \ core.xml
+ res.log
+--word //this folder contains most of the files that control the content of the document
| + document.xml //Is the actual content of the document
| + endnotes.xml
| + fontTable.xml
| + footer1.xml //Containst the elements in the footer of the document
| + footnotes.xml
| +--media //This folder contains all images embedded in the word
| | \ image1.jpeg
| + settings.xml
| + styles.xml
| + stylesWithEffects.xml
| +--theme
| | \ theme1.xml
| + webSettings.xml
| \--_rels
| \ document.xml.rels //this document tells word where the images are situated
+ [Content_Types].xml
\--_rels
\ .rels
Docx仅通过 opendocx
def opendocx(file):
'''Open a docx file, return a document XML tree'''
mydoc = zipfile.ZipFile(file)
xmlcontent = mydoc.read('word/document.xml')
document = etree.fromstring(xmlcontent)
return document
它仅获取document.xml文件.
It only gets the document.xml file.
我建议您做的是:
- 使用** opendocx * 获取文档的内容
- 使用 advReplace 方法 替换document.xml
- 以zip格式打开docx,然后用新的xml内容替换document.xml内容.
- 关闭并输出压缩文件(将其重命名为output.docx)
- get the content of the document with **opendocx*
- Replace the document.xml with the advReplace method
- Open the docx as a zip, and replace the document.xml content's by the new xml content.
- Close and output the zipped file (renaming it to output.docx)
如果您已安装node.js,则被告知我已经使用了 DocxGenJS 该库是docx文档的引擎,正在积极开发中,并将作为节点模块尽快发布.
If you have node.js installed, be informed that I have worked on DocxGenJS which is templating engine for docx documents, the library is in active development and will be released soon as a node module.
这篇关于在docx中进行文本替换,并使用python-docx保存更改后的文件的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!