在docx中进行文本替换,并使用python-docx保存更改后的文件 [英] Text-Replace in docx and save the changed file with python-docx

查看:1080
本文介绍了在docx中进行文本替换,并使用python-docx保存更改后的文件的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在尝试使用 python-docx模块来替换文件中的单词并保存新文件,并告诫新文件的格式必须与旧文件的格式完全相同,但要替换单词.我应该怎么做?

I'm trying to use the python-docx module to replace a word in a file and save the new file with the caveat that the new file must have exactly the same formatting as the old file, but with the word replaced. How am I supposed to do this?

docx模块具有一个saveocx,它需要7个输入:

The docx module has a savedocx that takes 7 inputs:

  • 文档
  • coreprops
  • appprops
  • contenttypes
  • 网站设置
  • 单词关系
  • 输出

如何将原始文件中的所有内容保持不变,除了被替换的单词?

How do I keep everything in my original file the same except for the replaced word?

推荐答案

貌似,Docx for Python并不意味着存储带有图像,标题,...的完整Docx,而仅包含的内部内容该文件.因此,没有简单的方法可以做到这一点.

As it seems to be, Docx for Python is not meant to store a full Docx with images, headers, ... , but only contains the inner content of the document. So there's no simple way to do this.

Howewer,这是您的操作方法:

Howewer, here is how you could do it:

首先,看看 docx标签Wiki :

它说明了如何解压缩docx文件:这是典型文件的外观:

It explains how the docx file can be unzipped: Here's how a typical file looks like:

+--docProps
|  +  app.xml
|  \  core.xml
+  res.log
+--word //this folder contains most of the files that control the content of the document
|  +  document.xml //Is the actual content of the document
|  +  endnotes.xml
|  +  fontTable.xml
|  +  footer1.xml //Containst the elements in the footer of the document
|  +  footnotes.xml
|  +--media //This folder contains all images embedded in the word
|  |  \  image1.jpeg
|  +  settings.xml
|  +  styles.xml
|  +  stylesWithEffects.xml
|  +--theme
|  |  \  theme1.xml
|  +  webSettings.xml
|  \--_rels
|     \  document.xml.rels //this document tells word where the images are situated
+  [Content_Types].xml
\--_rels
   \  .rels

Docx仅通过 opendocx

def opendocx(file):
    '''Open a docx file, return a document XML tree'''
    mydoc = zipfile.ZipFile(file)
    xmlcontent = mydoc.read('word/document.xml')
    document = etree.fromstring(xmlcontent)
    return document

它仅获取document.xml文件.

It only gets the document.xml file.

我建议您做的是:

  1. 使用** opendocx *
  2. 获取文档的内容
  3. 使用 advReplace 方法
  4. 替换document.xml
  5. 以zip格式打开docx,然后用新的xml内容替换document.xml内容.
  6. 关闭并输出压缩文件(将其重命名为output.docx)
  1. get the content of the document with **opendocx*
  2. Replace the document.xml with the advReplace method
  3. Open the docx as a zip, and replace the document.xml content's by the new xml content.
  4. Close and output the zipped file (renaming it to output.docx)

如果您已安装node.js,则被告知我已经使用了 DocxGenJS 该库是docx文档的引擎,正在积极开发中,并将作为节点模块尽快发布.

If you have node.js installed, be informed that I have worked on DocxGenJS which is templating engine for docx documents, the library is in active development and will be released soon as a node module.

这篇关于在docx中进行文本替换,并使用python-docx保存更改后的文件的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆