docx 中的文本替换并使用 python-docx 保存更改的文件 [英] Text-Replace in docx and save the changed file with python-docx

查看:44
本文介绍了docx 中的文本替换并使用 python-docx 保存更改的文件的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在尝试使用 python-docx 模块 来替换文件中的单词并保存新文件,但要注意新文件的格式必须与旧文件完全相同,但替换了单词.我该怎么做?

I'm trying to use the python-docx module to replace a word in a file and save the new file with the caveat that the new file must have exactly the same formatting as the old file, but with the word replaced. How am I supposed to do this?

docx 模块有一个 savedocx,它接受 7 个输入:

The docx module has a savedocx that takes 7 inputs:

  • 文档
  • 核心道具
  • 应用程序
  • 内容类型
  • 网络设置
  • 文字关系
  • 输出

如何使原始文件中的所有内容都保持不变,除了被替换的单词?

How do I keep everything in my original file the same except for the replaced word?

推荐答案

看起来,Python 的 Docx 并不打算存储包含图像、标题等的完整 Docx,而只包含内部内容文件.所以没有简单的方法可以做到这一点.

As it seems to be, Docx for Python is not meant to store a full Docx with images, headers, ... , but only contains the inner content of the document. So there's no simple way to do this.

但是,您可以这样做:

首先,查看 docx 标签维基:

它解释了如何解压 docx 文件:典型文件如下所示:

It explains how the docx file can be unzipped: Here's how a typical file looks like:

+--docProps
|  +  app.xml
|    core.xml
+  res.log
+--word //this folder contains most of the files that control the content of the document
|  +  document.xml //Is the actual content of the document
|  +  endnotes.xml
|  +  fontTable.xml
|  +  footer1.xml //Containst the elements in the footer of the document
|  +  footnotes.xml
|  +--media //This folder contains all images embedded in the word
|  |    image1.jpeg
|  +  settings.xml
|  +  styles.xml
|  +  stylesWithEffects.xml
|  +--theme
|  |    theme1.xml
|  +  webSettings.xml
|  --_rels
|       document.xml.rels //this document tells word where the images are situated
+  [Content_Types].xml
--_rels
     .rels

Docx 只获取文档的一部分,在方法opendocx

Docx only gets one part of the document, in the method opendocx

def opendocx(file):
    '''Open a docx file, return a document XML tree'''
    mydoc = zipfile.ZipFile(file)
    xmlcontent = mydoc.read('word/document.xml')
    document = etree.fromstring(xmlcontent)
    return document

它只获取 document.xml 文件.

It only gets the document.xml file.

我建议你做的是:

  1. 使用 **opendocx* 获取文档内容
  2. advReplace方法替换document.xml
  3. 以 zip 格式打开 docx,并将 document.xml 内容替换为新的 xml 内容.
  4. 关闭并输出压缩文件(将其重命名为 output.docx)
  1. get the content of the document with **opendocx*
  2. Replace the document.xml with the advReplace method
  3. Open the docx as a zip, and replace the document.xml content's by the new xml content.
  4. Close and output the zipped file (renaming it to output.docx)

如果您安装了 node.js,请注意我已经在 DocxGenJS 上工作,这是模板docx 文档引擎,该库正在积极开发中,将很快作为节点模块发布.

If you have node.js installed, be informed that I have worked on DocxGenJS which is templating engine for docx documents, the library is in active development and will be released soon as a node module.

这篇关于docx 中的文本替换并使用 python-docx 保存更改的文件的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆